Re: Python 3 Statsmodels & Pandas

Andreas Tille Tue, 26 Sep 2017 00:19:06 -0700

On Mon, Sep 25, 2017 at 09:36:43PM -0700, Diane Trout wrote:
> > > * Some of the doc pages call get_rdataset, and there's no network
> > > access in the builder so those calls fail. (ugliest error)
> > 
> > Can you pre-fetch the data and provide it in debian/datasets?
> 
> I made the changes and cached the downloaded zip files....
> 
> and then realized isn't this redistributing datasets?
> 
> Don't we need to verify the license before uploading?


In principle yes.
 
> Below is what I've found so far, (before getting tired of licensing
> issues)
> 
> Any thoughts about how to handle this?

I wonder how this was dealt with before.  If that many data sets were
needed to build the docs, how did the doc generation process work
before?

My thought is that we should postpone anything that would delay fixing
the RC bugs to a later point of development - specifically #873512
remains open.  I have pinged upstream about this.  If we switch to also
shipping python3, we need extra cycles through NEW and may get new
bugs to fix.

BTW, I'd love it if you would merge your work into the master branch.
I'm a bit confused by the number of branches and have lost track of
which one to look at.
 
> Here's a list of the file names from the include-binaries file I
> created via caching.
> 
> datasets.csv.zip                    
> csv,HistData,Guerry.csv.zip         
> doc,HistData,rst,Guerry.rst.zip     
> csv,COUNT,medpar.csv.zip            
> doc,COUNT,rst,medpar.rst.zip        
> csv,car,Duncan.csv.zip              
> doc,car,rst,Duncan.rst.zip          
> csv,robustbase,starsCYG.csv.zip     
> doc,robustbase,rst,starsCYG.rst.zip 
> doc,car,rst,Moore.rst.zip           
> csv,vcd,Arthritis.csv.zip           
> doc,vcd,rst,Arthritis.rst.zip       
> csv,MASS,epil.csv.zip               
> doc,MASS,rst,epil.rst.zip           
> csv,geepack,dietox.csv.zip          
> doc,geepack,rst,dietox.rst.zip      
> 
> The files are being downloaded from this github repository.
> 
> https://github.com/vincentarelbundock/Rdatasets
> 
> A useful index of the datasets is
> http://vincentarelbundock.github.com/Rdatasets/datasets.html

This reminds me of the debian/README.source files ftpmaster once
suggested for R packages[1].  Maybe that's an appropriate way to
document the licenses?  You can find an example in the package
r-cran-ape, for instance.
 
> Guerry.csv is probably safe as it's from "Essay on the Moral Statistics
> of France" published 1833.
> 
> medpar 2016's license is here:
> https://www.healthdata.gov/dataset/medpar-limited-data-set-lds-hospital-national
> and is listed as "Open Data Commons Open Database License"
> https://opendatacommons.org/licenses/odbl/1.0/
> 
> Duncan is the Duncan's Occupational Prestige Data from 1950.
> Couldn't find a license
> 
> starsCYG is Data for the Hertzsprung-Russell Diagram of the star
> cluster CYG OB1
> http://ugrad.stat.ubc.ca/R/library/rrcov/html/stars.html
> Couldn't find a license
> 
> Moore is from Moore, J. C., Jr. and Krupat, E. (1971) 
> Relationship between source status, authoritarianism and conformity in
> a social setting.
> Couldn't find a license

So maybe we should patch the docs to ignore those data sets for which
no license could be found.
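
To make it easier to see which data sets that would affect, here is a
rough sketch (not something that exists in statsmodels or in the
packaging; the helper names and the LICENSES table are assumptions, and
the table only restates what is quoted above).  It splits the cached
file names into kind/package/dataset and reports every data set without
a known license, treating the ones not checked yet as unlicensed:

```python
# Hypothetical helper, not part of statsmodels or the Debian packaging.
# Findings are copied from the thread; None means "no license found".
LICENSES = {
    ("HistData", "Guerry"): "published 1833, probably safe",
    ("COUNT", "medpar"): "ODbL-1.0",
    ("car", "Duncan"): None,
    ("robustbase", "starsCYG"): None,
    ("car", "Moore"): None,
}


def parse_cached_name(name):
    """Split 'csv,HistData,Guerry.csv.zip' into (kind, package, dataset)."""
    parts = name.strip().split(",")
    if len(parts) < 3:
        return None  # e.g. the top-level index 'datasets.csv.zip'
    kind, package = parts[0], parts[1]
    dataset = parts[-1].split(".")[0]
    return kind, package, dataset


def unlicensed(names):
    """Return sorted (package, dataset) pairs with no known license.

    Data sets missing from LICENSES (not checked yet) are reported too.
    """
    missing = set()
    for name in names:
        parsed = parse_cached_name(name)
        if parsed is None:
            continue
        _, package, dataset = parsed
        if LICENSES.get((package, dataset)) is None:
            missing.add((package, dataset))
    return sorted(missing)
```

Feeding it the names from the include-binaries list would give the set
of doc examples that need either a license statement in
debian/README.source or a patch that skips them.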

Thanks a lot for your work on this package

    Andreas.


[1] http://lists.debian.org/debian-devel/2013/09/msg00332.html 

-- 
http://fam-tille.de
