On 09/26/2012 12:28 AM, josef.p...@gmail.com wrote:
> On Wed, Sep 26, 2012 at 12:05 AM, Paul Tremblay <paulhtremb...@gmail.com> 
> wrote:
>> In R, there are many default data sets one can use to both illustrate code
>> and explore the scripting language. Instead of having to fake data, one can
>> pull from meaningful data sets, created in the real world. For example, this
>> one-liner actually produces a plot:
>>
>> plot(mtcars$hp~mtcars$mpg)
>>
>> where mtcars refers to a built-in data set taken from Motor Trend Magazine.
>> I don't believe matplotlib has anything similar. I have started to download
>> some of the R data sets and store them as pickles for my own use. Does
>> anyone else have any interest in creating a repository for these data sets
>> or otherwise sharing them in some way?
> Vincent converted several R datasets back to CSV, which can be easily
> loaded from the web with, for example, pandas.
> http://vincentarelbundock.github.com/Rdatasets/
> The collection is a bit random.
>
> statsmodels has some datasets that we use for examples and tests
> http://statsmodels.sourceforge.net/devel/datasets/index.html
> We were always a bit slow with adding datasets because we were too
> cautious about licensing issues. But R seems to get away with
> considering most datasets to be public domain.
> We keep adding datasets to statsmodels as we need them for new models.
>
> The machine learning packages like sklearn have packaged the typical
> machine learning datasets.
>
> If you are interested, you could join up with statsmodels or with
> Vincent to expand on what's available.
>
It seems to me that contributing to (rather than duplicating) the work 
of one of these projects would be a great idea.  It would also be nice 
to add functionality in matplotlib to make it easier to download these 
datasets as a one-off -- obviously not with exactly the same syntax as 
in R, but ideally with a single function call.
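
To make the "single function call" idea concrete, here is a rough sketch 
using only the standard library.  Nothing like this exists in matplotlib; 
the names dataset_url, parse_csv and load_dataset are hypothetical, and 
the csv/<package>/<name>.csv layout under the Rdatasets URL quoted above 
is an assumption about how that collection is organized.

```python
import csv
import io
from urllib.request import urlopen

# Assumption: datasets live at <base>/<package>/<name>.csv under the
# Rdatasets site mentioned earlier in this thread.
BASE_URL = "http://vincentarelbundock.github.com/Rdatasets/csv"


def dataset_url(name, package="datasets", base_url=BASE_URL):
    """Build the (assumed) URL of the CSV file for one dataset."""
    return f"{base_url}/{package}/{name}.csv"


def parse_csv(text):
    """Turn CSV text into a dict mapping column name -> list of strings."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []
    columns = {field: [] for field in fieldnames}
    for row in reader:
        for field in fieldnames:
            columns[field].append(row[field])
    return columns


def _fetch_text(url):
    """Download a URL and decode the body as UTF-8 text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def load_dataset(name, package="datasets", fetch=_fetch_text):
    """Hypothetical one-call loader: fetch a dataset by name and parse it.

    `fetch` is any callable taking a URL and returning CSV text; it can be
    swapped out for testing or for reading from a local cache.
    """
    return parse_csv(fetch(dataset_url(name, package)))
```

With something like this, load_dataset("mtcars") would return the columns 
as lists of strings, so the values would still need converting with 
float() before being handed to a plotting call.  Loading the same CSV 
with pandas, as Josef suggests, would take care of the type conversion.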

Mike

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users