On 09/26/2012 09:33 AM, Benjamin Root wrote:
On Wed, Sep 26, 2012 at 9:10 AM, Michael Droettboom <md...@stsci.edu> wrote:
On 09/26/2012 12:28 AM, josef.p...@gmail.com wrote:
> On Wed, Sep 26, 2012 at 12:05 AM, Paul Tremblay <paulhtremb...@gmail.com> wrote:
>> In R, there are many default data sets one can use to both illustrate code
>> and explore the scripting language. Instead of having to fake data, one can
>> pull from meaningful data sets, created in the real world. For example, this
>> one liner actually produces a plot:
>>
>> plot(mtcars$hp~mtcars$mpg)
>>
>> where mtcars refers to a built-in data set taken from Motor Trend Magazine.
>> I don't believe matplotlib has anything similar. I have started to download
>> some of the R data sets and store them as pickles for my own use. Does
>> anyone else have any interest in creating a repository for these data sets
>> or otherwise sharing them in some way?
> Vincent converted several R datasets back to CSV, which can easily be
> loaded from the web with, for example, pandas.
> http://vincentarelbundock.github.com/Rdatasets/
> The collection is a bit random.
>
> statsmodels has some datasets that we use for examples and tests
> http://statsmodels.sourceforge.net/devel/datasets/index.html
> We were always a bit slow with adding datasets because we were too
> cautious about licensing issues. But R seems to get away with
> considering most datasets to be public domain.
> We keep adding datasets to statsmodels as we need them for new models.
>
> The machine learning packages like sklearn have packaged the typical
> machine learning datasets.
>
> If you are interested, you could join up with statsmodels or with
> Vincent to expand on what's available.
>
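The CSV route josef mentions can be sketched in a few lines. This is only a minimal sketch, assuming pandas is installed; `load_dataset` is a hypothetical helper name, and the inline rows are a hand-typed stand-in (not fetched from the collection) so the example runs without network access. In practice the argument would be the URL of one of the CSV files in the collection.

```python
import io

import pandas as pd


def load_dataset(source):
    """Read a CSV dataset from a URL, a local path, or a file-like object.

    The first column is used as the row index, which suits files whose
    first column holds row names (an assumption about the file layout).
    """
    return pd.read_csv(source, index_col=0)


# Stand-in for a downloaded file so the sketch runs offline.
# These rows are typed in by hand for illustration only.
SAMPLE_CSV = (
    "model,mpg,hp\n"
    "Mazda RX4,21.0,110\n"
    "Datsun 710,22.8,93\n"
    "Hornet 4 Drive,21.4,110\n"
)

cars = load_dataset(io.StringIO(SAMPLE_CSV))
print(cars[["hp", "mpg"]])
```

Passing a URL string instead of the `StringIO` object should work the same way, since `pandas.read_csv` accepts URLs. With matplotlib installed, `cars.plot.scatter(x="mpg", y="hp")` would give roughly the plot that the R one-liner above produces.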
It seems to me like contributing to (rather than duplicating) the work
of one of these projects would be a great idea. It would also be nice
to add functionality in matplotlib to make it easier to download these
things as a one-off -- obviously not exactly the same syntax as with R,
but ideally with a single function call.
Mike
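As a rough illustration of what such a single function call might look like, here is a sketch that uses only the standard library. `fetch_dataset`, its parameters, and the `<base_url>/<name>.csv` layout are all hypothetical; nothing like this exists in matplotlib, and the hosting project's real URL scheme would have to be substituted.

```python
import csv
import urllib.request
from pathlib import Path


def fetch_dataset(name, base_url, cache_dir="dataset_cache"):
    """Fetch <base_url>/<name>.csv once, cache it locally, return its rows.

    The URL layout is an assumption made for this sketch; adjust it to
    whatever the hosting project actually publishes.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    local = cache / f"{name}.csv"
    if not local.exists():
        # Only go to the remote source on a cache miss.
        url = f"{base_url.rstrip('/')}/{name}.csv"
        urllib.request.urlretrieve(url, str(local))
    with local.open(newline="") as handle:
        # Each row comes back as a dict keyed by the CSV header.
        return list(csv.DictReader(handle))
```

Caching the file locally means a one-off download keeps working even if the remote URL later changes, which was the weak point of fetching directly from a repository.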
We did have such a thing. matplotlib.cbook.get_sample_data(). I think
we got rid of it for 1.2.0?
It was removed because the server side was a moving target and would
constantly break. It was based on pulling files out of the svn (and
later git) repository, and sourceforge and github have had a habit of
changing the urls used to do so. All of the data that was there was
moved into the main repository and is now installed alongside
matplotlib, so get_sample_data() still works.
See this PR: https://github.com/matplotlib/matplotlib/pull/498
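For reference, a minimal sketch of using the bundled data, assuming matplotlib is installed. The file name "msft.csv" is an assumption on my part; the set of bundled files differs between matplotlib versions, so the sketch checks for the file rather than opening it blindly. `sample_data_path` is just a hypothetical wrapper name.

```python
import os

from matplotlib import cbook


def sample_data_path(fname):
    """Return the on-disk path of a sample file installed with matplotlib."""
    # asfileobj=False asks for the path instead of an open file object.
    return cbook.get_sample_data(fname, asfileobj=False)


# "msft.csv" is an assumed file name; it may not be bundled in every install.
path = sample_data_path("msft.csv")
status = "present" if os.path.exists(path) else "not bundled in this install"
print(f"{path} ({status})")
```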
I should have mentioned earlier that we do have a very small set of
standard data sets included there, but the other projects linked above
are much better and more extensive. If we can rely on them to have
static URLs over time, I think they are much better options than
anything matplotlib has had in the past.
Mike
_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users