On 9/26/12 10:15 AM, Michael Droettboom wrote:
On 09/26/2012 09:33 AM, Benjamin Root wrote:


On Wed, Sep 26, 2012 at 9:10 AM, Michael Droettboom <md...@stsci.edu <mailto:md...@stsci.edu>> wrote:

    On 09/26/2012 12:28 AM, josef.p...@gmail.com <mailto:josef.p...@gmail.com> wrote:
    > On Wed, Sep 26, 2012 at 12:05 AM, Paul Tremblay <paulhtremb...@gmail.com <mailto:paulhtremb...@gmail.com>> wrote:
    >> In R, there are many default data sets one can use to both illustrate code
    >> and explore the scripting language. Instead of having to fake data, one can
    >> pull from meaningful data sets, created in the real world. For example, this
    >> one liner actually produces a plot:
    >>
    >> plot(mtcars$hp~mtcars$mpg)
    >>
    >> where mtcars refers to a built-in data set taken from Motor Trend Magazine.
    >> I don't believe matplotlib has anything similar. I have started to download
    >> some of the R data sets and store them as pickles for my own use. Does
    >> anyone else have any interest in creating a repository for these data sets
    >> or otherwise sharing them in some way?
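For comparison, a rough matplotlib counterpart of the R one-liner is sketched here. It assumes the mtcars columns are already in Python lists; the six values used are the first six rows of R's mtcars, typed in by hand, and `plot_hp_vs_mpg` is a hypothetical helper, not a matplotlib function. R's formula `hp ~ mpg` puts hp on the vertical axis and mpg on the horizontal one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def plot_hp_vs_mpg(mpg, hp):
    """Plot hp against mpg, like R's plot(mtcars$hp ~ mtcars$mpg).

    plot_hp_vs_mpg is a hypothetical helper, not part of matplotlib.
    """
    fig, ax = plt.subplots()
    # R's formula y ~ x: hp on the y axis, mpg on the x axis, points only
    ax.plot(mpg, hp, "o")
    ax.set_xlabel("mpg")
    ax.set_ylabel("hp")
    return fig, ax


# First six rows of R's mtcars (Mazda RX4 through Valiant)
mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1]
hp = [110, 110, 93, 110, 175, 105]

fig, ax = plot_hp_vs_mpg(mpg, hp)
```

Calling `fig.savefig("mtcars.png")`, or `plt.show()` with an interactive backend, displays the result. The point of the thread is that the data itself still has to come from somewhere.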
    > Vincent converted several R datasets back to csv, that can be easily
    > loaded from the web with, for example, pandas.
    > http://vincentarelbundock.github.com/Rdatasets/
    > The collection is a bit random.
    >
    > statsmodels has some datasets that we use for examples and tests
    > http://statsmodels.sourceforge.net/devel/datasets/index.html
    > We were always a bit slow with adding datasets because we were too
    > cautious about licensing issues. But R seems to get away with
    > considering most datasets to be public domain.
    > We keep adding datasets to statsmodels as we need them for new models.
    >
    > The machine learning packages like sklearn have packaged the typical
    > machine learning datasets.
    >
    > If you are interested, you could join up with statsmodels or with
    > Vincent to expand on what's available.
    >
    It seems to me like contributing to (rather than duplicating) the work
    of one of these projects would be a great idea.  It would also be nice
    to add functionality in matplotlib to make it easier to download these
    things as a one-off (obviously not exactly the same syntax as with R,
    but ideally with a single function call).

    Mike
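A minimal sketch of such a one-off download helper, using only the standard library. `fetch_dataset` is a hypothetical name (matplotlib has no such function) and the cache directory is whatever the caller picks. The first call downloads the file; later calls return the cached copy without touching the network.

```python
import os
import urllib.parse
import urllib.request


def fetch_dataset(url, cache_dir):
    """Download url into cache_dir once and return the local file path.

    fetch_dataset is a hypothetical helper, not a matplotlib API.
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Name the cached file after the last path component of the URL
    filename = os.path.basename(urllib.parse.urlparse(url).path)
    if not filename:
        raise ValueError(f"cannot derive a file name from {url!r}")
    local_path = os.path.join(cache_dir, filename)
    if not os.path.exists(local_path):
        # Download to a temporary name first so an interrupted transfer
        # never leaves a truncated file under the final cached name
        tmp_path = local_path + ".part"
        with urllib.request.urlopen(url) as response, open(tmp_path, "wb") as out:
            out.write(response.read())
        os.replace(tmp_path, local_path)
    return local_path
```

A call would look like `fetch_dataset(url, os.path.expanduser("~/.cache/sample_data"))`, with `url` pointing at, say, one of the Rdatasets CSV files. Caching only helps after the first successful download; it does nothing if the remote URL moves before then.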


We did have such a thing: matplotlib.cbook.get_sample_data(). I think we got rid of it for 1.2.0?
The remote-download part was removed because the server side was a moving target and would constantly break. It was based on pulling files out of the svn (and later git) repository, and SourceForge and GitHub have had a habit of changing the URLs used to do so. All of the data that was there was moved into the main repository and is now installed alongside matplotlib, so get_sample_data() still works.

See this PR: https://github.com/matplotlib/matplotlib/pull/498

I should have mentioned earlier that we do have a very small set of standard data sets included there, but the other projects linked above are much better and more extensive. If we can rely on them to keep static URLs over time, I think they are much better options than anything matplotlib has had in the past.

Mike
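Since get_sample_data() now reads from the data installed with matplotlib, loading a bundled file is a single call with no web access. A sketch against current matplotlib releases; the file name `grace_hopper.jpg` is assumed to be among the shipped samples, and the set of files (and the function's optional parameters) has varied between versions.

```python
from matplotlib import cbook

# Look up a file in the mpl-data/sample_data directory installed with
# matplotlib; asfileobj=False returns the path instead of an open file.
path = cbook.get_sample_data("grace_hopper.jpg", asfileobj=False)
print(path)
```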
Drawing on other posts: is it conceivable to download both the R sets and the statsmodels sets and include them in site-packages/matplotlib/mpl-data/sample_data/? I understand that pulling data sets from outside this directory creates problems because of moving URLs, but why even try a web pull when the data can exist in a reliable place?

I suppose one might raise reasonable objections to my suggestion, but at any rate, it doesn't seem I can add anything else to either project, since they both seem complete. I see only one small but significant problem with the R data sets: they leave out the header of the first column (the column holding the row names) because of the structure of R data frames. Python needs this header.

Paul
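The missing header comes from R writing row names as an unnamed first column, so the CSV header row starts with an empty field. A minimal standard-library sketch that names that column while reading; `read_r_csv` and the default name `rownames` are hypothetical choices. With pandas, `read_csv(..., index_col=0)` is another common way to turn that column into the index.

```python
import csv
import io


def read_r_csv(fileobj, rownames_header="rownames"):
    """Read a CSV written by R's write.csv into a header list and row lists.

    The header row starts with an empty field for the row-name column;
    replace it with an explicit name. read_r_csv is a hypothetical helper.
    """
    reader = csv.reader(fileobj)
    header = next(reader)
    if header and header[0] == "":
        header[0] = rownames_header
    rows = list(reader)
    return header, rows


# A few lines in the layout R's write.csv produces for mtcars
sample = io.StringIO(
    '"","mpg","cyl","disp","hp"\n'
    '"Mazda RX4",21,6,160,110\n'
    '"Datsun 710",22.8,4,108,93\n'
)
header, rows = read_r_csv(sample)
print(header)   # ['rownames', 'mpg', 'cyl', 'disp', 'hp']
print(rows[1])  # ['Datsun 710', '22.8', '4', '108', '93']
```

All values come back as strings; converting the numeric columns is left to the caller.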
_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users
