On 9/26/12 10:15 AM, Michael Droettboom wrote:
On 09/26/2012 09:33 AM, Benjamin Root wrote:
On Wed, Sep 26, 2012 at 9:10 AM, Michael Droettboom <md...@stsci.edu> wrote:
On 09/26/2012 12:28 AM, josef.p...@gmail.com wrote:
> On Wed, Sep 26, 2012 at 12:05 AM, Paul Tremblay <paulhtremb...@gmail.com> wrote:
>> In R, there are many default data sets one can use to both illustrate code
>> and explore the scripting language. Instead of having to fake data, one can
>> pull from meaningful data sets, created in the real world. For example, this
>> one liner actually produces a plot:
>>
>> plot(mtcars$hp~mtcars$mpg)
>>
>> where mtcars refers to a built-in data set taken from Motor Trend Magazine.
>> I don't believe matplotlib has anything similar. I have started to download
>> some of the R data sets and store them as pickles for my own use. Does
>> anyone else have any interest in creating a repository for these data sets
>> or otherwise sharing them in some way?
> Vincent converted several R datasets back to csv, that can be easily
> loaded from the web with, for example, pandas.
> http://vincentarelbundock.github.com/Rdatasets/
> The collection is a bit random.
>
> statsmodels has some datasets that we use for examples and tests
> http://statsmodels.sourceforge.net/devel/datasets/index.html
> We were always a bit slow with adding datasets because we were too
> cautious about licensing issues. But R seems to get away with
> considering most datasets to be public domain.
> We keep adding datasets to statsmodels as we need them for new models.
>
> The machine learning packages like sklearn have packaged the typical
> machine learning datasets.
>
> If you are interested, you could join up with statsmodels or with
> Vincent to expand on what's available.
>
It seems to me like contributing to (rather than duplicating) the work
of one of these projects would be a great idea. It would also be nice
to add functionality in matplotlib to make it easier to download these
things as a one-off -- obviously not exactly the same syntax as with R,
but ideally with a single function call.
Mike
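To make the "single function call" idea concrete, here is a minimal sketch using only the Python standard library. The name load_csv_dataset and its opener parameter are hypothetical and not part of matplotlib, statsmodels, or any project mentioned in this thread; the caller supplies the full URL of a CSV file.

```python
import csv
import io
from urllib.request import urlopen


def load_csv_dataset(url, opener=urlopen):
    """Fetch a CSV file from `url` and return its rows as a list of dicts.

    Hypothetical helper, not an existing matplotlib function.
    `opener` is any callable that takes a URL and returns a binary
    file-like object. It defaults to urllib's urlopen and can be
    swapped out so the function runs without network access.
    """
    with opener(url) as response:
        text = response.read().decode("utf-8")
    # DictReader uses the first row as column names; values stay strings.
    return list(csv.DictReader(io.StringIO(text)))
```

With a real URL this would be called as load_csv_dataset(url). All values come back as strings, so numeric columns would still need converting before plotting.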
We did have such a thing: matplotlib.cbook.get_sample_data(). I
think we got rid of it for 1.2.0?
The downloading part was removed because the server side was a moving
target and would constantly break. It was based on pulling files out of
the svn (and later git) repository, and sourceforge and github have had
a habit of changing the urls used to do so. All of the data that was
there was moved into the main repository and is now installed alongside
matplotlib, so get_sample_data() still works.
See this PR: https://github.com/matplotlib/matplotlib/pull/498
I should have mentioned earlier that we do have a very small set
of standard data sets included there -- but the other projects
linked to above are much better and more extensive. If we can rely on
them to have static urls over time, I think they are much better
options than anything matplotlib has had in the past.
Mike
Drawing on other posts, is it conceivable to download both the R sets
and the statsmodels sets and include them in
site-packages/matplotlib/mpl-data/sample_data/? I understand that
pulling data sets not in this directory creates problems because of
moving URLs, but why even try to do a web pull when the data can exist
in a reliable place?
I suppose one might raise reasonable objections to my suggestion, but at
any rate, it doesn't seem I can add anything else to either project,
since they both seem complete. I see only one small though significant
problem with the R data sets: they leave out the header of the
first column because of the structure of R data frames. Python needs
this header.
Paul
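One way to work around the missing first-column header is sketched below with the standard csv module. It assumes the file was exported from R with the row names in the first column under an empty header string; the function name read_r_csv and the default column name "rowname" are made up for illustration, and the sample row in the usage note is illustrative only.

```python
import csv
import io


def read_r_csv(text, index_name="rowname"):
    """Parse CSV text exported from R and return a list of row dicts.

    Assumption: R wrote the row names as the first column and left
    that column's header empty. If so, the empty header is replaced
    by `index_name` so every column has a usable key.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return []  # empty input, nothing to parse
    if header[0] == "":
        header[0] = index_name
    return [dict(zip(header, row)) for row in reader]
```

For example, read_r_csv('"","mpg","hp"\n"Mazda RX4",21,110\n') gives one row whose car name is stored under "rowname". Files whose first header is already named pass through unchanged.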
_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users