FWIW here's how the current data package is made:

https://github.com/jennybc/gapminder/tree/master/data-raw#readme

I use these scripts / this workflow in teaching as well, i.e. gapminder is not 
just about that one data frame, it's also about the process that made it.

Ridiculous though it may sound, if there's real interest in updating, 
correcting, etc., we could consider making a gapminder2 package? And never 
commit to having such a clean story re: data provenance and cleaning.

-- Jenny

> On Jan 25, 2017, at 6:59 AM, Tom Wright <[email protected]> wrote:
> 
> Not sure if this is relevant anymore since it seems Jenny has a much better 
> handle on the original source for this data.
> A few months ago I tried to track down the origins of the dataset and came up 
> with these links (Not sure if I ever documented this information):
> 
> GDP
> GDP per capita by purchasing power parities (v8)
> https://www.gapminder.org/wp-content/uploads/2008/10/gapdata001-1.xlsx
> 
> Population
> Total Population (v1)
> https://www.gapminder.org/documentation/documentation/gapdata003%20old%202011.xlsx
> 
> On Wed, 25 Jan 2017 at 09:07 Jenny Bryan <[email protected]> wrote:
> I lean towards keeping the gapminder data package "frozen". Even though it 
> could be updated, corrected, etc.
> 
> Why? Because I and others now have a lot of teaching material built around 
> this package. And I've learned the hard way from a few small prior changes 
> that it leads to tons of diffs when I re-render my lessons.
> 
> The vast majority of it is ignorable noise, but sometimes plots or examples 
> that used to make sense no longer do. And so the result or figure 
> contradicts adjacent text. It's very hard to catch.
> 
> I've always tried to be clear that gapminder is a data set for teaching and 
> exampling data analysis. There are much better sources of the socioeconomic 
> data and, as noticed below, even Gapminder.org has better info now.
> 
> I do like the idea of adding data from 2008 -->. But ... it's not that 
> simple. Currently I have scripts that exactly produce this data package from 
> Gapminder.org spreadsheets I downloaded several years ago and have under 
> version control.
> 
> If I want to keep everything reproducible, I'd have to allow all the data to 
> change, since I'm sure Gapminder's --> 2007 data has been changing all these 
> years :(
> 
> -- Jenny
> 
> > On Jan 24, 2017, at 6:21 AM, Lex Nederbragt <[email protected]> 
> > wrote:
> >
> > Thanks! I don’t know R well enough, but can this code be used to update the 
> > dataset and add 2012? @Jenny (on CC)?
> >
> >       Lex
> >
> >> On 23 Jan 2017, at 16:58, François Michonneau 
> >> <[email protected]> wrote:
> >>
> >> Apologies, I forgot to include the link to the repo:
> >> https://github.com/jennybc/gapminder
> >>
> >> On Mon, Jan 23, 2017 at 7:56 AM, François Michonneau
> >> <[email protected]> wrote:
> >>> Hi Lex,
> >>>
> >>> The data we use for the lessons (at least for R) are actually coming
> >>> from the gapminder R package put together by Jenny Bryan. The package
> >>> contains the code used to tidy the data from the spreadsheets made
> >>> available by the gapminder website. It might be worth putting a pull
> >>> request together to update the data there, and then it will be easy to
> >>> update the data in our lessons.
> >>>
> >>> Cheers,
> >>> -- François
> >>>
> >>> On Mon, Jan 23, 2017 at 7:20 AM, Lex Nederbragt
> >>> <[email protected]> wrote:
> >>>> Hi,
> >>>>
> >>>> The gapminderDataFiveYear data that I have on my hard disk, and the ones
> >>>> from the Python and R lessons using the gapminder data, run from 1952 to
> >>>> 2007. I thought it would be nice to add the 2012 data, it being 2017,
> >>>> after all.
> >>>>
> >>>> So I went to what I guessed to be the original source,
> >>>> https://www.gapminder.org/data/ (that was easy), and checked a few of the
> >>>> population size numbers from the years in the datasets we use. I chose
> >>>> "Population, total" as the dataset, which can be viewed as a Google sheet
> >>>> here. The numbers are not the same: in some cases they are much lower or
> >>>> higher, while in others they are closer.
> >>>>
> >>>> The other data sources are a bit harder to compare. There are a few
> >>>> GDP/Capita datasets; I think "Income per person (GDP/capita, PPP$
> >>>> inflation-adjusted)" comes closest, but the numbers are quite a bit
> >>>> higher than in ‘our’ dataset. "Life expectancy (years)" is close, but
> >>>> also off.
> >>>>
> >>>> Should we update our numbers and add 2012? This could be done with some
> >>>> smart webscraping, I think?
> >>>>
> >>>> Best,
> >>>>
> >>>> Lex Nederbragt
> >>>>
> >>>> _______________________________________________
> >>>> Discuss mailing list
> >>>> [email protected]
> >>>> http://lists.software-carpentry.org/listinfo/discuss
> >
> 

