I lean towards keeping the gapminder data package "frozen", even though it could be updated, corrected, etc.

Why? Because I and others now have a lot of teaching material built around this package. And I've learned the hard way from a few small prior changes that any change leads to tons of diffs when I re-render my lessons. The vast majority are ignorable noise, but sometimes plots or examples that used to make sense no longer do, so the result or figure contradicts the adjacent text. That's very hard to catch.

I've always tried to be clear that gapminder is a data set for teaching and demonstrating data analysis. There are much better sources of socioeconomic data and, as noted below, even Gapminder.org has better info now.

I do like the idea of adding data from 2008 onward. But ... it's not that simple. Currently I have scripts that exactly produce this data package from Gapminder.org spreadsheets I downloaded several years ago and have under version control. If I want to keep everything reproducible, I'd have to allow all the data to change, since I'm sure Gapminder's data up to 2007 has been changing all these years :(

-- Jenny

> On Jan 24, 2017, at 6:21 AM, Lex Nederbragt wrote:
>
> Thanks! I don't know R well enough, but can this code be used to update the
> dataset and add 2012? @Jenny (on CC)?
>
> Lex
>
>> On 23 Jan 2017, at 16:58, François Michonneau wrote:
>>
>> Apologies, I forgot to include the link to the repo:
>> https://github.com/jennybc/gapminder
>>
>> On Mon, Jan 23, 2017 at 7:56 AM, François Michonneau wrote:
>>> Hi Lex,
>>>
>>> The data we use for the lessons (at least for R) are actually coming
>>> from the gapminder R package put together by Jenny Bryan. The package
>>> contains the code used to tidy the data from the spreadsheets made
>>> available by the gapminder website. It might be worth putting a pull
>>> request together to update the data there, and then it will be easy to
>>> update the data in our lessons.
>>>
>>> Cheers,
>>> -- François
>>>
>>> On Mon, Jan 23, 2017 at 7:20 AM, Lex Nederbragt wrote:
>>>> Hi,
>>>>
>>>> The gapminderDataFiveYear data that I have on my hard disk, and the
>>>> ones from the python and R lessons using the gapminder data, run from
>>>> 1952 to 2007. I thought it would be nice to add the 2012 data, it being
>>>> 2017, after all.
>>>>
>>>> So I went to what I guessed to be the original source,
>>>> https://www.gapminder.org/data/ (that was easy), and checked a few of the
>>>> population size numbers from the years in the datasets we use. I chose
>>>> "Population, total" as dataset, which can be viewed as google sheet here.
>>>> The numbers are not the same: in some cases they are much lower or
>>>> higher, while in others they are closer.
>>>>
>>>> The other data sources are a bit harder to compare. There are a few
>>>> GDP/Capita datasets, I think "Income per person (GDP/capita, PPP$
>>>> inflation-adjusted)" comes closest, but the numbers are quite a bit higher
>>>> than in 'our' dataset. "Life expectancy (years)" is close, but also off.
>>>>
>>>> Should we update our numbers and add 2012? This could be done with some
>>>> smart webscraping, I think?
>>>>
>>>> Best,
>>>>
>>>> Lex Nederbragt
>>>>
>>>> _______________________________________________
>>>> Discuss mailing list
>>>> http://lists.software-carpentry.org/listinfo/discuss
