I can understand the desire to have 'more current' data, since it might be more appealing to students, but I also have a huge amount of sympathy for Jenny's original reply about how any change would ripple through people's prepared lessons.
Could an 'intermediate' or 'advanced' lesson be made that updates the data, and be used as a vehicle for getting people further along? Otherwise, I would concur that a gapminder2 package (or whatever year is the most recent) would be a nice way to serve those who prefer the most current data without distressing those with an investment in the old.

-- bennet

On Wed, Jan 25, 2017 at 12:26 PM, Jenny Bryan <[email protected]> wrote:
> FWIW here's how the current data package is made:
>
> https://github.com/jennybc/gapminder/tree/master/data-raw#readme
>
> I use these scripts / this workflow in teaching as well, i.e. gapminder is
> not just about that one data frame, it's also about the process that made it.
>
> Ridiculous though it may sound, if there's real interest in updating,
> correcting, etc., we could consider making a gapminder2 package? And never
> commit to having such a clean story re: data provenance and cleaning.
>
> -- Jenny
>
>> On Jan 25, 2017, at 6:59 AM, Tom Wright <[email protected]> wrote:
>>
>> Not sure if this is relevant anymore since it seems Jenny has a much better
>> handle on the original source for this data.
>> A few months ago I tried to track down the origins of the dataset and came
>> up with these links (Not sure if I ever documented this information):
>>
>> GDP
>> GDP per capita by purchasing power parities (v8)
>> https://www.gapminder.org/wp-content/uploads/2008/10/gapdata001-1.xlsx
>>
>> Population
>> Total Population (v1)
>> https://www.gapminder.org/documentation/documentation/gapdata003%20old%202011.xlsx
>>
>> On Wed, 25 Jan 2017 at 09:07 Jenny Bryan <[email protected]> wrote:
>> I lean towards keeping the gapminder data package "frozen". Even though it
>> could be updated, corrected, etc.
>>
>> Why? Because I and others now have a lot of teaching material built around
>> this package. And I've learned the hard way from a few small prior changes
>> that it leads to tons of diffs when I re-render my lessons.
>>
>> The vast majority of it is ignorable noise, but sometimes plots or examples
>> that used to make sense no longer do. And so the result or figure
>> contradicts adjacent text. It's very hard to catch.
>>
>> I've always tried to be clear that gapminder is a data set for teaching and
>> exampling data analysis. There are much better sources of the socioeconomic
>> data and, as noticed below, even Gapminder.org has better info now.
>>
>> I do like the idea of adding data from 2008 -->. But ... it's not that
>> simple. Currently I have scripts that exactly produce this data package from
>> Gapminder.org spreadsheets I downloaded several years ago and have under
>> version control.
>>
>> If I want to keep everything reproducible, I'd have to allow all the data to
>> change, since I'm sure Gapminder's --> 2007 data has been changing all these
>> years :(
>>
>> -- Jenny
>>
>> > On Jan 24, 2017, at 6:21 AM, Lex Nederbragt <[email protected]>
>> > wrote:
>> >
>> > Thanks! I don’t know R well enough, but can this code be used to update
>> > the dataset and add 2012? @Jenny (on CC)?
>> >
>> > Lex
>> >
>> >> On 23 Jan 2017, at 16:58, François Michonneau
>> >> <[email protected]> wrote:
>> >>
>> >> Apologies, I forgot to include the link to the repo:
>> >> https://github.com/jennybc/gapminder
>> >>
>> >> On Mon, Jan 23, 2017 at 7:56 AM, François Michonneau
>> >> <[email protected]> wrote:
>> >>> Hi Lex,
>> >>>
>> >>> The data we use for the lessons (at least for R) are actually coming
>> >>> from the gapminder R package put together by Jenny Bryan. The package
>> >>> contains the code used to tidy the data from the spreadsheets made
>> >>> available by the gapminder website. It might be worth putting a pull
>> >>> request together to update the data there, and then it will be easy to
>> >>> update the data in our lessons.
>> >>>
>> >>> Cheers,
>> >>> -- François
>> >>>
>> >>> On Mon, Jan 23, 2017 at 7:20 AM, Lex Nederbragt
>> >>> <[email protected]> wrote:
>> >>>> Hi,
>> >>>>
>> >>>> The gapminderDataFiveYear data that I have on my hard disk, and the
>> >>>> ones from the python and R lessons using the gapminder data, run from
>> >>>> 1952 to 2007. I thought it would be nice to add the 2012 data, it
>> >>>> being 2017, after all.
>> >>>>
>> >>>> So I went to what I guessed to be the original source,
>> >>>> https://www.gapminder.org/data/ (that was easy), and checked a few of
>> >>>> the population size numbers from the years in the datasets we use. I
>> >>>> chose "Population, total" as dataset, which can be viewed as google
>> >>>> sheet here. The numbers are not the same: in some cases they are much
>> >>>> lower or higher, while in others they are closer.
>> >>>>
>> >>>> The other data sources are a bit harder to compare. There are a few
>> >>>> GDP/Capita datasets; I think "Income per person (GDP/capita, PPP$
>> >>>> inflation-adjusted)" comes closest, but the numbers are quite a bit
>> >>>> higher than in ‘our’ dataset. "Life expectancy (years)" is close, but
>> >>>> also off.
>> >>>>
>> >>>> Should we update our numbers and add 2012? This could be done with some
>> >>>> smart webscraping, I think?
>> >>>>
>> >>>> Best,
>> >>>>
>> >>>> Lex Nederbragt
>> >>>>
>> >>>> _______________________________________________
>> >>>> Discuss mailing list
>> >>>> [email protected]
>> >>>> http://lists.software-carpentry.org/listinfo/discuss
