Actually, instead of a gapminder2 pkg, there could simply be another data frame in gapminder that is explicitly allowed to stay more current, accept corrections, etc. For example, I already know about some data quality problems in the existing data frame (https://github.com/jennybc/gapminder/issues/9).
Not sure what to name it? The evolution of an underlying csv would still be tracked via git, but I could relax about requiring some perfectly formed, scripted backstory.

> On Jan 25, 2017, at 10:16 AM, Bennet Fauber <[email protected]> wrote:
>
> I can kind of understand the desire to have 'more current' data
> because it might be more appealing to students, but I also have a huge
> amount of sympathy/empathy for Jenny's original reply about how it
> would redound into people's prepared lessons.
>
> Could an 'intermediate' or 'advanced' lesson be made that updates the
> data, and use that as a vehicle for getting people further along?
>
> Otherwise, I would concur that a gapminder2 package (or whatever year
> is the most recent) would be a nice way for those who prefer the most
> current data without distressing those with an investment in the old.
>
> -- bennet
>
> On Wed, Jan 25, 2017 at 12:26 PM, Jenny Bryan <[email protected]> wrote:
>> FWIW here's how the current data package is made:
>>
>> https://github.com/jennybc/gapminder/tree/master/data-raw#readme
>>
>> I use these scripts / this workflow in teaching as well, i.e. gapminder is
>> not just about that one data frame, it's also about the process that made it.
>>
>> Ridiculous though it may sound, if there's real interest in updating,
>> correcting, etc., we could consider making a gapminder2 package? And never
>> commit to having such a clean story re: data provenance and cleaning.
>>
>> -- Jenny
>>
>>> On Jan 25, 2017, at 6:59 AM, Tom Wright <[email protected]> wrote:
>>>
>>> Not sure if this is relevant anymore since it seems Jenny has a much better
>>> handle on the original source for this data.
>>> A few months ago I tried to track down the origins of the dataset and came
>>> up with these links (Not sure if I ever documented this information):
>>>
>>> GDP
>>> GDP per capita by purchasing power parities (v8)
>>> https://www.gapminder.org/wp-content/uploads/2008/10/gapdata001-1.xlsx
>>>
>>> Population
>>> Total Population (v1)
>>> https://www.gapminder.org/documentation/documentation/gapdata003%20old%202011.xlsx
>>>
>>> On Wed, 25 Jan 2017 at 09:07 Jenny Bryan <[email protected]> wrote:
>>> I lean towards keeping the gapminder data package "frozen". Even though it
>>> could be updated, corrected, etc.
>>>
>>> Why? Because I and others now have a lot of teaching material built around
>>> this package. And I've learned the hard way from a few small prior changes
>>> that it leads to tons of diffs when I re-render my lessons.
>>>
>>> The vast majority of it is ignorable noise, but sometimes plots or examples
>>> that used to make sense no longer do. And so the result or figure
>>> contradicts adjacent text. It's very hard to catch.
>>>
>>> I've always tried to be clear that gapminder is a data set for teaching and
>>> exampling data analysis. There are much better sources of the socioeconomic
>>> data and, as noticed below, even Gapminder.org has better info now.
>>>
>>> I do like the idea of adding data from 2008 -->. But ... it's not that
>>> simple. Currently I have scripts that exactly produce this data package
>>> from Gapminder.org spreadsheets I downloaded several years ago and have
>>> under version control.
>>>
>>> If I want to keep everything reproducible, I'd have to allow all the data
>>> to change, since I'm sure Gapminder's --> 2007 data has been changing all
>>> these years :(
>>>
>>> -- Jenny
>>>
>>>> On Jan 24, 2017, at 6:21 AM, Lex Nederbragt <[email protected]>
>>>> wrote:
>>>>
>>>> Thanks! I don't know R well enough, but can this code be used to update
>>>> the dataset and add 2012? @Jenny (on CC)?
>>>>
>>>> Lex
>>>>
>>>>> On 23 Jan 2017, at 16:58, François Michonneau
>>>>> <[email protected]> wrote:
>>>>>
>>>>> Apologies, I forgot to include the link to the repo:
>>>>> https://github.com/jennybc/gapminder
>>>>>
>>>>> On Mon, Jan 23, 2017 at 7:56 AM, François Michonneau
>>>>> <[email protected]> wrote:
>>>>>> Hi Lex,
>>>>>>
>>>>>> The data we use for the lessons (at least for R) are actually coming
>>>>>> from the gapminder R package put together by Jenny Bryan. The package
>>>>>> contains the code used to tidy the data from the spreadsheets made
>>>>>> available by the gapminder website. It might be worth putting a pull
>>>>>> request together to update the data there, and then it will be easy to
>>>>>> update the data in our lessons.
>>>>>>
>>>>>> Cheers,
>>>>>> -- François
>>>>>>
>>>>>> On Mon, Jan 23, 2017 at 7:20 AM, Lex Nederbragt
>>>>>> <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> The gapminderDataFiveYear data that I have on my hard disk, and the ones
>>>>>>> from the Python and R lessons using the gapminder data, run from 1952 to
>>>>>>> 2007. I thought it would be nice to add the 2012 data, it being 2017,
>>>>>>> after all.
>>>>>>>
>>>>>>> So I went to what I guessed to be the original source,
>>>>>>> https://www.gapminder.org/data/ (that was easy), and checked a few of
>>>>>>> the population size numbers from the years in the datasets we use. I
>>>>>>> chose "Population, total" as dataset, which can be viewed as google
>>>>>>> sheet here. The numbers are not the same: in some cases they are much
>>>>>>> lower or higher, while in others they are closer.
>>>>>>>
>>>>>>> The other data sources are a bit harder to compare. There are a few
>>>>>>> GDP/Capita datasets, I think "Income per person (GDP/capita, PPP$
>>>>>>> inflation-adjusted)" comes closest, but the numbers are quite a bit
>>>>>>> higher than in 'our' dataset. "Life expectancy (years)" is close, but
>>>>>>> also off.
>>>>>>>
>>>>>>> Should we update our numbers and add 2012? This could be done with some
>>>>>>> smart webscraping, I think?
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Lex Nederbragt
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Discuss mailing list
>>>>>>> [email protected]
>>>>>>> http://lists.software-carpentry.org/listinfo/discuss
