Actually, instead of a gapminder2 pkg, there could simply be another data frame 
in gapminder that is explicitly allowed to stay more current, accept 
corrections, etc. For example, I already know about some data quality problems 
in the existing data frame (https://github.com/jennybc/gapminder/issues/9).

Not sure what to name it?

The evolution of an underlying csv would still be tracked via git, but I could 
relax about requiring some perfectly formed, scripted backstory.

> On Jan 25, 2017, at 10:16 AM, Bennet Fauber <[email protected]> wrote:
> 
> I can kind of understand the desire to have 'more current' data
> because it might be more appealing to students, but I also have a huge
> amount of sympathy/empathy for Jenny's original reply about how it
> would redound upon people's prepared lessons.
> 
> Could an 'intermediate' or 'advanced' lesson be made that updates the
> data, and use that as a vehicle for getting people further along?
> 
> Otherwise, I would concur that a gapminder2 package (or whatever year
> is the most recent) would be a nice way for those who prefer the most
> current data without distressing those with an investment in the old.
> 
> -- bennet
> 
> On Wed, Jan 25, 2017 at 12:26 PM, Jenny Bryan <[email protected]> wrote:
>> FWIW here's how the current data package is made:
>> 
>> https://github.com/jennybc/gapminder/tree/master/data-raw#readme
>> 
>> I use these scripts / this workflow in teaching as well, i.e. gapminder is 
>> not just about that one data frame, it's also about the process that made it.
>> 
>> Ridiculous though it may sound, if there's real interest in updating, 
>> correcting, etc., we could consider making a gapminder2 package? And never 
>> commit to having such a clean story re: data provenance and cleaning.
>> 
>> -- Jenny
>> 
>>> On Jan 25, 2017, at 6:59 AM, Tom Wright <[email protected]> wrote:
>>> 
>>> Not sure if this is relevant anymore since it seems Jenny has a much better 
>>> handle on the original source for this data.
>>> A few months ago I tried to track down the origins of the dataset and came 
>>> up with these links (not sure if I ever documented this information):
>>> 
>>> GDP
>>> GDP per capita by purchasing power parities (v8)
>>> https://www.gapminder.org/wp-content/uploads/2008/10/gapdata001-1.xlsx
>>> 
>>> Population
>>> Total Population (v1)
>>> https://www.gapminder.org/documentation/documentation/gapdata003%20old%202011.xlsx
>>> 
>>> On Wed, 25 Jan 2017 at 09:07 Jenny Bryan <[email protected]> wrote:
>>> I lean towards keeping the gapminder data package "frozen", even though it 
>>> could be updated, corrected, etc.
>>> 
>>> Why? Because I and others now have a lot of teaching material built around 
>>> this package. And I've learned the hard way from a few small prior changes 
>>> that it leads to tons of diffs when I re-render my lessons.
>>> 
>>> The vast majority of it is ignorable noise, but sometimes plots or examples 
>>> that used to make sense no longer do. And so the result or figure 
>>> contradicts adjacent text. It's very hard to catch.
>>> 
>>> I've always tried to be clear that gapminder is a data set for teaching and 
>>> illustrating data analysis. There are much better sources of the socioeconomic 
>>> data and, as noticed below, even Gapminder.org has better info now.
>>> 
>>> I do like the idea of adding data from 2008 -->. But ... it's not that 
>>> simple. Currently I have scripts that exactly produce this data package 
>>> from Gapminder.org spreadsheets I downloaded several years ago and have 
>>> under version control.
>>> 
>>> If I want to keep everything reproducible, I'd have to allow all the data 
>>> to change, since I'm sure Gapminder's --> 2007 data has been changing all 
>>> these years :(
>>> 
>>> -- Jenny
>>> 
>>>> On Jan 24, 2017, at 6:21 AM, Lex Nederbragt <[email protected]> 
>>>> wrote:
>>>> 
>>>> Thanks! I don’t know R well enough, but can this code be used to update 
>>>> the dataset and add 2012? @Jenny (on CC)?
>>>> 
>>>>      Lex
>>>> 
>>>>> On 23 Jan 2017, at 16:58, François Michonneau 
>>>>> <[email protected]> wrote:
>>>>> 
>>>>> Apologies, I forgot to include the link to the repo:
>>>>> https://github.com/jennybc/gapminder
>>>>> 
>>>>> On Mon, Jan 23, 2017 at 7:56 AM, François Michonneau
>>>>> <[email protected]> wrote:
>>>>>> Hi Lex,
>>>>>> 
>>>>>> The data we use for the lessons (at least for R) actually come
>>>>>> from the gapminder R package put together by Jenny Bryan. The package
>>>>>> contains the code used to tidy the data from the spreadsheets made
>>>>>> available by the gapminder website. It might be worth putting a pull
>>>>>> request together to update the data there, and then it will be easy to
>>>>>> update the data in our lessons.
>>>>>> 
>>>>>> Cheers,
>>>>>> -- François
>>>>>> 
>>>>>> On Mon, Jan 23, 2017 at 7:20 AM, Lex Nederbragt
>>>>>> <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> The gapminderDataFiveYear data on my hard disk, and the versions used in
>>>>>>> the Python and R lessons based on the gapminder data, run from 1952 to 2007.
>>>>>>> I thought it would be nice to add the 2012 data; it is 2017, after all.
>>>>>>> 
>>>>>>> So I went to what I guessed to be the original source,
>>>>>>> https://www.gapminder.org/data/ (that was easy), and checked a few of the
>>>>>>> population size numbers for the years in the datasets we use. I chose
>>>>>>> “Population, total” as the dataset, which can be viewed as a Google Sheet here.
>>>>>>> The numbers are not the same: in some cases they are much lower or
>>>>>>> higher, while in others they are closer.
>>>>>>> 
>>>>>>> The other data sources are a bit harder to compare. There are a few
>>>>>>> GDP/capita datasets; I think “Income per person (GDP/capita, PPP$
>>>>>>> inflation-adjusted)” comes closest, but the numbers are quite a bit higher
>>>>>>> than in ‘our’ dataset. “Life expectancy (years)” is close, but also off.
>>>>>>> 
>>>>>>> Should we update our numbers and add 2012? This could be done with some
>>>>>>> smart web scraping, I think?
>>>>>>> 
>>>>>>> Best,
>>>>>>> 
>>>>>>> Lex Nederbragt
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> Discuss mailing list
>>>>>>> [email protected]
>>>>>>> http://lists.software-carpentry.org/listinfo/discuss
>>>> 
>>> 
>> 

