Hi Jenny,
In light of the replies about keeping the existing dataset unchanged, this
sounds like a good solution to me!
Lex
> On 25 Jan 2017, at 19:23, Jenny Bryan <[email protected]> wrote:
>
> Actually, instead of a gapminder2 pkg, there could simply be another data
> frame in gapminder that is explicitly allowed to stay more current, accept
> corrections, etc. For example, I already know about some data quality
> problems in the existing data frame
> (https://github.com/jennybc/gapminder/issues/9).
>
> Not sure what to name it?
>
> The evolution of an underlying csv would still be tracked via git, but I
> could relax about requiring some perfectly formed, scripted backstory.
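>
> Just to make it concrete, a rough sketch only: "gapminder_current" is a
> placeholder name, and usethis stands in for whatever tooling would actually
> be used. Adding a second data frame to the package could look something like:
>
>     # data-raw/gapminder_current.R (hypothetical)
>     library(readr)    # read the underlying csv
>     library(usethis)  # save a data frame into data/ for the package
>
>     # a csv tracked in data-raw/ that is allowed to receive updates and corrections
>     gapminder_current <- read_csv("data-raw/gapminder_current.csv")
>
>     # ship it as a second data frame alongside the frozen `gapminder`
>     use_data(gapminder_current, overwrite = TRUE)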
>
>> On Jan 25, 2017, at 10:16 AM, Bennet Fauber <[email protected]> wrote:
>>
>> I can kind of understand the desire to have 'more current' data
>> because it might be more appealing to students, but I also have a huge
>> amount of sympathy/empathy for Jenny's original reply about how changes
>> would ripple through people's prepared lessons.
>>
>> Could an 'intermediate' or 'advanced' lesson be made that updates the
>> data, and could that be used as a vehicle for getting people further along?
>>
>> Otherwise, I would concur that a gapminder2 package (or whatever year
>> is the most recent) would be a nice way to serve those who prefer the most
>> current data without distressing those with an investment in the old.
>>
>> -- bennet
>>
>> On Wed, Jan 25, 2017 at 12:26 PM, Jenny Bryan <[email protected]> wrote:
>>> FWIW here's how the current data package is made:
>>>
>>> https://github.com/jennybc/gapminder/tree/master/data-raw#readme
>>>
>>> I use these scripts / this workflow in teaching as well, i.e. gapminder is
>>> not just about that one data frame, it's also about the process that made
>>> it.
>>>
>>> Ridiculous though it may sound, if there's real interest in updating,
>>> correcting, etc., we could consider making a gapminder2 package? That
>>> package would never commit to having such a clean story re: data
>>> provenance and cleaning.
>>>
>>> -- Jenny
>>>
>>>> On Jan 25, 2017, at 6:59 AM, Tom Wright <[email protected]> wrote:
>>>>
>>>> Not sure if this is relevant anymore, since it seems Jenny has a much
>>>> better handle on the original source for this data.
>>>> A few months ago I tried to track down the origins of the dataset and came
>>>> up with these links (not sure if I ever documented this information):
>>>>
>>>> GDP
>>>> GDP per capita by purchasing power parities (v8)
>>>> https://www.gapminder.org/wp-content/uploads/2008/10/gapdata001-1.xlsx
>>>>
>>>> Population
>>>> Total Population (v1)
>>>> https://www.gapminder.org/documentation/documentation/gapdata003%20old%202011.xlsx
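>>>>
>>>> In case it's useful, here is a quick (untested) sketch of pulling one of
>>>> those spreadsheets into R; the sheet layout would need checking against
>>>> the actual file, so read_excel() may need extra arguments:
>>>>
>>>>     library(readxl)
>>>>
>>>>     gdp_url <- "https://www.gapminder.org/wp-content/uploads/2008/10/gapdata001-1.xlsx"
>>>>     tmp <- tempfile(fileext = ".xlsx")
>>>>     download.file(gdp_url, tmp, mode = "wb")  # binary mode so the xlsx isn't corrupted on Windows
>>>>
>>>>     gdp <- read_excel(tmp)  # may need sheet = / skip = for the real layout
>>>>     head(gdp)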
>>>>
>>>> On Wed, 25 Jan 2017 at 09:07 Jenny Bryan <[email protected]> wrote:
>>>> I lean towards keeping the gapminder data package "frozen", even though it
>>>> could be updated, corrected, etc.
>>>>
>>>> Why? Because I and others now have a lot of teaching material built around
>>>> this package. And I've learned the hard way from a few small prior changes
>>>> that even minor updates lead to tons of diffs when I re-render my lessons.
>>>>
>>>> The vast majority of it is ignorable noise, but sometimes plots or
>>>> examples that used to make sense no longer do. And so the result or
>>>> figure contradicts adjacent text. It's very hard to catch.
>>>>
>>>> I've always tried to be clear that gapminder is a data set for teaching
>>>> and exampling data analysis. There are much better sources of the
>>>> socioeconomic data and, as noted below, even Gapminder.org has better
>>>> info now.
>>>>
>>>> I do like the idea of adding data from 2008 onwards. But ... it's not that
>>>> simple. Currently I have scripts that exactly reproduce this data package
>>>> from Gapminder.org spreadsheets I downloaded several years ago and have
>>>> under version control.
>>>>
>>>> If I want to keep everything reproducible, I'd have to allow all the data
>>>> to change, since I'm sure Gapminder's pre-2008 data has been changing all
>>>> these years :(
>>>>
>>>> -- Jenny
>>>>
>>>>> On Jan 24, 2017, at 6:21 AM, Lex Nederbragt <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Thanks! I don’t know R well enough to judge, but could this code be used
>>>>> to update the dataset and add 2012? @Jenny (on CC)?
>>>>>
>>>>> Lex
>>>>>
>>>>>> On 23 Jan 2017, at 16:58, François Michonneau
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Apologies, I forgot to include the link to the repo:
>>>>>> https://github.com/jennybc/gapminder
>>>>>>
>>>>>> On Mon, Jan 23, 2017 at 7:56 AM, François Michonneau
>>>>>> <[email protected]> wrote:
>>>>>>> Hi Lex,
>>>>>>>
>>>>>>> The data we use for the lessons (at least for R) actually come
>>>>>>> from the gapminder R package put together by Jenny Bryan. The package
>>>>>>> contains the code used to tidy the data from the spreadsheets made
>>>>>>> available by the gapminder website. It might be worth putting a pull
>>>>>>> request together to update the data there, and then it will be easy to
>>>>>>> update the data in our lessons.
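>>>>>>>
>>>>>>> As a minimal example (assuming the package is installed from CRAN), the
>>>>>>> frozen data frame the lessons rely on can be loaded directly:
>>>>>>>
>>>>>>>     # install.packages("gapminder")
>>>>>>>     library(gapminder)
>>>>>>>     str(gapminder)  # country, continent, year, lifeExp, pop, gdpPercap; 1952-2007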
>>>>>>>
>>>>>>> Cheers,
>>>>>>> -- François
>>>>>>>
>>>>>>> On Mon, Jan 23, 2017 at 7:20 AM, Lex Nederbragt
>>>>>>> <[email protected]> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> The gapminderDataFiveYear data that I have on my hard disk, and the
>>>>>>>> data used in the Python and R lessons on the gapminder data, run from
>>>>>>>> 1952 to 2007. I thought it would be nice to add the 2012 data, it being
>>>>>>>> 2017, after all.
>>>>>>>>
>>>>>>>> So I went to what I guessed to be the original source,
>>>>>>>> https://www.gapminder.org/data/ (that was easy), and checked a few of
>>>>>>>> the population size numbers for the years in the datasets we use. I
>>>>>>>> chose "Population, total" as the dataset, which can be viewed as a
>>>>>>>> Google sheet. The numbers are not the same: in some cases they are
>>>>>>>> quite a bit lower or higher, while in others they are closer.
>>>>>>>>
>>>>>>>> The other data sources are a bit harder to compare. There are a few
>>>>>>>> GDP/capita datasets; I think "Income per person (GDP/capita, PPP$
>>>>>>>> inflation-adjusted)" comes closest, but the numbers are quite a bit
>>>>>>>> higher than in ‘our’ dataset. "Life expectancy (years)" is close, but
>>>>>>>> also off.
>>>>>>>>
>>>>>>>> Should we update our numbers and add 2012? This could be done with some
>>>>>>>> smart web scraping, I think?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Lex Nederbragt
>>>>>>>>
_______________________________________________
Discuss mailing list
[email protected]
http://lists.software-carpentry.org/listinfo/discuss