Re: [Discuss] gapminderDataFiveYear

Bennet Fauber Wed, 25 Jan 2017 10:17:41 -0800

I can kind of understand the desire to have 'more current' data
because it might be more appealing to students, but I also have a huge
amount of sympathy/empathy for Jenny's original reply about how it
would redound on people's prepared lessons.

Could we make an 'intermediate' or 'advanced' lesson that updates the
data, and use that as a vehicle for getting people further along?

Otherwise, I would concur that a gapminder2 package (or one named for
whatever year is the most recent) would be a nice way to serve those who
prefer the most current data without distressing those with an investment
in the old.

-- bennet

On Wed, Jan 25, 2017 at 12:26 PM, Jenny Bryan <[email protected]> wrote:
> FWIW here's how the current data package is made:
>
> https://github.com/jennybc/gapminder/tree/master/data-raw#readme
>
> I use these scripts / this workflow in teaching as well, i.e. gapminder is 
> not just about that one data frame, it's also about the process that made it.
>
> Ridiculous though it may sound, if there's real interest in updating,
> correcting, etc., we could consider making a gapminder2 package, without
> committing to such a clean story re: data provenance and cleaning?
>
> -- Jenny
>
>> On Jan 25, 2017, at 6:59 AM, Tom Wright <[email protected]> wrote:
>>
>> Not sure if this is relevant anymore since it seems Jenny has a much better 
>> handle on the original source for this data.
>> A few months ago I tried to track down the origins of the dataset and came
>> up with these links (not sure I ever documented this anywhere):
>>
>> GDP
>> GDP per capita by purchasing power parities (v8)
>> https://www.gapminder.org/wp-content/uploads/2008/10/gapdata001-1.xlsx
>>
>> Population
>> Total Population (v1)
>> https://www.gapminder.org/documentation/documentation/gapdata003%20old%202011.xlsx
>>
>> On Wed, 25 Jan 2017 at 09:07 Jenny Bryan <[email protected]> wrote:
>> I lean towards keeping the gapminder data package "frozen", even though it
>> could be updated, corrected, etc.
>>
>> Why? Because I and others now have a lot of teaching material built around 
>> this package. And I've learned the hard way from a few small prior changes
>> that they lead to tons of diffs when I re-render my lessons.
>>
>> The vast majority of it is ignorable noise, but sometimes plots or examples 
>> that used to make sense no longer do. And so the result or figure
>> contradicts adjacent text. It's very hard to catch.
>>
>> I've always tried to be clear that gapminder is a data set for teaching and 
>> demonstrating data analysis. There are much better sources of the socioeconomic
>> data and, as noted below, even Gapminder.org has better info now.
>>
>> I do like the idea of adding data from 2008 onward. But ... it's not that
>> simple. Currently I have scripts that exactly reproduce this data package from
>> Gapminder.org spreadsheets that I downloaded several years ago and keep under
>> version control.
>>
>> If I want to keep everything reproducible, I'd have to allow all the data to 
>> change, since I'm sure Gapminder's data through 2007 has been changing all these
>> years :(
>>
>> -- Jenny
>>
>> > On Jan 24, 2017, at 6:21 AM, Lex Nederbragt <[email protected]> wrote:
>> >
>> > Thanks! I don’t know R well enough, but can this code be used to update 
>> > the dataset and add 2012? @Jenny (on CC)?
>> >
>> >       Lex
>> >
>> >> On 23 Jan 2017, at 16:58, François Michonneau <[email protected]> wrote:
>> >>
>> >> Apologies, I forgot to include the link to the repo:
>> >> https://github.com/jennybc/gapminder
>> >>
>> >> On Mon, Jan 23, 2017 at 7:56 AM, François Michonneau <[email protected]> wrote:
>> >>> Hi Lex,
>> >>>
>> >>> The data we use for the lessons (at least for R) actually come
>> >>> from the gapminder R package put together by Jenny Bryan. The package
>> >>> contains the code used to tidy the data from the spreadsheets made
>> >>> available by the gapminder website. It might be worth putting a pull
>> >>> request together to update the data there, and then it will be easy to
>> >>> update the data in our lessons.
>> >>>
>> >>> Cheers,
>> >>> -- François
>> >>>
>> >>> On Mon, Jan 23, 2017 at 7:20 AM, Lex Nederbragt <[email protected]> wrote:
>> >>>> Hi,
>> >>>>
>> >>>> The gapminderDataFiveYear data that I have on my hard disk, and the
>> >>>> data in the Python and R lessons that use the Gapminder data, run from
>> >>>> 1952 to 2007. I thought it would be nice to add the 2012 data, it being
>> >>>> 2017, after all.
>> >>>>
>> >>>> So I went to what I guessed to be the original source,
>> >>>> https://www.gapminder.org/data/ (that was easy), and checked a few of
>> >>>> the population size numbers from the years in the datasets we use. I
>> >>>> chose "Population, total" as the dataset, which can be viewed as a
>> >>>> Google Sheet here. The numbers are not the same: in some cases they are
>> >>>> much lower or higher, while in others they are closer.
>> >>>>
>> >>>> The other data sources are a bit harder to compare. There are a few
>> >>>> GDP/capita datasets; I think "Income per person (GDP/capita, PPP$
>> >>>> inflation-adjusted)" comes closest, but its numbers are quite a bit
>> >>>> higher than in 'our' dataset. "Life expectancy (years)" is close, but
>> >>>> also off.
>> >>>>
>> >>>> Should we update our numbers and add 2012? This could be done with some
>> >>>> smart web scraping, I think?
>> >>>>
>> >>>> Best,
>> >>>>
>> >>>> Lex Nederbragt
>> >>>>
>> >>>> _______________________________________________
>> >>>> Discuss mailing list
>> >>>> [email protected]
>> >>>> http://lists.software-carpentry.org/listinfo/discuss
>> >
