Hi Jenny,
In light of the replies about keeping the existing dataset unchanged, this
sounds like a good solution to me!
Lex
> On 25 Jan 2017, at 19:23, Jenny Bryan <[email protected]> wrote:
>
> Actually, instead of a gapminder2 pkg, there could simply be another data
> frame in gapminder that is explicitly allowed to stay more current, accept
> corrections, etc. For example, I already know about some data quality
> problems in the existing data frame
> (https://github.com/jennybc/gapminder/issues/9).
>
> Not sure what to name it?
>
> The evolution of an underlying csv would still be tracked via git, but I
> could relax about requiring some perfectly formed, scripted backstory.
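>
> Just to make it concrete, a rough sketch only: "gapminder_current" is a
> placeholder name, and usethis stands in for whatever tooling would actually
> be used. Adding a second data frame to the package could look something like:
>
>     # data-raw/gapminder_current.R (hypothetical)
>     library(readr)    # read the underlying csv
>     library(usethis)  # save a data frame into data/ for the package
>
>     # a csv tracked in data-raw/ that is allowed to receive updates and corrections
>     gapminder_current <- read_csv("data-raw/gapminder_current.csv")
>
>     # ship it as a second data frame alongside the frozen `gapminder`
>     use_data(gapminder_current, overwrite = TRUE)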
>
>> On Jan 25, 2017, at 10:16 AM, Bennet Fauber <[email protected]> wrote:
>>
>> I can kind of understand the desire to have 'more current' data
>> because it might be more appealing to students, but I also have a huge
>> amount of sympathy/empathy for Jenny's original reply about how changes
>> would ripple through people's prepared lessons.
>>
>> Could an 'intermediate' or 'advanced' lesson be made that updates the
>> data, and could that be used as a vehicle for getting people further along?
>>
>> Otherwise, I would concur that a gapminder2 package (or whatever year
>> is the most recent) would be a nice way to serve those who prefer the most
>> current data without distressing those with an investment in the old.
>>
>> -- bennet
>>
>> On Wed, Jan 25, 2017 at 12:26 PM, Jenny Bryan <[email protected]> wrote:
>>> FWIW here's how the current data package is made:
>>>
>>> https://github.com/jennybc/gapminder/tree/master/data-raw#readme
>>>
>>> I use these scripts / this workflow in teaching as well, i.e. gapminder is
>>> not just about that one data frame, it's also about the process that made
>>> it.
>>>
>>> Ridiculous though it may sound, if there's real interest in updating,
>>> correcting, etc., we could consider making a gapminder2 package? That
>>> package would never commit to having such a clean story re: data
>>> provenance and cleaning.
>>>
>>> -- Jenny
>>>
>>>> On Jan 25, 2017, at 6:59 AM, Tom Wright <[email protected]> wrote:
>>>>
>>>> Not sure if this is relevant anymore, since it seems Jenny has a much
>>>> better handle on the original source for this data.
>>>> A few months ago I tried to track down the origins of the dataset and came
>>>> up with these links (not sure if I ever documented this information):
>>>>
>>>> GDP
>>>> GDP per capita by purchasing power parities (v8)
>>>> https://www.gapminder.org/wp-content/uploads/2008/10/gapdata001-1.xlsx
>>>>
>>>> Population
>>>> Total Population (v1)
>>>> https://www.gapminder.org/documentation/documentation/gapdata003%20old%202011.xlsx
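>>>>
>>>> In case it's useful, here is a quick (untested) sketch of pulling one of
>>>> those spreadsheets into R; the sheet layout would need checking against
>>>> the actual file, so read_excel() may need extra arguments:
>>>>
>>>>     library(readxl)
>>>>
>>>>     gdp_url <- "https://www.gapminder.org/wp-content/uploads/2008/10/gapdata001-1.xlsx"
>>>>     tmp <- tempfile(fileext = ".xlsx")
>>>>     download.file(gdp_url, tmp, mode = "wb")  # binary mode so the xlsx isn't corrupted on Windows
>>>>
>>>>     gdp <- read_excel(tmp)  # may need sheet = / skip = for the real layout
>>>>     head(gdp)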
>>>>
>>>> On Wed, 25 Jan 2017 at 09:07 Jenny Bryan <[email protected]> wrote:
>>>> I lean towards keeping the gapminder data package "frozen", even though it
>>>> could be updated, corrected, etc.
>>>>
>>>> Why? Because I and others now have a lot of teaching material built around
>>>> this package. And I've learned the hard way from a few small prior changes
>>>> that even minor updates lead to tons of diffs when I re-render my lessons.
>>>>
>>>> The vast majority of it is ignorable noise, but sometimes plots or
>>>> examples that used to make sense no longer do. And so the result or
>>>> figure contradicts adjacent text. It's very hard to catch.
>>>>
>>>> I've always tried to be clear that gapminder is a data set for teaching
>>>> and exampling data analysis. There are much better sources of the
>>>> socioeconomic data and, as noted below, even Gapminder.org has better
>>>> info now.
>>>>
>>>> I do like the idea of adding data from 2008 onwards. But ... it's not that
>>>> simple. Currently I have scripts that exactly reproduce this data package
>>>> from Gapminder.org spreadsheets I downloaded several years ago and have
>>>> under version control.
>>>>
>>>> If I want to keep everything reproducible, I'd have to allow all the data
>>>> to change, since I'm sure Gapminder's pre-2008 data has been changing all
>>>> these years :(
>>>>
>>>> -- Jenny
>>>>
>>>>> On Jan 24, 2017, at 6:21 AM, Lex Nederbragt <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Thanks! I don’t know R well enough to judge, but could this code be used
>>>>> to update the dataset and add 2012? @Jenny (on CC)?
>>>>>
>>>>> Lex
>>>>>
>>>>>> On 23 Jan 2017, at 16:58, François Michonneau
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Apologies, I forgot to include the link to the repo:
>>>>>> https://github.com/jennybc/gapminder
>>>>>>
>>>>>> On Mon, Jan 23, 2017 at 7:56 AM, François Michonneau
>>>>>> <[email protected]> wrote:
>>>>>>> Hi Lex,
>>>>>>>
>>>>>>> The data we use for the lessons (at least for R) actually come
>>>>>>> from the gapminder R package put together by Jenny Bryan. The package
>>>>>>> contains the code used to tidy the data from the spreadsheets made
>>>>>>> available by the gapminder website. It might be worth putting a pull
>>>>>>> request together to update the data there, and then it will be easy to
>>>>>>> update the data in our lessons.
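>>>>>>>
>>>>>>> As a minimal example (assuming the package is installed from CRAN), the
>>>>>>> frozen data frame the lessons rely on can be loaded directly:
>>>>>>>
>>>>>>>     # install.packages("gapminder")
>>>>>>>     library(gapminder)
>>>>>>>     str(gapminder)  # country, continent, year, lifeExp, pop, gdpPercap; 1952-2007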
>>>>>>>
>>>>>>> Cheers,
>>>>>>> -- François
>>>>>>>
>>>>>>> On Mon, Jan 23, 2017 at 7:20 AM, Lex Nederbragt
>>>>>>> <[email protected]> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> The gapminderDataFiveYear data that I have on my hard disk, and the
>>>>>>>> data used in the Python and R lessons on the gapminder data, run from
>>>>>>>> 1952 to 2007. I thought it would be nice to add the 2012 data, it being
>>>>>>>> 2017, after all.
>>>>>>>>
>>>>>>>> So I went to what I guessed to be the original source,
>>>>>>>> https://www.gapminder.org/data/ (that was easy), and checked a few of
>>>>>>>> the population size numbers for the years in the datasets we use. I
>>>>>>>> chose "Population, total" as the dataset, which can be viewed as a
>>>>>>>> Google sheet. The numbers are not the same: in some cases they are
>>>>>>>> quite a bit lower or higher, while in others they are closer.
>>>>>>>>
>>>>>>>> The other data sources are a bit harder to compare. There are a few
>>>>>>>> GDP/capita datasets; I think "Income per person (GDP/capita, PPP$
>>>>>>>> inflation-adjusted)" comes closest, but the numbers are quite a bit
>>>>>>>> higher than in ‘our’ dataset. "Life expectancy (years)" is close, but
>>>>>>>> also off.
>>>>>>>>
>>>>>>>> Should we update our numbers and add 2012? This could be done with some
>>>>>>>> smart web scraping, I think?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Lex Nederbragt
>>>>>>>>
_______________________________________________
Discuss mailing list
[email protected]
http://lists.software-carpentry.org/listinfo/discuss