2009/1/28 Sam Smith <[email protected]>:
> If you want to find
> out how simple the data preparation is compared to the
> checking, I'm sure Francis and Tom would love for people to
> volunteer to help with the 2008 WTT zeitgeist.
I'm not sure data prep *is* simple compared to analysis. In the world of web-sourced data, an analysis throws up questions that force you to go back and look at the assumptions made when the data was first prepped. This is especially true when analysing log traffic, which contains a lot of non-standardised data.

Jon Udell wrote a series of interesting articles partly on this subject (also a nice bit of data visualisation), starting here:

http://blog.jonudell.net/2009/01/13/transparency-trends/

I'll volunteer to work on the 2008 zeitgeist... sounds like fun :)

seb

2009/1/28 Sam Smith <[email protected]>:
> On Mon, 19 Jan 2009, Harry Metcalfe wrote:
>>
>> Blogged on exactly that point the other day
>>
>> http://www.thedextrousweb.com/2009/01/dcsf-statistical-releases-bbc-better-data-formats/
>
> That is all technical (and there's nothing inherently
> unparsable about a spreadsheet, even lots of big ones) -
> context matters more. Throwing the data up on a website is
> really not that hard.
>
> Looking at it takes time and skilled manpower. Which is
> exactly what the BBC are talking about. If you want to find
> out how simple the data preparation is compared to the
> checking, I'm sure Francis and Tom would love for people to
> volunteer to help with the 2008 WTT zeitgeist. All that data
> *is* in a structured format - the best it probably could be.
> But anyone who has seen that process will know that it's
> really not that simple.
>
> You can read, even a large number of tables, into a database
> very quickly. Could the data format be better, certainly;
> but that's not necessarily the biggest problem.
>
> It should not be more important to be first than to be right.
>
> If nothing changed, you could use the same code as last
> time. If things have changed, it makes no difference
> whether your code reads a spreadsheet or from an API call.
> You still should spend time checking before announcing that
> somewhere is the best school area in Britain, or someone is
> the worst MP.
>
> Cheers
> Sam
>
>> On Thu, 2009-01-15 at 11:24 +0000, Tom Steinberg wrote:
>>>
>>> It's a real shame that the media has used their skills to lobby for
>>> the wrong thing.
>>>
>>> 24 hours is quite enough if the data is supplied in a nice format and
>>> you just dump it into your pre-prepared database and run your
>>> pre-prepared scripts.
>>>
>>> The problem - the senior management don't know this is possible, so
>>> they ask for the wrong, morally dubious thing (more time, exclusive
>>> access to public information to raise private profits) rather than the
>>> right thing, data in a format that can be parsed and published in
>>> seconds.
>>>
>>> best,
>>>
>>> Tom
>>>
>>> _______________________________________________
>>> developers-public mailing list
>>> [email protected]
>>> https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public
>>
>> _______________________________________________
>> Mailing list [email protected]
>> Archive, settings, or unsubscribe:
>> https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public
>
> --
> Doubt: In the battle between you and the world, bet on the world

--
skype: seb.bacon
mobile: 07790 939224
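Sam's point, that loading the tables is the quick part and checking them is the slow part, can be made concrete with a minimal sketch. It assumes Python's built-in sqlite3 and csv modules stand in for the "pre-prepared database" Tom describes. The table name `results`, the columns `area`, `school` and `pass_rate`, the sample rows and the three checks are all hypothetical, invented for illustration rather than taken from any real DCSF or WTT release.

```python
import csv
import io
import sqlite3

# Hypothetical sample release: column names and values are invented for
# illustration, not taken from any real DCSF or WTT dataset.
SAMPLE_CSV = """area,school,pass_rate
Northtown,St Mary's,81.5
Northtown,Oak Lane,
Southvale,Riverside,104.0
Southvale,Riverside,77.0
"""


def load_results(conn, csv_text):
    """Dump CSV text into a hypothetical 'results' table (all columns TEXT)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{col}" TEXT' for col in header)
    conn.execute(f"CREATE TABLE results ({columns})")
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO results VALUES ({placeholders})", reader)


def check_results(conn):
    """Return problems a human should resolve before publishing anything."""
    problems = []
    # Check 1 (hypothetical): blank pass rates.
    (missing,) = conn.execute(
        "SELECT COUNT(*) FROM results WHERE pass_rate IS NULL OR pass_rate = ''"
    ).fetchone()
    if missing:
        problems.append(f"{missing} row(s) with missing pass_rate")
    # Check 2 (hypothetical): percentages outside 0-100.
    (out_of_range,) = conn.execute(
        "SELECT COUNT(*) FROM results WHERE pass_rate <> '' "
        "AND (CAST(pass_rate AS REAL) < 0 OR CAST(pass_rate AS REAL) > 100)"
    ).fetchone()
    if out_of_range:
        problems.append(f"{out_of_range} row(s) with pass_rate outside 0-100")
    # Check 3 (hypothetical): the same school listed more than once in an area.
    for area, school, n in conn.execute(
        "SELECT area, school, COUNT(*) FROM results "
        "GROUP BY area, school HAVING COUNT(*) > 1"
    ):
        problems.append(f"{n} rows for ({area}, {school})")
    return problems


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_results(conn, SAMPLE_CSV)  # the fast, mechanical part
    for problem in check_results(conn):  # the part that needs a human
        print(problem)
```

The checks only flag candidates (a blank value, a pass rate over 100, a school listed twice); deciding whether each one is a data error or a genuine quirk of the source is the checking work the thread is arguing about, and it is the part that a better file format alone does not remove.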
