Harry says: "standardised format, on a web page"
Sam says: "but if it has changed, you need to check it"
I'm trying to do something with EU farm subsidy data, and I can vouch that government agencies don't give much thought to the data receiver. The worst part is that the cleanup I do won't go back into their system, since their "system" is a bunch of Excel files, so they can't do any typical database operations. (Otherwise I could just send them my SQL, and they'd get some technician to help them -- maybe even give them an "auto-cleanup" script in an Access database which they could run at the click of a button. They'd get a preview and an OK/Cancel choice.)

Back on topic: So the small problem is getting the overall structure right, and the big problem is data quality? Even if the standard dictated fields like Firstname and Surname, the data would still come in with things mixed up? And the current solution is that each media outlet has to spend days cleaning up the data? This seems wasteful - it must be possible to get the data right in the first place?

Assuming that newspaper readers only look at the top and bottom of the lists, the article author should probably phone those schools in advance and allow them to comment on their placement. If schools can't comment within a day, maybe the paper can publish school comments some days later.

What else is this "checking" that the media have to do, which couldn't be handled more efficiently centrally?

Thanks for reading
/Simon

On Thu, Jan 29, 2009 at 12:26 AM, Sam Smith <[email protected]> wrote:
> On Mon, 19 Jan 2009, Harry Metcalfe wrote:
>
>> Blogged on exactly that point the other day
>>
>> http://www.thedextrousweb.com/2009/01/dcsf-statistical-releases-bbc-better-data-formats/
>
> That is all technical (and there's nothing inherently
> unparsable about a spreadsheet, even lots of big ones) -
> context matters more. Throwing the data up on a website is
> really not that hard.
>
> Looking at it takes time and skilled manpower. Which is
> exactly what the BBC are talking about. If you want to find
> out how simple the data preparation is compared to the
> checking, I'm sure Francis and Tom would love for people to
> volunteer to help with the 2008 WTT zeitgeist. All that data
> *is* in a structured format - the best it probably could be.
> But anyone who has seen that process will know that it's
> really not that simple.
>
> You can read, even a large number of tables, into a database
> very quickly. Could the data format be better, certainly;
> but that's not necessarily the biggest problem.
>
> It should not be more important to be first than to be right.
>
> If nothing changed, you could use the same code as last
> time. If things have changed, it makes no difference
> whether your code reads a spreadsheet or from an API call.
> You still should spend time checking before announcing that
> somewhere is the best school area in Britain, or someone is
> the worst MP.
>
> Cheers
> Sam
>
> On Thu, 2009-01-15 at 11:24 +0000, Tom Steinberg wrote:
>>
>>> It's a real shame that the media has used their skills to lobby for
>>> the wrong thing.
>>>
>>> 24 hours is quite enough if the data is supplied in a nice format and
>>> you just dump it into your pre-prepared database and run your
>>> pre-prepared scripts.
>>>
>>> The problem - the senior management don't know this is possible, so
>>> they ask for the wrong, morally dubious thing (more time, exclusive
>>> access to public information to raise private profits) rather than the
>>> right thing, data in a format that can be parsed and published in
>>> seconds.
>>>
>>> best,
>>>
>>> Tom
>>>
>>> _______________________________________________
>>> developers-public mailing list
>>> [email protected]
>>> https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public
>
> --
> Doubt: In the battle between you and the world, bet on the world
_______________________________________________
Mailing list [email protected]
Archive, settings, or unsubscribe:
https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public
