Hi Dan, answers below:

Dan wrote:
> Rob wrote:
>>
>> Who's Lobbying launched this week
>>
>
> I'd also love to see the underlying data when it's ready, in whatever
> hackable format's easiest for you. CSV, RDFa, JSON, whatever :)

Making data available is on the todo list.

> How many of the entities in the data do you have Wikipedia IDs for?

752 out of 1,510 organisations haven't yet been matched to Wikipedia. The list is here:
http://whoslobbying.com/uk/uncategorised

> The incoming data formats sound like a huge headache. How are you
> finding Google Refine for the cleanup? Can you say a bit more about your
> workflow - is most of your custom cleaning happening mostly
> before the data hits Refine?

I'll do a post about workflow later. Essentially it's:

1. FOI the data to remind them to publish (twice so far this year to each department)
2. Find where the data is published
3. Scrape/parse the data into CSV, including splitting participant organisation names onto separate lines
4. Reconcile entities via Google Refine/Freebase's reconciliation service
5. Add normalised names, e.g. CBI vs Confederation of British Industry
6. Load into Rails app via ActiveRecord

I need to add an interface for people to report data, like CountCulture has done on the local government suppliers pages at OpenlyLocal.com

cheers,

Rob

> cheers,
>
> Dan
>
> _______________________________________________
> Mailing list [email protected]
> Archive, settings, or unsubscribe:
> https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public
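
For anyone curious, steps 3 and 5 of the workflow could be sketched roughly like this in Python. This is only an illustration and not the actual Who's Lobbying code: the column names, the ";" separator between organisations, the sample row and the ALIASES table are all assumptions made up for the example.

```python
import csv
import io

# Hypothetical alias table: maps a known short form (lower-cased) to the
# normalised name, as in "CBI" vs "Confederation of British Industry".
ALIASES = {
    "cbi": "Confederation of British Industry",
}


def split_organisations(cell):
    """Split one cell listing several organisations into separate names.

    Assumes names are separated by ';'. Real data may use other
    separators, and commas can legitimately appear inside a name.
    """
    return [part.strip() for part in cell.split(";") if part.strip()]


def normalise(name):
    """Return the normalised name if an alias is known, else the name as given."""
    cleaned = name.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def explode_rows(reader, column="organisations"):
    """Yield one output row per participant organisation."""
    for row in reader:
        for name in split_organisations(row[column]):
            out = {k: v for k, v in row.items() if k != column}
            out["organisation"] = name
            out["normalised_name"] = normalise(name)
            yield out


# Made-up sample input: one meeting with two participant organisations.
raw = io.StringIO(
    "date,department,organisations\n"
    '2010-05-20,BIS,"CBI; Example Org Ltd"\n'
)
rows = list(explode_rows(csv.DictReader(raw)))
```

Keeping the original name alongside the normalised one means unmatched organisations stay visible for later reconciliation instead of being silently rewritten.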
