Hi Dan, answers below:

Dan wrote:
> Rob wrote:
>> 
>> Who's Lobbying launched this week
>> 
> 
> I'd also love to see the underlying data when it's ready, in whatever
> hackable format's easiest for you. CSV, RDFa, JSON, whatever :)

Making data available is on the todo list.

> How many of the entities in the data do you have Wikipedia IDs for?

752 out of 1,510 organisations haven't yet been matched to Wikipedia, which
leaves 758 that have. The list of unmatched organisations is here:

http://whoslobbying.com/uk/uncategorised

> The incoming data formats sound like a huge headache. How are you
> finding Google Refine for the cleanup? Can you say a bit more about your 
> workflow - is most of your custom cleaning happening
> before the data hits Refine?

I'll do a post about workflow later. Essentially it's:
1. Send FOI requests to remind departments to publish the data (twice so far
this year to each department)
2. Find where the data is published
3. Scrape/parse the data into CSV, including splitting participant
organisation names onto separate lines
4. Reconcile entities via Google Refine/Freebase's reconciliation service
5. Add normalised names, e.g. CBI vs Confederation of British Industry
6. Load into Rails app via ActiveRecord
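
Steps 3 and 5 can be sketched roughly as below. This is an illustration in
Python, not the actual scraper code: the column names, the semicolon/newline
separator and the alias table are all assumptions made up for the example.

```python
import re

# Hypothetical alias table: variant spellings mapped to one normalised name.
# In practice this would be built up by hand or from reconciliation results.
ALIASES = {
    "cbi": "Confederation of British Industry",
    "confederation of british industry": "Confederation of British Industry",
}


def split_organisations(cell):
    """Split one cell listing several organisations into separate names.

    Assumes names are separated by semicolons or newlines (an assumption;
    real departmental data uses a mix of separators).
    """
    parts = re.split(r"[;\n]", cell)
    return [part.strip() for part in parts if part.strip()]


def normalise(name):
    """Return the normalised name if the alias table has one, else the name."""
    key = " ".join(name.lower().split())  # lower-case, collapse whitespace
    return ALIASES.get(key, name.strip())


def explode_rows(rows, column="organisations"):
    """Yield one output row per organisation named in each input row."""
    for row in rows:
        for org in split_organisations(row[column]):
            out = {k: v for k, v in row.items() if k != column}
            out["organisation"] = org
            out["normalised_name"] = normalise(org)
            yield out


if __name__ == "__main__":
    # Made-up example row, not real meeting data.
    meetings = [
        {"minister": "A Minister", "date": "2010-10",
         "organisations": "CBI; Example Org"},
    ]
    for line in explode_rows(meetings):
        print(line)
```

Rows produced this way have one organisation each, so they can be written out
with csv.DictWriter and reconciled one name at a time in step 4.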

I need to add an interface for people to report data, like CountCulture has 
done on the local government suppliers pages at OpenlyLocal.com

cheers,
Rob

> cheers,
> 
> Dan
> 
> _______________________________________________
> Mailing list [email protected]
> Archive, settings, or unsubscribe:
> https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public