Re: [mySociety:public] Crowd-sourcing - critical mass

Tim Green Fri, 20 Aug 2010 07:48:18 -0700

Crowdsourcing in my head works as some sort of distribution where most people only contribute a small amount, but there are a few people who do most of the heavy lifting (it wouldn't be hard to confirm this with data on user contributions). In that way, you just need blanket coverage to get as many people as possible to try it out in the hope of finding the keenos. I'm not sure there's much of a snowball effect -- unless you can get people to tell their friends that they're having great fun correcting building information and that they should join in.
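A rough sketch of how that check on user contributions might look, assuming you can export a list with one contributor id per contribution. The function name `top_share`, the `top_fraction` parameter and the sample log are all made up for illustration and are not taken from any real site's data.

```python
from collections import Counter


def top_share(contributor_ids, top_fraction=0.1):
    """Return the share of all contributions made by the most active contributors.

    contributor_ids: one entry per contribution (hypothetical export format).
    top_fraction: fraction of contributors to treat as the "keen" group.
    """
    # Contributions per person, largest first.
    counts = sorted(Counter(contributor_ids).values(), reverse=True)
    if not counts:
        return 0.0
    # Always look at at least one contributor, even for tiny samples.
    top_n = max(1, round(len(counts) * top_fraction))
    return sum(counts[:top_n]) / sum(counts)


# Invented sample log: one heavy contributor and two occasional ones.
sample_log = ["alice"] * 8 + ["bob", "carol"]
print(top_share(sample_log))
```

A value close to 1 would back up the "few people do most of the heavy lifting" picture; a value close to `top_fraction` would suggest contributions are spread fairly evenly.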

As for motivation to get involved, I think the wiki-like "correcting something you know is wrong" is a powerful entry point. Beyond that, you need to provide easy mechanisms for someone to continue to correct other things in a game-like fashion. In your case, that could be a page that randomly provides the next uncategorised building to tag, or the next building that has been flagged as being wrong (so someone who knows that the information is wrong but doesn't put the effort into fixing it can still be of use). The selection could be randomised to focus on buildings near the user if they supply a postcode, etc.
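To make the "next building" page idea a bit more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `Building` fields, the `next_building` function and the `nearest_n` parameter are hypothetical, and it assumes the user's postcode has already been turned into a latitude/longitude pair by some other lookup.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Building:
    # Hypothetical record layout, not the real database schema.
    id: int
    lat: float
    lon: float
    categorised: bool = False
    flagged_wrong: bool = False


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def next_building(buildings, user_location=None, nearest_n=20, rng=random):
    """Pick the next building to show a volunteer.

    Candidates are buildings that are uncategorised or flagged as wrong.
    If user_location (lat, lon) is given, the random pick is limited to
    the nearest_n candidates; otherwise it is random over all of them.
    """
    candidates = [b for b in buildings if b.flagged_wrong or not b.categorised]
    if not candidates:
        return None
    if user_location is not None:
        lat, lon = user_location
        candidates.sort(key=lambda b: distance_km(lat, lon, b.lat, b.lon))
        candidates = candidates[:nearest_n]
    return rng.choice(candidates)
```

Picking at random among the nearest few, rather than always the single nearest, means two volunteers in the same town are less likely to be handed the same entry at once.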

I think crowdsourcing for YourNextMP (Edmund could confirm the breakdown perhaps) was a mixture of two things. One was visitors/candidates/agents just finding their constituency page and correcting it (like I fixed a Lib Dem candidate in my constituency who I knew wasn't standing, while the Guardian website didn't provide a correction mechanism so the mistake stayed up until just before the election). The other was bulk volunteers we pushed from Democracy Club to play the points game of finding a randomly (ish) selected candidate's contact information from Google. I know someone's working on a generic crowdsourcing game tool that might be of use.

I'd say the fact that the data exists, despite low quality, is a good start -- because it means that volunteer time to correct it produces a visible improvement while the data still has some use, rather than having to create everything from scratch. It's going to take a while to check 400,000 buildings, but wiki-like crowdsourcing will focus on the entries people are most interested in (what's the distribution of visits across the entries? a handful of popular places and the rest mostly unvisited?), hence giving the greatest benefit for the least effort :)


Tim

On 20/08/10 15:31, Mark Goodge wrote:

This is a bit off-topic for this list, since it's nothing to do with mySociety, but the people who are most likely to be able to answer my question are probably here...
I was wondering what kind of critical mass of contributors you need to effectively crowd-source data. Is there any kind of threshold of contributors below which it won't work at all? Obviously, the more you have, the faster things get done, but I'm wondering if there's some kind of snowball effect whereby having a large number of contributors encourages even more to join in?
To give some background on this, I run a website about listed buildings[1] which (in theory) includes every listed building in Great Britain. The original data is obtained from the three national heritage organisations (English Heritage, Historic Scotland and Cadw), and combined into a single database for display on the web.
[1] http://www.britishlistedbuildings.co.uk
When I created the site, I had no idea either how popular it would become (it's now far and away my most visited website), or how flawed a lot of the underlying data is - there are a lot of factual errors in the statutory listing data. So I'm getting a lot of complaints about the data quality, as well as requests for more features, the most popular of which is some form of building classification - for example, being able to search specifically for specific categories of buildings (eg, religious buildings, schools, railway buildings, etc), or those from a particular era. Most of that information is in the database, but it's not in any kind of consistent format so it's not very amenable to automated extraction.
My thoughts, therefore, were to try to crowd-source solutions to these. Obviously, fixing errors requires actual knowledge of the building in question, so I'm not expecting rapid results from that, but classifying buildings for use in a search system merely requires someone to read the text of as yet uncategorised entries and then tag them accordingly via a simple form. So that ought to be achievable, given enough contributors. The question is, how many would I need, and what's the best way of motivating them to contribute?
To give an idea of user interaction so far, compared to the number of entries, there are just over 400,000 buildings in the database, and so far I've had 365 user comments, 2,400 user-contributed photos and 109 user-contributed corrections to postcodes and coordinates.
Mark



_______________________________________________
Mailing list [email protected]
Archive, settings, or unsubscribe:
https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public
