In my head, crowdsourcing follows some sort of skewed distribution: most people contribute only a small amount, while a few do most of the heavy lifting (it wouldn't be hard to confirm this with data on user contributions). So you just need blanket coverage, getting as many people as possible to try it out in the hope of finding the keenos. I'm not sure there's much of a snowball effect -- unless you can get people to tell their friends that they're having great fun correcting building information and that they should join in.
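To make the "confirm this with data" bit concrete, here's a rough sketch of how you could check it. It assumes you can export one user identifier per contribution from your logs; the function name and the 10% cut-off are just made up for illustration.

```python
from collections import Counter


def top_share(contributions, top_fraction=0.1):
    """Return the fraction of all contributions made by the most
    active `top_fraction` of users.

    `contributions` is an iterable with one user id per contribution
    (a hypothetical export format, not anything the site provides).
    """
    # Contributions per user, largest first.
    counts = sorted(Counter(contributions).values(), reverse=True)
    if not counts:
        return 0.0
    # Always include at least one user in the "top" group.
    top_n = max(1, round(len(counts) * top_fraction))
    return sum(counts[:top_n]) / sum(counts)
```

If the answer comes out at something like 0.8 or more, then a handful of keen people really are doing most of the work, and the aim is simply to reach enough people to find them.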

As for motivation to get involved, I think the wiki-like urge to correct something you know is wrong is a powerful entry point. Beyond that, you need to provide easy mechanisms for someone to carry on correcting other things in a game-like fashion. In your case that could be a page that randomly serves up the next uncategorised building to tag, or the next building that has been flagged as wrong (so someone who knows that the information is wrong but doesn't put the effort into fixing it can still be of use). The selection could be randomised to focus on buildings near the user if they supply a postcode, etc.
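Something like the following is what I have in mind for the "next building" page. It's only a sketch: the field names (`tags`, `flagged`, `postcode`) are invented rather than taken from your schema, and matching on a postcode prefix is a crude stand-in for real distance.

```python
import random


def next_building(buildings, near_prefix=None, rng=random):
    """Pick the next building for a volunteer to look at.

    `buildings` is a list of dicts with hypothetical keys
    'id', 'postcode', 'tags' and 'flagged'.
    """
    # Anything untagged or flagged as wrong still needs attention.
    candidates = [b for b in buildings if b["flagged"] or not b["tags"]]
    if not candidates:
        return None
    if near_prefix:
        prefix = near_prefix.upper()
        local = [b for b in candidates
                 if b["postcode"].upper().startswith(prefix)]
        # Prefer nearby buildings, but fall back to the whole pool.
        if local:
            candidates = local
    return rng.choice(candidates)
```

Calling `next_building(buildings, near_prefix="SW1")` would then hand back a random outstanding building in that postcode area if there is one, and a random one from anywhere otherwise.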

I think crowdsourcing for YourNextMP (Edmund could perhaps confirm the breakdown) was a mixture of two things. First, visitors/candidates/agents just finding their constituency page and correcting it: for example, I fixed a Lib Dem candidate in my constituency who I knew wasn't standing, whereas the Guardian website didn't provide a correction mechanism, so the mistake stayed up there until just before the election. Second, bulk volunteers we pushed from Democracy Club to play the points game, finding a randomly(ish) selected candidate's contact information via Google. I know someone's working on a generic crowdsourcing game tool that might be of use.

I'd say the fact that the data exists, despite its low quality, is a good start -- it is already of some use, and volunteer time spent correcting it produces a visible improvement, rather than everything having to be created from scratch. It's going to take a while to check 400,000 buildings, but wiki-like crowdsourcing will focus on the entries people are most interested in (what's the distribution of visits across the entries? a handful of popular places and the rest mostly unvisited?), hence giving the greatest benefit for the least effort :)

Tim

On 20/08/10 15:31, Mark Goodge wrote:
This is a bit off-topic for this list, since it's nothing to do with mySociety, but the people who are most likely to be able to answer my question are probably here...

I was wondering what kind of critical mass of contributors you need to effectively crowd-source data. Is there any kind of threshold of contributors below which it won't work at all? Obviously, the more you have, the faster things get done, but I'm wondering if there's some kind of snowball effect whereby having a large number of contributors encourages even more to join in?

To give some background on this, I run a website about listed buildings[1] which (in theory) includes every listed building in Great Britain. The original data is obtained from the three national heritage organisations (English Heritage, Historic Scotland and Cadw), and combined into a single database for display on the web.

[1] http://www.britishlistedbuildings.co.uk

When I created the site, I had no idea either how popular it would become (it's now far and away my most visited website), or how flawed a lot of the underlying data is - there are a lot of factual errors in the statutory listing data. So I'm getting a lot of complaints about the data quality, as well as requests for more features, the most popular of which is some form of building classification - for example, being able to search for specific categories of buildings (eg, religious buildings, schools, railway buildings, etc), or those from a particular era. Most of that information is in the database, but it's not in any kind of consistent format so it's not very amenable to automated extraction.

My thoughts, therefore, were to try to crowd-source solutions to these. Obviously, fixing errors requires actual knowledge of the building in question, so I'm not expecting rapid results from that, but classifying buildings for use in a search system merely requires someone to read the text of as yet uncategorised entries and then tag them accordingly via a simple form. So that ought to be achievable, given enough contributors. The question is, how many would I need, and what's the best way of motivating them to contribute?

To give an idea of user interaction so far, compared to the number of entries, there are just over 400,000 buildings in the database, and so far I've had 365 user comments, 2,400 user-contributed photos and 109 user-contributed corrections to postcodes and coordinates.

Mark


