In my head, crowdsourcing follows some sort of skewed distribution: most
people only contribute a small amount, but there are a few people who do
most of the heavy lifting (it wouldn't be hard to confirm this with data
on user contributions). So you just need blanket coverage, getting as
many people as possible to try it out in the hope of finding the keenos.
I'm not sure there's much of a snowball effect, unless you can get people
to tell their friends that they're having great fun correcting building
information and that they should join in.
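
A rough sketch of how that check on user contributions might look, assuming you have a log with one user id per contribution. The function name, the user ids and the numbers are all made up for illustration:

```python
from collections import Counter

def top_share(contributions, top_fraction=0.1):
    """Share of all contributions made by the most active `top_fraction` of users."""
    # Count contributions per user, most active first.
    counts = sorted(Counter(contributions).values(), reverse=True)
    if not counts:
        return 0.0
    # Always look at at least one user, even for tiny groups.
    top_n = max(1, round(len(counts) * top_fraction))
    return sum(counts[:top_n]) / sum(counts)

# Hypothetical log: two keen users plus eight who each contributed twice.
log = ["ann"] * 60 + ["bob"] * 24 + ["u%d" % i for i in range(8)] * 2

print(top_share(log, 0.1))  # share contributed by the top 10% of users
print(top_share(log, 0.2))  # share contributed by the top 20% of users
```

If the top few percent of users account for most of the total, that would support the "few keenos" picture.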
As for motivation to get involved, I think the wiki-like urge to correct
something you know is wrong is a powerful entry point. Beyond that, you
need to provide easy mechanisms for someone to carry on correcting other
things in a game-like fashion. In your case, that could be a page that
randomly provides the next uncategorised building to tag, or the next
building that has been flagged as being wrong (so someone who knows that
the information is wrong but doesn't put the effort into fixing it can
still be of use). The selection could be weighted towards buildings near
the user if they supply a postcode, etc.
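
A minimal sketch of that "next building" picker, under some assumptions: each building is a dict with hypothetical `category`, `flagged` and `location` fields, and the user's postcode has already been turned into a (lat, lon) pair elsewhere. The 10 km radius is an arbitrary choice.

```python
import math
import random

def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def next_building(buildings, user_location=None, radius_km=10.0, rng=random):
    """Pick a random building needing attention, preferring ones near the user."""
    # A building needs attention if it is uncategorised or flagged as wrong.
    todo = [b for b in buildings if b["category"] is None or b["flagged"]]
    if not todo:
        return None
    if user_location is not None:
        nearby = [b for b in todo
                  if distance_km(user_location, b["location"]) <= radius_km]
        # Fall back to the whole to-do list if nothing is close by.
        if nearby:
            todo = nearby
    return rng.choice(todo)

# Hypothetical data: one uncategorised, one flagged, one already done.
buildings = [
    {"id": 1, "category": None, "flagged": False, "location": (53.96, -1.08)},
    {"id": 2, "category": "school", "flagged": True, "location": (51.50, -0.12)},
    {"id": 3, "category": "church", "flagged": False, "location": (53.96, -1.08)},
]
print(next_building(buildings, user_location=(53.95, -1.09))["id"])
```

Falling back to the full list when nothing is nearby means a user in a quiet area still gets something to do.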
I think crowdsourcing for YourNextMP (Edmund could perhaps confirm the
breakdown) was a mixture of two things. First, visitors/candidates/agents
just finding their constituency page and correcting it. For example, I
fixed a Lib Dem candidate in my constituency who I knew wasn't standing,
while the Guardian website didn't provide a correction mechanism, so the
mistake stayed up there until just before the election. Second, bulk
volunteers we pushed from Democracy Club to play the points game of
finding a randomly(ish) selected candidate's contact information via
Google. I know someone's working on a generic crowdsourcing game tool
that might be of use.
I'd say the fact that the data exists, despite low quality, is a good
start. The data is already of some use, and volunteer time spent
correcting it produces a visible improvement, rather than everything
having to be created from scratch. It's going to take a while to check
400,000 buildings, but wiki-like crowdsourcing will focus on the entries
people are most interested in (what's the distribution of visits across
the entries? A handful of popular places and the rest mostly
unvisited?), giving the greatest benefit for the least effort :)
Tim
On 20/08/10 15:31, Mark Goodge wrote:
This is a bit off-topic for this list, since it's nothing to do with
mySociety, but the people who are most likely to be able to answer my
question are probably here...
I was wondering what kind of critical mass of contributors you need to
effectively crowd-source data. Is there any kind of threshold of
contributors below which it won't work at all? Obviously, the more you
have, the faster things get done, but I'm wondering if there's some
kind of snowball effect whereby having a large number of contributors
encourages even more to join in?
To give some background on this, I run a website about listed
buildings[1] which (in theory) includes every listed building in Great
Britain. The original data is obtained from the three national
heritage organisations (English Heritage, Historic Scotland and Cadw),
and combined into a single database for display on the web.
[1] http://www.britishlistedbuildings.co.uk
When I created the site, I had no idea either how popular it would
become (it's now far and away my most visited website), or how flawed
a lot of the underlying data is - there are a lot of factual errors in
the statutory listing data. So I'm getting a lot of complaints about
the data quality, as well as requests for more features, the most
popular of which is some form of building classification - for
example, being able to search for specific categories of
buildings (eg, religious buildings, schools, railway buildings, etc),
or those from a particular era. Most of that information is in the
database, but it's not in any kind of consistent format so it's not
very amenable to automated extraction.
My thoughts, therefore, were to try to crowd-source solutions to
these. Obviously, fixing errors requires actual knowledge of the
building in question, so I'm not expecting rapid results from that,
but classifying buildings for use in a search system merely requires
someone to read the text of as yet uncategorised entries and then tag
them accordingly via a simple form. So that ought to be achievable,
given enough contributors. The question is, how many would I need, and
what's the best way of motivating them to contribute?
To give an idea of user interaction so far, compared to the number of
entries, there are just over 400,000 buildings in the database, and so
far I've had 365 user comments, 2,400 user-contributed photos and 109
user-contributed corrections to postcodes and coordinates.
Mark