Dear Matthew,
Thanks for your suggestions.
On 15 Nov 2012, at 19:59, Matthew Gamble wrote:
> Interesting approach.
>
> The classification of errors seems like a reasonably complicated task for
> crowd sourcing. Have you thought about a few gold standard examples to help
> train new workers? i.e. ones where you know the answer and can tell them if
> they got it right or wrong.
To help users understand the errors, we have provided an example and a
description for each error type.
>
> You might also get better results if you present the worker with a side by
> side comparison of the original page and the extraction. I think this would
> help in two ways (1) make clear the point that it is about extraction
> accuracy - i.e. they are comparing one with another and, (2) help in cases
> such as truncated text (which appears to be one of your error types) - we
> only know if it was truncated during extraction if we can see the source (it
> might just have been that way in the source).
We have provided a link to the corresponding Wikipedia page where the user can
look at the original data and compare it with the extracted data in DBpedia.
>
> (I believe you can even get provenance information about the location in a
> page a triple was extracted from so you could even line it up for them! [1]).
Yes, that's a good idea. However, the current version of the tool only
displays what a user would see while browsing DBpedia.
>
> I might also suggest that you split each task down to a single triple at a
> time so that each task is smaller/easier - I'm not sure (though I may be
> wrong) that there is any benefit from showing the whole page of extracted
> triples in one go.
Each triple is in fact displayed separately, so a user can individually
select and specify which triple has a data quality problem. Moreover, it is
important to know which resource a triple belongs to in order to evaluate its
quality.
>
> Interested to see how this works out!
>
> Best,
> Matthew
>
> [1] http://wiki.dbpedia.org/Datasets#h18-18
> ---
> Matthew Gamble
> PhD Candidate
> School of Computer Science
> University of Manchester
> [email protected]
>
Thanks.
Regards,
Ms. Amrapali Zaveri Gokhale
University of Leipzig - Department of Computer Science
Paulinum 618, Augustusplatz 10, 04109 Leipzig, Germany
http://aksw.org/AmrapaliZaveri
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion