Dear Matthew,

Thanks for your suggestions. 

On 15 Nov 2012, at 19:59, Matthew Gamble wrote:

> Interesting approach.  
> 
> The classification of errors seems like a reasonably complicated task for 
> crowd sourcing. Have you thought about a few gold standard examples to help 
> train new workers?  i.e. ones where you know the answer and can tell them if 
> they got it right or wrong.

To help users understand the errors, we have provided an example and a 
description for each of them. 
> 
> You might also get better results if you present the worker with a side by 
> side comparison of the original page and the extraction. I think this would 
> help in two ways (1) make clear the point that it is about extraction 
> accuracy - i.e. they are comparing one with another and, (2) help in cases 
> such as truncated text (which appears to be one of your error types) - we 
> only know if it was truncated during extraction if we can see the source (it 
> might just have been that way in the source). 

We have provided a link to the corresponding Wikipedia page so that users can 
view the original data and compare it with the data extracted into DBpedia. 
> 
> (I believe you can even get provenance information about the location in a 
> page a triple was extracted from so you could even line it up for them! [1]). 

Yes, that's a good idea. However, the current version of the tool only 
displays what a user would see while browsing DBpedia.
> 
> I might also suggest that you split each task down to a single triple at a 
> time so that each task is smaller/easier - I'm not sure(though I may be 
> wrong) that there is any benefit from showing the whole page of extracted 
> triples in one go.

Each triple is in fact displayed separately, so a user can select and annotate 
individually which triples have a data quality problem. Also, it is important to 
know which resource a triple belongs to in order to evaluate its quality. 

> 
> Interested to see how this works out!
> 
> Best,
> Matthew
> 
> [1] http://wiki.dbpedia.org/Datasets#h18-18
> ---
> Matthew Gamble
> PhD Candidate
> School of Computer Science
> University of Manchester
> [email protected]
> 
Thanks.
Regards,
Ms. Amrapali Zaveri Gokhale

University of Leipzig - Department of Computer Science
Paulinum 618, Augustusplatz 10, 04109 Leipzig, Germany
http://aksw.org/AmrapaliZaveri

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
