Thanks for the replies.

Unfortunately it's not the subject==object bug. I tried to look at the
extraction code to see if I could find a fix, but it's written in a language
I don't read. I have to say I think it's a shame that you chose Scala over
Java, as there must be many more people who would contribute if it were
written in a more commonly used language such as Java. Anyway, I don't mean to
rant: you are doing an amazing job with DBpedia and I sincerely thank you for
the hard work.

In terms of contributing, I'm working on some code to clean up the skos
category hierarchy: removing the cycles (not too hard) and the multiple paths
(quite a pain).
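
To make the cycle part concrete, here is a minimal sketch of finding one cycle
in the skos:broader graph. It assumes the triples have already been reduced to
(narrower, broader) pairs; the function name find_cycle and that input format
are illustrative only and are not taken from the DBpedia code or from the
cleaning code mentioned above. It does not attempt the multiple-paths problem.

```python
from collections import defaultdict

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def find_cycle(edges):
    """Return one cycle as a list of nodes (first == last), or None.

    edges: iterable of (narrower, broader) pairs (hypothetical input format).
    """
    graph = defaultdict(list)
    for child, parent in edges:
        graph[child].append(parent)

    colour = defaultdict(int)  # every node starts WHITE
    for start in list(graph):
        if colour[start] != WHITE:
            continue
        # Iterative DFS, so a deep hierarchy cannot hit the recursion limit.
        colour[start] = GREY
        path = [start]
        stack = [iter(graph[start])]
        while stack:
            for nxt in stack[-1]:
                if colour[nxt] == GREY:
                    # Back edge: the cycle runs from nxt to the current node.
                    return path[path.index(nxt):] + [nxt]
                if colour[nxt] == WHITE:
                    colour[nxt] = GREY
                    path.append(nxt)
                    stack.append(iter(graph.get(nxt, ())))
                    break
            else:
                # All successors explored: this node is finished.
                colour[path.pop()] = BLACK
                stack.pop()
    return None
```

Calling it repeatedly, removing one edge of the returned cycle each time, would
eventually leave an acyclic graph; which edge to remove is the open question.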

I'll identify obvious errors (such as the self-referential subject==object
triples) and may use these to correct Wikipedia directly, and hence
(eventually) DBpedia.

For the less obvious errors I'm trying to develop heuristics to select the most 
appropriate edge to remove.

If all goes well, this might end up as a paper with a proper evaluation, but I
was wondering if you'd also be interested in incorporating the outcome into
DBpedia, perhaps by providing a cleaned_skos_categories_en.nt file?

Neil




From: [email protected]
Date: Thu, 7 Mar 2013 13:20:36 +0200
To: [email protected]
CC: [email protected]
Subject: Re: [Dbpedia-discussion] Duplicates in skos_categories_en.nt

Yup, I got it wrong :)
The titles are so similar that I mistook this for the subject==object bug.

Sorry!
Dimitris




On Thu, Mar 7, 2013 at 1:10 PM, Jona Christopher Sahnwaldt <[email protected]> 
wrote:


Hi Neil, Dimitris,

if I understand Neil correctly, he means that some triples are
duplicated. For example, the triple

<http://dbpedia.org/resource/Category:10th-century_Asian_people>
<http://www.w3.org/2004/02/skos/core#broader>
<http://dbpedia.org/resource/Category:10th_century_in_Asia> .

appears twice in the file skos_categories_en.nt. Neil, is that correct?
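
A quick way to check this kind of duplication is sketched below. It assumes
one triple per line and that repeated triples are serialised identically, so
plain string comparison is enough; the function name duplicate_triples is made
up for illustration and is not part of the extraction framework.

```python
from collections import Counter


def duplicate_triples(lines):
    """Map each triple that occurs more than once to its number of occurrences.

    lines: iterable of N-Triples lines; blank lines and '#' comments are skipped.
    """
    counts = Counter(
        line.strip()
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    )
    return {triple: n for triple, n in counts.items() if n > 1}


# Example use on the dump (the file name is the one discussed in this thread):
#   with open("skos_categories_en.nt", encoding="utf-8") as f:
#       dups = duplicate_triples(f)
#   print(len(dups), "distinct triples occur more than once")
```

This holds every distinct line in memory, which is a deliberate simplification;
sorting the file and comparing adjacent lines would be the low-memory variant.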



On the other hand, if I correctly understand the patch that Dimitris
pointed to, it excludes triples where subject and object URI are
identical, which is a different problem.
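
For contrast with the duplicate case, a subject==object check could look
roughly like the sketch below. This is not the code from the patch; the
pattern TRIPLE and the function is_self_referential are hypothetical, and the
pattern only handles lines whose three terms are all URIs in angle brackets,
which is the shape of the skos:broader lines quoted in this thread.

```python
import re

# Matches an N-Triples line made of three <URI> terms followed by " ."
TRIPLE = re.compile(r"^\s*(<[^>]*>)\s+(<[^>]*>)\s+(<[^>]*>)\s*\.\s*$")


def is_self_referential(line):
    """True if the line is a URI-only triple whose subject equals its object."""
    match = TRIPLE.match(line)
    return bool(match) and match.group(1) == match.group(3)
```

A triple can be duplicated without being self-referential and vice versa, so
the two checks have to be run separately.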



Cheers,
JC





On Thu, Mar 7, 2013 at 9:47 AM, Dimitris Kontokostas
<[email protected]> wrote:
> Hi Neil,
>
> Thanks for the bug report, we already fixed that [1] but effects will be
> seen in the next release
>
> Best,
> Dimitris
>
> [1]
> https://github.com/dbpedia/extraction-framework/commit/2cb7d621b45cf07c1c59638e0c2cc3fc71fa0cbb



>
> On Wed, Mar 6, 2013 at 11:30 PM, Neil Ireson <[email protected]> wrote:
>>
>> Hi all,
>>
>> I'm just doing some processing of the skos_categories_en.nt file and
>> discovered there are a number of duplicate triples, 1937 in total.
>>
>> For example the following are the (lexicographically) first duplicated
>> lines:
>>
>> <http://dbpedia.org/resource/Category:10th-century_Asian_people>
>> <http://www.w3.org/2004/02/skos/core#broader>
>> <http://dbpedia.org/resource/Category:10th_century_in_Asia> .
>> <http://dbpedia.org/resource/Category:11th-century_Asian_people>
>> <http://www.w3.org/2004/02/skos/core#broader>
>> <http://dbpedia.org/resource/Category:11th_century_in_Asia> .
>> <http://dbpedia.org/resource/Category:12th-century_Asian_people>
>> <http://www.w3.org/2004/02/skos/core#broader>
>> <http://dbpedia.org/resource/Category:12th_century_in_Asia> .
>> <http://dbpedia.org/resource/Category:13th-century_Asian_people>
>> <http://www.w3.org/2004/02/skos/core#broader>
>> <http://dbpedia.org/resource/Category:13th_century_in_Asia> .
>> <http://dbpedia.org/resource/Category:13th-century_Byzantine_people>
>> <http://www.w3.org/2004/02/skos/core#broader>
>> <http://dbpedia.org/resource/Category:Byzantine_people_by_century> .
>> <http://dbpedia.org/resource/Category:13th-century_writers>
>> <http://www.w3.org/2004/02/skos/core#broader>
>> <http://dbpedia.org/resource/Category:13th-century_people_by_occupation> .
>> <http://dbpedia.org/resource/Category:14th-century_Asian_people>
>> <http://www.w3.org/2004/02/skos/core#broader>
>> <http://dbpedia.org/resource/Category:14th_century_in_Asia> .
>>
>> Is this a bug, or to be expected?
>>
>> N
>>
>> ------------------------------------------------------------------------------
>> Symantec Endpoint Protection 12 positioned as A LEADER in The Forrester
>> Wave(TM): Endpoint Security, Q1 2013 and "remains a good choice" in the
>> endpoint security space. For insight on selecting the right partner to
>> tackle endpoint security challenges, access the full report.
>> http://p.sf.net/sfu/symantec-dev2dev
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>
>
> --
> Dimitris Kontokostas
> Department of Computer Science, University of Leipzig
> Research Group: http://aksw.org
> Homepage: http://aksw.org/DimitrisKontokostas





