Thanks for the replies,
Unfortunately it's not the subject==object bug. I tried to have a look at the
extraction code to see if I could find a fix but unfortunately it's written in
a language I don't read. I have to say I think it's a shame that you chose
Scala over Java, as there must be many more people who would contribute if it
were written in a more commonly used language such as Java. Anyway, I
don't mean to rant as you are doing an amazing job with DBpedia and I sincerely
thank you for the hard work.
In terms of contributing, I'm working on some code to clean up the SKOS
category hierarchy: removing the cycles (not too hard) and the multiple paths
(quite a pain).
I'll identify obvious errors (such as the self-referential subject==object) and
possibly will use these to correct Wikipedia directly, and hence DBpedia
(eventually).
For the less obvious errors I'm trying to develop heuristics to select the most
appropriate edge to remove.
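
As a rough illustration of the cycle part only (not the edge-selection
heuristics, which are still to be worked out), here is a minimal Python sketch.
It assumes the skos:broader triples have already been loaded as (child, parent)
pairs; the function name find_back_edges is hypothetical. It reports the edges
that close a cycle during a depth-first search, which is just one way to find
candidates for removal:

```python
from collections import defaultdict

def find_back_edges(edges):
    """Return the (child, parent) edges that close a cycle during a DFS.

    `edges` is an iterable of (child, parent) pairs, i.e. child skos:broader parent.
    Removing every returned edge leaves an acyclic graph, but which edges are
    returned depends on traversal order, not on which edge is the "wrong" one.
    """
    graph = defaultdict(list)
    for child, parent in edges:
        graph[child].append(parent)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current DFS path, finished
    colour = defaultdict(int)      # every node starts WHITE (0)
    back_edges = []

    for start in list(graph):
        if colour[start] != WHITE:
            continue
        colour[start] = GREY
        # Explicit stack so deep hierarchies do not hit the recursion limit.
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if colour[nxt] == GREY:
                    # nxt is on the current path, so this edge closes a cycle.
                    back_edges.append((node, nxt))
                elif colour[nxt] == WHITE:
                    colour[nxt] = GREY
                    stack.append((nxt, iter(graph.get(nxt, []))))
                    break
            else:
                # All neighbours handled: the node is finished.
                colour[node] = BLACK
                stack.pop()
    return back_edges
```

A self-referential triple (subject==object) shows up here as a back edge from a
node to itself, so the obvious errors and the longer cycles come out of the same
pass. Picking the most appropriate edge within each cycle would still need the
heuristics mentioned above.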
If all goes well this might end up as a paper with a proper evaluation but I
was wondering if you'd also be interested in potentially incorporating the
outcome into DBpedia, maybe providing a cleaned_skos_categories_en.nt?
Neil
From: [email protected]
Date: Thu, 7 Mar 2013 13:20:36 +0200
To: [email protected]
CC: [email protected]
Subject: Re: [Dbpedia-discussion] Duplicates in skos_categories_en.nt
Yup, I got it wrong :)
The titles are so alike that I mistook this for the subject==object bug.
Sorry!
Dimitris
On Thu, Mar 7, 2013 at 1:10 PM, Jona Christopher Sahnwaldt <[email protected]>
wrote:
Hi Neil, Dimitris,
if I understand Neil correctly, he means that some triples are
duplicated. For example, the triple
<http://dbpedia.org/resource/Category:10th-century_Asian_people>
<http://www.w3.org/2004/02/skos/core#broader>
<http://dbpedia.org/resource/Category:10th_century_in_Asia> .
appears twice in the file skos_categories_en.nt . Neil, is that correct?
On the other hand, if I correctly understand the patch that Dimitris
pointed to, it excludes triples where subject and object URI are
identical, which is a different problem.
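
To make the distinction concrete, here is a minimal Python sketch that counts
the two problems separately. It assumes one triple per line, as in N-Triples,
and the names split_uri_triple and audit_triples are made up for illustration.
Duplicates are found by comparing whole lines; self-references are only checked
for triples whose three terms are URIs without spaces:

```python
from collections import Counter

def split_uri_triple(line):
    """Split a line like '<s> <p> <o> .' into (s, p, o).

    Returns None for anything that does not split into exactly three terms,
    e.g. triples with literal objects that contain spaces.
    """
    parts = line.strip().rstrip(".").split()
    return tuple(parts) if len(parts) == 3 else None

def audit_triples(lines):
    """Return (duplicates, self_refs) for an iterable of N-Triples lines.

    duplicates: dict mapping each repeated line to its number of occurrences.
    self_refs:  list of distinct lines whose subject and object are identical.
    """
    counts = Counter()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        counts[line] += 1

    duplicates = {line: n for line, n in counts.items() if n > 1}

    self_refs = []
    for line in counts:
        triple = split_uri_triple(line)
        if triple is not None and triple[0] == triple[2]:
            self_refs.append(line)
    return duplicates, self_refs
```

For the real file this could be called as audit_triples(open("skos_categories_en.nt")),
assuming the file fits in memory. Comparing whole lines only catches duplicates
that are byte-for-byte identical after trimming whitespace.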
Cheers,
JC
On Thu, Mar 7, 2013 at 9:47 AM, Dimitris Kontokostas
<[email protected]> wrote:
> Hi Neil,
>
> Thanks for the bug report. We have already fixed that [1], but the effects
> will only be seen in the next release.
>
> Best,
> Dimitris
>
> [1]
> https://github.com/dbpedia/extraction-framework/commit/2cb7d621b45cf07c1c59638e0c2cc3fc71fa0cbb
>
>
> On Wed, Mar 6, 2013 at 11:30 PM, Neil Ireson <[email protected]> wrote:
>>
>> Hi all,
>>
>> I'm just doing some processing of the skos_categories_en.nt file and
>> discovered there are a number of duplicate triples, 1937 in total.
>>
>> For example the following are the (lexicographically) first duplicated
>> lines:
>>
>> <http://dbpedia.org/resource/Category:10th-century_Asian_people>
>> <http://www.w3.org/2004/02/skos/core#broader>
>> <http://dbpedia.org/resource/Category:10th_century_in_Asia> .
>> <http://dbpedia.org/resource/Category:11th-century_Asian_people>
>> <http://www.w3.org/2004/02/skos/core#broader>
>> <http://dbpedia.org/resource/Category:11th_century_in_Asia> .
>> <http://dbpedia.org/resource/Category:12th-century_Asian_people>
>> <http://www.w3.org/2004/02/skos/core#broader>
>> <http://dbpedia.org/resource/Category:12th_century_in_Asia> .
>> <http://dbpedia.org/resource/Category:13th-century_Asian_people>
>> <http://www.w3.org/2004/02/skos/core#broader>
>> <http://dbpedia.org/resource/Category:13th_century_in_Asia> .
>> <http://dbpedia.org/resource/Category:13th-century_Byzantine_people>
>> <http://www.w3.org/2004/02/skos/core#broader>
>> <http://dbpedia.org/resource/Category:Byzantine_people_by_century> .
>> <http://dbpedia.org/resource/Category:13th-century_writers>
>> <http://www.w3.org/2004/02/skos/core#broader>
>> <http://dbpedia.org/resource/Category:13th-century_people_by_occupation> .
>> <http://dbpedia.org/resource/Category:14th-century_Asian_people>
>> <http://www.w3.org/2004/02/skos/core#broader>
>> <http://dbpedia.org/resource/Category:14th_century_in_Asia> .
>>
>> Is this a bug, or to be expected?
>>
>> N
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Symantec Endpoint Protection 12 positioned as A LEADER in The Forrester
>> Wave(TM): Endpoint Security, Q1 2013 and "remains a good choice" in the
>> endpoint security space. For insight on selecting the right partner to
>> tackle endpoint security challenges, access the full report.
>> http://p.sf.net/sfu/symantec-dev2dev
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>
>
>
>
> --
> Dimitris Kontokostas
> Department of Computer Science, University of Leipzig
> Research Group: http://aksw.org
> Homepage:http://aksw.org/DimitrisKontokostas
>