Hi folks

I was just trying out the new extraction framework (nice work :) and
looking over the ntriple/nquad dumps when I noticed this:


$ grep "http://dbpedia.org/resource/Neil_Gaiman"; homepages_en.nq

<http://dbpedia.org/resource/Neil_Gaiman>
<http://xmlns.com/foaf/0.1/homepage>
"http://www.neilgaiman.com/"^^<http://www.w3.org/2001/XMLSchema#anyURI>
<http://en.wikipedia.org/wiki/Neil_Gaiman#absolute-line=16> .

versus the RDF/XML download from the main site,
http://dbpedia.org/page/Neil_Gaiman

$ grep homepage ~/Downloads/Neil_Gaiman.rdf

<rdf:Description
rdf:about="http://dbpedia.org/resource/Neil_Gaiman"><foaf:homepage
xmlns:foaf="http://xmlns.com/foaf/0.1/"
rdf:resource="http://www.neilgaiman.com/"/></rdf:Description>

In N-Triples this is:
$ rapper -o ntriples ~/Downloads/Neil_Gaiman.rdf | grep homepage

rapper: Parsing URI file:///Users/danbri/Downloads/Neil_Gaiman.rdf
with parser rdfxml
rapper: Serializing with serializer ntriples
<http://dbpedia.org/resource/Neil_Gaiman>
<http://xmlns.com/foaf/0.1/homepage> <http://www.neilgaiman.com/> .


The latter is the correct usage of foaf:homepage; it relates a person
not to an XML Schema-datatyped literal but to a document, so that the
document can itself carry other independent properties and
relationships.
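
Fwiw, here's a rough Python/rdflib sketch of how one might scan a
dump for literal-valued homepages (the local filename and rdflib's
nquads support are assumptions on my side):

from rdflib import ConjunctiveGraph, Literal
from rdflib.namespace import FOAF

# Hypothetical local copy of the dump; any N-Quads file should do.
g = ConjunctiveGraph()
g.parse("homepages_en.nq", format="nquads")

for s, p, o in g.triples((None, FOAF.homepage, None)):
    # foaf:homepage should point at a document (a resource),
    # never at a datatyped literal.
    if isinstance(o, Literal):
        print(s, "has a literal-valued homepage:", repr(o))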

I found this by chance.  On closer inspection, it seems to be a
difference between the data generated by the latest version of the
extractors (I downloaded and ran it last night, until my disk filled
up :) and the current dbpedia: when I look at the latest
downloadables they are OK too:

$ grep Gaiman homepages_en.nq
<http://dbpedia.org/resource/Neil_Gaiman>
<http://xmlns.com/foaf/0.1/homepage> <http://www.neilgaiman.com/>
<http://en.wikipedia.org/wiki/Neil_Gaiman#absolute-line=16> .


I haven't looked at the handling of other FOAF properties, nor of
your own vocab, so I have no idea whether this change is part of a
bigger pattern. For foaf:homepage it would be great if you could
revert to the document-valued treatment of the property.

cheers,

Dan

ps. I'm trying to grab *all* the links from Wikipedia to Twitter; is
the external_links dump going to cover that, or does it only include
URLs from the explicit 'External links' section of the page? Running
"grep -v External external_links_en.nq" I find some lines that don't
trace directly to an External Links section. When I check them they're
sometimes subsections of External Links, but not always. That seems
like good news for me.
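
The sort of scan I have in mind is just this (plain Python over the
dump; the exact filename and the one-quad-per-line layout are
assumptions):

import re

# Pull out any twitter.com URLs from an N-Quads dump, assuming one
# quad per line as in the downloadable files.
twitter_url = re.compile(r'<(https?://(?:www\.)?twitter\.com/[^>]+)>')

with open("external_links_en.nq", encoding="utf-8") as f:
    for line in f:
        for url in twitter_url.findall(line):
            print(url)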

pps.  (and completely offtopic, but as context for why I'm digging
into this stuff...) where this gets really interesting is when we
start cross-checking RDF assertions from different sites. Note that
Twitter has the notion of a "verified account". So Twitter asserts,
via http://api.twitter.com/1/users/show.json?id=neilhimself, that
"verified":true, ... "url":"http://www.neilgaiman.com" ... a claim
which is reciprocated, although indirectly, in the wikipedia/dbpedia
data. Twitter is saying that the Person controlling the Twitter
account 'neilhimself' also has a homepage of
http://www.neilgaiman.com/. Wikipedia says that there is a Person
matching that description with various other characteristics. A lot
of the apps I'm interested in don't need to do this kind of
cross-check, but it is nice to see that it could work, at least for
well-known people. This is, btw, the same logic Google implemented in
their Social Graph API using XFN and FOAF -
http://code.google.com/apis/socialgraph/
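
A toy version of that cross-check, sticking with Python (whether the
v1 endpoint stays open and unauthenticated is an assumption, and the
hard-coded homepage stands in for the value extracted from the dump
above):

import json
from urllib.request import urlopen

# Twitter's side of the claim.
api = "http://api.twitter.com/1/users/show.json?id=neilhimself"
with urlopen(api) as resp:
    profile = json.load(resp)

# Wikipedia/DBpedia's side, taken from the homepages dump above.
dbpedia_homepage = "http://www.neilgaiman.com/"

if (profile.get("verified")
        and profile.get("url", "").rstrip("/")
            == dbpedia_homepage.rstrip("/")):
    print("Twitter and DBpedia agree on this person's homepage")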
