Hi folks, I was just trying out the new extraction framework (nice work :) and, while looking over the N-Triples/N-Quads dumps, I noticed this:
$ grep "http://dbpedia.org/resource/Neil_Gaiman" homepages_en.nq <http://dbpedia.org/resource/Neil_Gaiman> <http://xmlns.com/foaf/0.1/homepage> "http://www.neilgaiman.com/"^^<http://www.w3.org/2001/XMLSchema#anyURI> <http://en.wikipedia.org/wiki/Neil_Gaiman#absolute-line=16> . versus from the RDF/XML download page in the main site, http://dbpedia.org/page/Neil_Gaiman grep homepage ~/Downloads/Neil_Gaiman.rdf <rdf:Description rdf:about="http://dbpedia.org/resource/Neil_Gaiman"><foaf:homepage xmlns:foaf="http://xmlns.com/foaf/0.1/" rdf:resource="http://www.neilgaiman.com/"/></rdf:Description> In NTriples this is $ rapper -o ntriples ~/Downloads/Neil_Gaiman.rdf | grep homepage rapper: Parsing URI file:///Users/danbri/Downloads/Neil_Gaiman.rdf with parser rdfxml rapper: Serializing with serializer ntriples <http://dbpedia.org/resource/Neil_Gaiman> <http://xmlns.com/foaf/0.1/homepage> <http://www.neilgaiman.com/> . The latter is the correct usage of foaf:homepage; it doesn't relate a person to a xmlschema-datatyped literal, but to a document. This is so that it can have other independent properties and relationships. I found this by chance. On closer inspection, it seems to be a difference between the data generated by latest version of extractors (I downloaded and ran it last night (until my disk filled up:)), and the current dbpedia: when I look at the latest downloadables they are ok too: grep Gaiman homepages_en.nq <http://dbpedia.org/resource/Neil_Gaiman> <http://xmlns.com/foaf/0.1/homepage> <http://www.neilgaiman.com/> <http://en.wikipedia.org/wiki/Neil_Gaiman#absolute-line=16> . I haven't looked at the handling of other FOAF properties, nor of your own vocab, so I have no idea if this change is part of a bigger situation. For foaf:homepage it would be great if you could revert to the document-valued treatement of the property. cheers, Dan ps. I'm trying to grap *all* the links from wikipedia to twitter; is the external_links dump going to cover that, or they only include urls from the explicit 'external links' section of the page? running "grep -v External external_links_en.nq" I find some lines that don't trace directly to an External Links section. When I check them they're sometimes subsections of External Links, but not always. This seems good news for me. pps. (and completely offtopic, but as context for why I'm digging into this stuff...) where this gets really interesting is when we start cross-checking RDF assertions from different sites. Note that Twitter has the notion of a "verified account". So - Twitter assert via http://api.twitter.com/1/users/show.json?id=neilhimself that "verified":true, ... "url":"http://www.neilgaiman.com" ... a claim which is reciprocated in the wikipedia/dbpedia data, although indirectly. Twitter are saying that the Person controlling the twitter online account 'neilhimself' also has a homepage of http://www.neilgaiman.com/". Wikipedia says that there is a Person matching that description with various other characteristics. A lot of the apps I'm interested in don't need to do this kind of cross-check but it is nice to see that it could work, at least for well known people. This is btw the same logic Google implemented in their social graph API using XFN and FOAF - http://code.google.com/apis/socialgraph/ ------------------------------------------------------------------------------ _______________________________________________ Dbpedia-discussion mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
