I am a fan of the SPARQL result set format whenever people want to express
tuples of nodes:
http://www.w3.org/TR/sparql11-results-csv-tsv/
I think it’s more standard than Turtle, and it is as efficient as you’ll get
unless you want a binary format.
This file can be processed with simple streaming tools like awk or even passed
into something like Pig. If you want to load the facts into the triple store
you could just toss out the relevance rating or filter only facts where the
relevance rating is 9 or above. If you wanted to produce the kind of RDF you
suggest, you could do that too. You could also md5sum the triples and stuff
the relevance data in a key-value store where it won’t add load to the triple
store.
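To make that concrete, here is a rough Python sketch of the two ideas: keep only the facts at or above a relevance threshold for the triple store, and key the relevance values by the MD5 of each triple. The URI patterns are taken from the examples in the quoted message; the function names, the use of a plain dict as the "key-value store", and the whitespace-separated column layout are assumptions for illustration only.

```python
import hashlib

# Sample rows copied from the quoted message; columns assumed whitespace-separated.
SAMPLE = """\
#ID Class Relevance
140132 Eukaryote 10
140132 Fish 10
140137 OlympicResult 8
"""

# URI patterns borrowed from the examples in the quoted message.
RESOURCE = "http://airpedia.org/resource/"
VOCAB = "http://airpedia.org/vocab/01/#"
ONTOLOGY = "http://dbpedia.org/ontology/"


def parse_rows(lines):
    """Yield (id, class, relevance) tuples, skipping comments and blank lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ident, cls, relevance = line.split()
        yield ident, cls, int(relevance)


def to_triple(ident, cls):
    """Format one fact as a single N-Triples line."""
    return f"<{RESOURCE}{ident}> <{VOCAB}type> <{ONTOLOGY}{cls}> ."


def split_facts(rows, min_relevance=9):
    """Return (triples worth loading, {md5 of triple: relevance} for all rows)."""
    triples, relevance_by_hash = [], {}
    for ident, cls, relevance in rows:
        triple = to_triple(ident, cls)
        key = hashlib.md5(triple.encode("utf-8")).hexdigest()
        relevance_by_hash[key] = relevance  # stand-in for a real key-value store
        if relevance >= min_relevance:
            triples.append(triple)
    return triples, relevance_by_hash


triples, relevance_by_hash = split_facts(parse_rows(SAMPLE.splitlines()))
for triple in triples:
    print(triple)
```

The triple store then only ever sees the plain type triples, and the relevance for any of them can be looked up by hashing the triple text again.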
From: Alessio Palmero Aprosio
Sent: Monday, June 10, 2013 11:15 AM
To: dbpedia-discussion
Subject: [Dbpedia-discussion] Airpedia resource
Dear DBpedia community,
I am a PhD student from Fondazione Bruno Kessler [1] in Trento and I’m working
with my team on Airpedia [2], which is a semantic resource based on machine
learning techniques that aims to extend the coverage of DBpedia on classes
(and, in a second step, on properties).
A draft version of the resource is available on our website. We are currently
working on releasing it to the Semantic Web community, and investigating the
best RDF format to use.
Currently, we use a simple CSV format. For example:
#ID Class Relevance
140132 Eukaryote 10
140132 Animal 10
140132 Fish 10
140132 Species 10
140137 OlympicResult 8
140143 Eukaryote 10
140143 Amphibian 10
140143 Animal 10
140143 Species 10
The ID column refers to a WikiData ID, which can be resolved on the WikiData
website at http://wikidata.org/wiki/Q<ID>; the Class column is the
guessed DBpedia class; the Relevance column is our confidence in the class
(from 7 to 10, as it comes from k-NN voting with k=10). It is really easy for
us to retrieve the DBpedia ID given the WikiData ID.
What is, in your opinion, the best way to represent this data in RDF, keeping
in mind that we want to differentiate our triples from the original DBpedia
ones and we want the relevance to be preserved?
We have in mind the following candidate solutions.
(“air” is our RDF namespace)
Solution 1 (string concatenation)
a. ID air:type Class .
b. ID_Class air:confidence Relevance .
c. sameAses
For example:
140132 Eukaryote 10
140132 Animal 10
140132 Fish 10
140132 Species 10
becomes:
<http://airpedia.org/resource/140132> <http://airpedia.org/vocab/01/#type>
<http://dbpedia.org/ontology/Eukaryote> .
<http://airpedia.org/resource/140132_Eukaryote>
<http://airpedia.org/vocab/01/#confidence> "10"^^xsd:int .
<http://airpedia.org/resource/140132> <http://airpedia.org/vocab/01/#type>
<http://dbpedia.org/ontology/Animal> .
<http://airpedia.org/resource/140132_Animal>
<http://airpedia.org/vocab/01/#confidence> "10"^^xsd:int .
<http://airpedia.org/resource/140132> <http://airpedia.org/vocab/01/#type>
<http://dbpedia.org/ontology/Fish> .
<http://airpedia.org/resource/140132_Fish>
<http://airpedia.org/vocab/01/#confidence> "10"^^xsd:int .
<http://airpedia.org/resource/140132> <http://airpedia.org/vocab/01/#type>
<http://dbpedia.org/ontology/Species> .
<http://airpedia.org/resource/140132_Species>
<http://airpedia.org/vocab/01/#confidence> "10"^^xsd:int .
<http://airpedia.org/resource/140132> owl:sameAs
<http://dbpedia.org/resource/Big_skate> .
<http://airpedia.org/resource/140132> owl:sameAs
<http://ca.dbpedia.org/resource/Raja_binoculata> .
...
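A minimal sketch of how the Solution 1 triples could be generated from one row, assuming the URI patterns shown in the example above. The function name is hypothetical, and the datatype is written as a full IRI (rather than the xsd: prefix) so that each line is self-contained N-Triples; the sameAs links are left out because they need the DBpedia lookup.

```python
# URI patterns taken from the example above; the helper itself is illustrative.
RESOURCE = "http://airpedia.org/resource/"
VOCAB = "http://airpedia.org/vocab/01/#"
ONTOLOGY = "http://dbpedia.org/ontology/"
XSD_INT = "http://www.w3.org/2001/XMLSchema#int"


def solution1_triples(ident, cls, relevance):
    """Return the two N-Triples lines for one (ID, Class, Relevance) row."""
    subject = f"<{RESOURCE}{ident}>"
    pair = f"<{RESOURCE}{ident}_{cls}>"  # the concatenated ID_Class node
    return [
        f"{subject} <{VOCAB}type> <{ONTOLOGY}{cls}> .",
        f'{pair} <{VOCAB}confidence> "{relevance}"^^<{XSD_INT}> .',
    ]


for line in solution1_triples("140132", "Eukaryote", 10):
    print(line)
```

Note that the link between the type triple and its confidence exists only in the naming convention of the ID_Class URI; nothing in the graph itself connects the two statements.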
Solution 2 (blank nodes)
a. ID air:isClassified bNode
b. bNode air:type Class
c. bNode air:confidence Relevance
d. sameAses
For example:
140132 Eukaryote 10
140132 Animal 10
140132 Fish 10
140132 Species 10
becomes:
<http://airpedia.org/resource/140132>
<http://airpedia.org/vocab/01/#isClassified> _:1 .
_:1 <http://airpedia.org/vocab/01/#type>
<http://dbpedia.org/ontology/Eukaryote> .
_:1 <http://airpedia.org/vocab/01/#confidence> "10"^^xsd:int .
<http://airpedia.org/resource/140132>
<http://airpedia.org/vocab/01/#isClassified> _:2 .
_:2 <http://airpedia.org/vocab/01/#type> <http://dbpedia.org/ontology/Fish> .
_:2 <http://airpedia.org/vocab/01/#confidence> "10"^^xsd:int .
<http://airpedia.org/resource/140132>
<http://airpedia.org/vocab/01/#isClassified> _:3 .
_:3 <http://airpedia.org/vocab/01/#type> <http://dbpedia.org/ontology/Animal> .
_:3 <http://airpedia.org/vocab/01/#confidence> "10"^^xsd:int .
<http://airpedia.org/resource/140132>
<http://airpedia.org/vocab/01/#isClassified> _:4 .
_:4 <http://airpedia.org/vocab/01/#type> <http://dbpedia.org/ontology/Species> .
_:4 <http://airpedia.org/vocab/01/#confidence> "10"^^xsd:int .
<http://airpedia.org/resource/140132> owl:sameAs
<http://dbpedia.org/resource/Big_skate> .
<http://airpedia.org/resource/140132> owl:sameAs
<http://ca.dbpedia.org/resource/Raja_binoculata> .
…
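A comparable sketch for Solution 2, again assuming the URI patterns from the example above. The function name and the `_:bN` label scheme are illustrative choices (blank node labels are arbitrary within a file); the sameAs links are omitted as before.

```python
from itertools import count

# URI patterns taken from the example above; the helper itself is illustrative.
RESOURCE = "http://airpedia.org/resource/"
VOCAB = "http://airpedia.org/vocab/01/#"
ONTOLOGY = "http://dbpedia.org/ontology/"
XSD_INT = "http://www.w3.org/2001/XMLSchema#int"


def solution2_triples(rows):
    """Return N-Triples lines, using one fresh blank node per classification."""
    labels = count(1)
    lines = []
    for ident, cls, relevance in rows:
        bnode = f"_:b{next(labels)}"  # fresh label for every row
        lines.append(f"<{RESOURCE}{ident}> <{VOCAB}isClassified> {bnode} .")
        lines.append(f"{bnode} <{VOCAB}type> <{ONTOLOGY}{cls}> .")
        lines.append(f'{bnode} <{VOCAB}confidence> "{relevance}"^^<{XSD_INT}> .')
    return lines


rows = [("140132", "Eukaryote", 10), ("140132", "Fish", 10)]
for line in solution2_triples(rows):
    print(line)
```

Here the class and its confidence hang off the same node, so a query can retrieve both together, at the cost of three triples per classification instead of two.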
While waiting for your suggestions, we will finish the classification and make
the CSV available on our website [2].
Thank you!
Best,
Alessio
[1] http://www.fbk.eu
[2] http://www.airpedia.org
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion