Hello all,
As a master's student majoring in the Semantic Web, I'm very interested in
the GSoC 2015 project MARMOTTA-593 [1]. I have studied the code of Sesame
Rio and RDF HDT, and I know how to implement the Sesame Rio infrastructure
from scratch. As for RDF HDT, here are some basic ideas for the
implementation in this project; your comments are very welcome:
1) RDFParser for HDT
As shown in [2], the HDT RDFParser can iterate over all the triples in the
HDT file and transform each TripleString into a Statement, something like:
// Empty strings act as wildcards, so this matches every triple.
IteratorTripleString it = hdt.search("", "", "");
while (it.hasNext()) {
    TripleString ts = it.next();
    ... // transform ts into a Statement
    ... // pass the Statement to the RDFHandler
}
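To make the "transform ts into a Statement" step more concrete, here is a minimal sketch of how a single term string might be classified before it is turned into a Sesame value. It assumes the terms arrive in an N-Triples-like form (blank nodes prefixed with "_:", literals wrapped in double quotes with an optional @lang or ^^<datatype> suffix, everything else an IRI). HdtTermSketch, Term and Kind are hypothetical stand-ins, not Sesame or hdt-java classes, and escape sequences inside literals are left untouched.

```java
// Hypothetical stand-in types: nothing here comes from Sesame or hdt-java.
final class HdtTermSketch {

    enum Kind { IRI, BLANK_NODE, LITERAL }

    /** A classified term: kind, lexical value, optional language tag or datatype IRI. */
    static final class Term {
        final Kind kind;
        final String value;
        final String language; // null unless the literal carries a language tag
        final String datatype; // null unless the literal carries a datatype IRI

        Term(Kind kind, String value, String language, String datatype) {
            this.kind = kind;
            this.value = value;
            this.language = language;
            this.datatype = datatype;
        }
    }

    /** Classifies one term string, assuming an N-Triples-like encoding. */
    static Term parse(CharSequence raw) {
        String s = raw.toString();
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty term");
        }
        if (s.startsWith("_:")) {
            // Blank node: keep only the identifier after the "_:" prefix.
            return new Term(Kind.BLANK_NODE, s.substring(2), null, null);
        }
        if (s.charAt(0) == '"') {
            // Literal: the last double quote closes the label, because neither
            // a language tag nor a datatype IRI can contain a double quote.
            int close = s.lastIndexOf('"');
            if (close == 0) {
                throw new IllegalArgumentException("unterminated literal: " + s);
            }
            String label = s.substring(1, close);
            String suffix = s.substring(close + 1);
            if (suffix.isEmpty()) {
                return new Term(Kind.LITERAL, label, null, null);
            }
            if (suffix.startsWith("@") && suffix.length() > 1) {
                return new Term(Kind.LITERAL, label, suffix.substring(1), null);
            }
            if (suffix.startsWith("^^<") && suffix.endsWith(">")) {
                String dt = suffix.substring(3, suffix.length() - 1);
                return new Term(Kind.LITERAL, label, null, dt);
            }
            throw new IllegalArgumentException("malformed literal suffix: " + s);
        }
        // Anything else is treated as an IRI.
        return new Term(Kind.IRI, s, null, null);
    }
}
```

In the real parser, each Term would presumably be mapped to a URI, BNode or Literal through a Sesame ValueFactory before the Statement is built; that mapping is left out here.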
In addition, the HDT RDFParser should be registered with Rio beforehand
under a new RDFFormat, so that the following works:
Rio.createParser(RDFFormat.HDT); // for .hdt files
2) RDFWriter for HDT
As illustrated in [3] (hdt.HDT#saveToHDT), there are four steps to writing
an HDT file: GLOBAL, HEADER, DICTIONARY, and finally TRIPLES. So the first
three steps go in HDT RDFWriter.startRDF(), and the last one in
HDT RDFWriter.handleStatement() (borrowing code from
TriplesPrivate.save()). Nothing needs to be done in endRDF().
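The call order proposed above can be sketched roughly as follows. This is only an illustration of the lifecycle: HandlerSketch and HdtWriterSketch are hypothetical stand-ins (not Sesame's RDFHandler or any hdt-java class), and the writer just records section names in a list instead of producing real HDT output. It also assumes the dictionary can be emitted before any statement has been seen, which I have not verified.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the three RDFHandler callbacks that matter here.
interface HandlerSketch {
    void startRDF();
    void handleStatement(String subject, String predicate, String object);
    void endRDF();
}

// Records the order in which sections would be written; produces no real HDT.
final class HdtWriterSketch implements HandlerSketch {
    private final List<String> log = new ArrayList<>();
    private boolean started = false;

    @Override
    public void startRDF() {
        if (started) {
            throw new IllegalStateException("startRDF() called twice");
        }
        started = true;
        // Steps 1-3 of the proposal happen once, up front.
        log.add("GLOBAL");
        log.add("HEADER");
        log.add("DICTIONARY");
    }

    @Override
    public void handleStatement(String subject, String predicate, String object) {
        if (!started) {
            throw new IllegalStateException("handleStatement() before startRDF()");
        }
        // Step 4 of the proposal: each statement is appended to the triples section.
        log.add("TRIPLE " + subject + " " + predicate + " " + object);
    }

    @Override
    public void endRDF() {
        // Intentionally empty, as in the proposal.
    }

    /** Returns the recorded write order, for inspection. */
    List<String> log() {
        return log;
    }
}
```

Feeding it one statement between startRDF() and endRDF() should leave the log as GLOBAL, HEADER, DICTIONARY, followed by a single TRIPLE entry.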
3) RDFHandler for HDT (not required)
No other RDFHandler is required for HDT. Note that an RDFWriter is itself
an RDFHandler, which is covered by 2). Any other RDFHandler is out of the
scope of this GSoC project. Right?
4) Query support for HDT (not required)
Sesame Rio does not include a query component (e.g. SPARQL). Therefore,
this GSoC project will not address the Sesame query side for HDT. Am I
correct?
Last question: this project seems to relate only to Sesame and RDF HDT, so
how does it benefit Marmotta?
yours,
junyue
[1] https://issues.apache.org/jira/browse/MARMOTTA-593?filter=12330297
[2] http://www.rdfhdt.org/manual-of-the-java-hdt-library/
[3] http://code.google.com/p/hdt-java/source/browse/hdt-java-core/src/main/java/org/rdfhdt/hdt/hdt/impl/HDTImpl.java