Hi Sergio and Peter: http://www.google-melange.com/gsoc/proposal/public/google/gsoc2015/junyuew/5629499534213120
Please have a look at the above proposal. I composed it from our short discussions. Thank you!

Time is short, so your quick comments would be appreciated.

Yours,
Junyue

On Thu, Mar 26, 2015 at 6:59 AM, Peter Ansell <[email protected]> wrote:

> Just one note: don't copy any code out of SPARQL-BED, as it is AGPL
> licensed, which the Apache license is not compatible with. Not sure what the
> guidelines are for using it as a reference though. If you want to be
> on the safe side with Apache licensing, you could just look at the
> internal Sesame Sail implementations:
>
> https://bitbucket.org/openrdf/sesame/src/db49126a8cf12c420df57d65deb843707c166651/core/sail/?at=master
>
> Cheers,
>
> Peter
>
> On 26 March 2015 at 09:56, Peter Ansell <[email protected]> wrote:
> > Hi Junyue,
> >
> > Thanks for your interest in the project. See my comments inline below.
> >
> > On 20 March 2015 at 03:38, Junyue Wang <[email protected]> wrote:
> >> Hello all,
> >>
> >> As a master's student majoring in the Semantic Web, I'm very interested in
> >> the GSoC 2015 project MARMOTTA-593 [1]. I've studied the code of Sesame
> >> Rio and RDF HDT. I know how to implement the Sesame Rio infrastructure
> >> from scratch. As to RDF HDT, here are some basic ideas for the
> >> implementation in this project, for which your comments are very welcome:
> >>
> >> 1) RDFParser for HDT
> >> As is shown in [2], the HDT RDFParser can search all the triples in the
> >> HDT, and then transform each TripleString into a Statement, something like:
> >> IteratorTripleString it = hdt.search("", "", "");
> >> while (it.hasNext()) {
> >>     TripleString ts = it.next();
> >>     ... // transform ts into a Statement
> >>     ... // sink the Statement to the RDFHandler
> >> }
> >
> > That looks good to me.
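
A minimal sketch of the parser loop quoted above, to make the "transform and sink" steps concrete. It does not use the real hdt-java or Sesame classes: TripleString, Statement and StatementSink here are hypothetical stand-ins written only for this example, and the rule that a literal object starts with a double quote is an assumption, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins only; NOT the real hdt-java or Sesame types.
final class HdtParserSketch {

    /** Stand-in for an HDT TripleString: three plain strings. */
    static final class TripleString {
        final String subject;
        final String predicate;
        final String object;

        TripleString(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Stand-in for a Sesame Statement, reduced to what the sketch needs. */
    static final class Statement {
        final String subject;
        final String predicate;
        final String object;
        final boolean literalObject;

        Statement(String subject, String predicate, String object, boolean literalObject) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.literalObject = literalObject;
        }
    }

    /** Stand-in for RDFHandler, reduced to the single callback used here. */
    interface StatementSink {
        void handleStatement(Statement statement);
    }

    /** Transform one triple; assumes literal objects start with a double quote. */
    static Statement toStatement(TripleString ts) {
        boolean literal = ts.object.startsWith("\"");
        return new Statement(ts.subject, ts.predicate, ts.object, literal);
    }

    /** Walk every triple, convert it, and hand it to the sink; returns the count. */
    static int parse(Iterator<TripleString> triples, StatementSink sink) {
        int count = 0;
        while (triples.hasNext()) {
            sink.handleStatement(toStatement(triples.next()));
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<TripleString> input = Arrays.asList(
                new TripleString("http://example.org/a", "http://example.org/knows", "http://example.org/b"),
                new TripleString("http://example.org/a", "http://example.org/name", "\"Alice\""));
        List<Statement> collected = new ArrayList<>();
        int count = parse(input.iterator(), collected::add);
        System.out.println(count + " statements handled");
    }
}
```

In a real implementation the iterator would come from the HDT search call and the sink would be the RDFHandler given to the parser; only the shape of the loop is meant to carry over.
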
> >
> >> In addition, the HDT RDFParser should be registered with Rio beforehand
> >> for a new RDFFormat, so that the following works:
> >> Rio.createParser(RDFFormat.HDT); // for .hdt files
> >
> > Sesame is set up so that you can add your own formats without having to
> > get a constant added to RDFFormat. Of course, in the long term we will
> > get a constant added for HDT to RDFFormat, but in the short term, you
> > can create your own definition of it locally.
> >
> > Registering the parser is done using META-INF/services/ files that
> > link to RDFParserFactory and RDFWriterFactory classes. See the
> > following examples for RDF/XML:
> >
> > https://bitbucket.org/openrdf/sesame/src/db49126a8cf12c420df57d65deb843707c166651/core/rio/rdfxml/src/main/resources/META-INF/services/?at=master
> >
> > Once you create the META-INF/services files,
> > Rio.createParser(HDTFormat.HDT) should work (as long as you used that
> > constant as the key for the RDFParserFactory, etc.).
> >
> >> 2) RDFWriter for HDT
> >> As is illustrated in [3] (hdt.HDT#saveToHDT), there are 4 steps to write
> >> into HDT: GLOBAL, HEADER, DICTIONARY, and finally TRIPLES. So we have the
> >> first 3 steps in HDT RDFWriter.startRDF(), with the last one in
> >> HDT RDFWriter.handleStatement() (borrowing code from
> >> TriplesPrivate.save()). Nothing should be done in endRDF().
> >>
> >> 3) RDFHandler for HDT (not required)
> >> No other RDFHandler is required for HDT. Note that an RDFWriter is itself
> >> an RDFHandler, which is covered by 2). Other RDFHandlers are out of the
> >> scope of this GSoC project. Right?
> >
> > Yes, you are correct, once you have an RDFWriter and RDFParser the
> > input/output section will be complete.
> >
> >> 4) Query support for HDT (not required)
> >> Sesame Rio does not involve a querying component (e.g. SPARQL). Therefore,
> >> this GSoC project will not address the Sesame query part for HDT. Am I
> >> correct?
> >
> > Query support would be done by implementing the Sail interface, which
> > can then be queried using SPARQL by placing the SailRepository wrapper
> > on top of it.
> >
> > One example of a custom extended Sail that you may use as a reference
> > is an interface for the BED format that Jerven Bolleman created,
> > although if it doesn't exactly fit your case, feel free to ask for
> > other advice:
> >
> > https://github.com/JervenBolleman/sparql-bed/tree/master/sparql-bed/src/main/java/ch/isbsib/sparql/bed
> >
> >> Last question: this project seems related only to Sesame and RDF HDT; how
> >> does it benefit Marmotta?
> >
> > Marmotta benefits from now supporting the HDT format for both input
> > and output. The RDF community generally picks concrete formats based
> > on the best candidate for a particular task, so HDT may be more
> > suitable than N-Quads for bulk data for some tasks, but N-Quads can be
> > processed in a streaming fashion and can compress relatively well
> > using streaming compression if necessary. Comparatively, hand-edited
> > RDF files are generally done in Turtle these days, although there are
> > still quite a few hand-edited RDF/XML files, possibly because there
> > are many examples available for that format.
> >
> > Thanks,
> >
> > Peter
>
