Hello Peter,

I went through the W3C document, and I think writing a parser from scratch would be too difficult for me. In the project proposal I submitted, the Java HDT library [1] is reused for parsing and writing HDT files. The Jena integration [2] is built on top of the Java HDT library as well.

I reviewed the source code of the Java HDT library, and it does not strictly conform to the W3C document. If we follow the specification precisely, the new sesame-rio-rdfhdt module may not be able to deal with the HDT files generated by the Java HDT library.

I hope it's OK to stick to the original idea in the proposal. Otherwise we may have trouble completing the project within the 3-month period.

[1] http://www.rdfhdt.org/manual-of-the-java-hdt-library/
[2] http://www.rdfhdt.org/manual-of-hdt-integration-with-jena/

yours,
junyue

On Mon, Jun 8, 2015 at 8:48 AM, Peter Ansell <[email protected]> wrote:

> Hi Junyue,
>
> You are not going to be using or linking to the existing RDF/HDT
> implementations so their use of TripleString internally should not be
> an issue for you and you do not need to look at the RDF/HDT Java
> source code for this project.
>
> The sole reference for your implementation is the following document
> that the RDF/HDT team submitted to the W3C:
>
> http://www.w3.org/Submission/2011/SUBM-HDT-20110330/
>
> Specifically, you need to implement a binary parser from scratch based
> on the specification given in section 3:
>
> http://www.w3.org/Submission/2011/SUBM-HDT-20110330/#syntax
>
> Cheers,
>
> Peter
>
> On 8 June 2015 at 01:43, Junyue Wang <[email protected]> wrote:
> > Hello Peter,
> >
> > I'm done creating the new module and the new format. Now I'm
> > implementing the RDFHDTParser.
> > One question: If I search RDF HDT, it provides TripleString for each
> > triple. TripleString contains 3 Strings for subject, predicate and object
> > respectively. I need to transform the Strings into Sesame Values, which may
> > be URI, Resource, Literal or BlankNode. But I don't know beforehand which
> > concrete types of Value they are. Is there a neat way to do this?
> >
> > I checked out ValueFactory in Sesame. It only does the transformation for
> > the given concrete type.
> >
> > yours,
> > junyue
> >
> > On Sun, May 17, 2015 at 9:09 AM, Peter Ansell <[email protected]>
> > wrote:
> >
> >> Hi Junjue,
> >>
> >> It will be simplest to track if you fork the Marmotta repository at
> >> GitHub and create a branch named "MARMOTTA-593".
> >>
> >> Add me as a collaborator to the GitHub repository.
> >> My GitHub id is "ansell".
> >>
> >> The collaborators list for my fork is at:
> >>
> >> https://github.com/ansell/marmotta/settings/collaboration
> >>
> >> When you fork it, you can replace "ansell" with your GitHub id and use
> >> that page to add me to the list of collaborators.
> >>
> >> Yes, the code will be merged to Marmotta in the end.
> >>
> >> You should create a new module inside of marmotta-sesame-tools named
> >> "marmotta-rio-rdfht"
> >>
> >> https://github.com/apache/marmotta/tree/master/commons/marmotta-sesame-tools
> >>
> >> You will also need to add a format constant into marmotta-rio-api as a
> >> new folder in the following directory, similar to the current 3
> >> folders there:
> >>
> >> https://github.com/apache/marmotta/tree/master/commons/marmotta-sesame-tools/marmotta-rio-api/src/main/java/org/apache/marmotta/commons/sesame/rio
> >>
> >> Cheers,
> >>
> >> Peter
> >>
> >> On 16 May 2015 at 22:19, Junyue Wang <[email protected]> wrote:
> >> > Hello Sergio, Peter,
> >> >
> >> > It's my honor to be a GSoC student. I appreciate your help for the comments
> >> > of the project proposal.
> >> > I read the proposed methodology you pointed out. But it seems my project is
> >> > only related to Sesame and RDF HDT, without touching the code base of
> >> > Marmotta. Should I fork Marmotta in github, or start a new repository there?
> >> > Will my code be merged into Marmotta in the end? If so, which module of
> >> > Marmotta?
> >> >
> >> > yours,
> >> > junyue
> >> >
> >> > On Thu, Apr 30, 2015 at 2:41 PM, Sergio Fernández <[email protected]> wrote:
> >> >
> >> >> Hi Peter,
> >> >>
> >> >> On Wed, Apr 29, 2015 at 1:12 AM, Peter Ansell <[email protected]> wrote:
> >> >>>
> >> >>> Those guidelines look great to me, especially the suggestion about the
> >> >>> branch name including the Jira issue, which I have found very useful
> >> >>> in all of my git-based projects. In the RDF/HDT case, and possibly in
> >> >>> the GeoSPARQL case, the contributed code could be in the form of a new
> >> >>> module, so there won't be much interference with the rest of the
> >> >>> codebase during that time. However, it is still useful to regularly
> >> >>> merge the "develop" branch into each of the branches to keep up to
> >> >>> date and reduce the number of merge conflicts occurring near the end
> >> >>> when the students will be rushing to complete the project.
> >> >>
> >> >> Great you like it, Peter :-)
> >> >>
> >> >> I expect fewer merge conflicts here, as it's a more concrete library;
> >> >> with the GeoSPARQL project that workflow is much more important.
> >> >>
> >> >> I just have one concern about the documentation. Last year I had
> >> >> formatting issues bringing that documentation into the wiki (MoinMoin
> >> >> syntax is not markdown, unfortunately). Do you think it is better to do it
> >> >> directly in the wiki?
> >> >>
> >> >> I'd love to hear comments from our students, after all you're the ones who
> >> >> need to follow that proposed methodology.
> >> >>
> >> >> Cheers,
> >> >>
> >> >> --
> >> >> Sergio Fernández
> >> >> Partner Technology Manager
> >> >> Redlink GmbH
> >> >> m: +43 6602747925
> >> >> e: [email protected]
> >> >> w: http://redlink.co
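
On the TripleString question quoted above (turning the three Strings into Sesame Values without knowing the concrete type beforehand), one possible approach is to look at the first characters of each string. The sketch below assumes the strings follow N-Triples-like conventions: blank nodes start with "_:", literals start with a double quote and may end with @lang or ^^<datatype>, and anything else is treated as a URI. That convention is an assumption and is not confirmed anywhere in this thread, and escape sequences inside literals are not handled. TermSketch, Kind, Term and parse are hypothetical names used only for illustration; they are not part of Sesame or of the Java HDT library.

```java
/** Hypothetical helper: classifies an N-Triples-style term string. */
final class TermSketch {

    /** The three kinds of RDF term a string can denote. */
    public enum Kind { IRI, BLANK_NODE, LITERAL }

    /** Parsed term; language and datatype are null when absent. */
    public static final class Term {
        public final Kind kind;
        public final String label;
        public final String language;
        public final String datatype;

        Term(Kind kind, String label, String language, String datatype) {
            this.kind = kind;
            this.label = label;
            this.language = language;
            this.datatype = datatype;
        }
    }

    /** Decides the kind of term from its leading characters (assumed convention). */
    public static Term parse(String s) {
        if (s == null || s.isEmpty()) {
            throw new IllegalArgumentException("empty term");
        }
        if (s.startsWith("_:")) {
            // Blank node: keep only the identifier after "_:".
            return new Term(Kind.BLANK_NODE, s.substring(2), null, null);
        }
        if (s.charAt(0) == '"') {
            // The last quote closes the lexical form; language tags and
            // datatype IRIs cannot contain a double quote themselves.
            int close = s.lastIndexOf('"');
            if (close == 0) {
                throw new IllegalArgumentException("unterminated literal: " + s);
            }
            String label = s.substring(1, close);
            String rest = s.substring(close + 1);
            if (rest.startsWith("@")) {
                return new Term(Kind.LITERAL, label, rest.substring(1), null);
            }
            if (rest.startsWith("^^")) {
                String dt = rest.substring(2);
                if (dt.startsWith("<") && dt.endsWith(">")) {
                    dt = dt.substring(1, dt.length() - 1); // strip angle brackets
                }
                return new Term(Kind.LITERAL, label, null, dt);
            }
            return new Term(Kind.LITERAL, label, null, null);
        }
        // Everything else is treated as a URI in this sketch.
        return new Term(Kind.IRI, s, null, null);
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://example.org/s", "_:b1", "\"chat\"@fr",
            "\"42\"^^<http://www.w3.org/2001/XMLSchema#integer>"
        };
        for (String sample : samples) {
            Term t = parse(sample);
            System.out.println(t.kind + " " + t.label);
        }
    }
}
```

Each Kind could then be handed to the matching Sesame ValueFactory method: createURI(String) for IRI, createBNode(String) for BLANK_NODE, and one of the createLiteral overloads for LITERAL (label only, label plus language, or label plus a datatype URI built with createURI). Keeping the classification separate from ValueFactory means it can be tested without a Sesame dependency.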
