Hello,

It seems the licence of the Java implementation is LGPL:

- *The libraries are open source (LGPL)*. You can adapt the libraries to
  your needs, and the community can spot and fix issues [1].

Is that good news? Can we just use/link to the Java library without
modifying its code?

yours,
junyue

[1] http://www.rdfhdt.org/what-is-hdt/

On Tue, Jun 9, 2015 at 6:57 AM, Peter Ansell <[email protected]> wrote:

> Hi Junyue,
>
> Sorry for any confusion that we may have caused you by not emphasising
> the licensing issue as the main factor in this project, and hence you
> not realising that it required an actual parser to be written (and
> that you can't look at the GPL/LGPL parser for inspiration).
>
> We are still early and I think you should try to follow the W3C
> submission to see how difficult parsing a binary format is to see
> whether you want to continue or not in a week or two after trying to
> write a binary parser from scratch. Don't focus on the writer at this
> point if you think the parser will be enough for you.
>
> Once the RDF-HDT people release a newer version of the specification,
> you can switch to using that, but it would be great to see if you can
> get a basic parser up and running based on the older W3C submission.
> To start off with you could try just parsing the header, and see how
> difficult that turns out to be before deciding about the rest of the
> time.
>
> Sorry in advance btw, this is my first time being a GSOC mentor and I
> may do things wrong.
>
> Cheers,
>
> Peter
>
> On 8 June 2015 at 17:09, Junyue Wang <[email protected]> wrote:
> > Hello Peter,
> >
> > I went through the W3C document. I think coding from scratch is too
> > difficult for me. In the project proposal I submitted, Java HDT library[1]
> > is to be reused for parsing and writing hdt files. The jena integration is
> > built on top of Java HDT library as well. I reviewed the source code of
> > Java HDT library, which does not strictly conform to the W3C document.
> > If we follow the specification precisely, the new sesame-rio-rdfhdt module
> > may not be able to deal with the hdt files generated by Java HDT library.
> >
> > I hope it's OK to stick to the original idea in the proposal. Or we may
> > have problems to complete the project within the 3-month period.
> >
> > [1] http://www.rdfhdt.org/manual-of-the-java-hdt-library/
> > [2] http://www.rdfhdt.org/manual-of-hdt-integration-with-jena/
> >
> > yours,
> > junyue
> >
> > On Mon, Jun 8, 2015 at 8:48 AM, Peter Ansell <[email protected]> wrote:
> >
> >> Hi Junyue,
> >>
> >> You are not going to be using or linking to the existing RDF/HDT
> >> implementations so their use of TripleString internally should not be
> >> an issue for you and you do not need to look at the RDF/HDT Java
> >> source code for this project.
> >>
> >> The sole reference for your implementation is the following document
> >> that the RDF/HDT team submitted to the W3C:
> >>
> >> http://www.w3.org/Submission/2011/SUBM-HDT-20110330/
> >>
> >> Specifically, you need to implement a binary parser from scratch based
> >> on the specification given in section 3:
> >>
> >> http://www.w3.org/Submission/2011/SUBM-HDT-20110330/#syntax
> >>
> >> Cheers,
> >>
> >> Peter
> >>
> >> On 8 June 2015 at 01:43, Junyue Wang <[email protected]> wrote:
> >> > Hello Peter,
> >> >
> >> > I'm done with creating the new module and the new format. Now I'm
> >> > implementing the RDFHDTParser.
> >> > One question: If I search RDF HDT, it provides TripleString for each
> >> > triple. TripleString contains 3 Strings for subject, predicate and object
> >> > respectively. I need to transform the Strings into Sesame Values, which may
> >> > be URI, Resource, Literal or BlankNode. But I don't know beforehand which
> >> > concrete types of Value they are. Is there a neat way to do this?
> >> >
> >> > I checked out ValueFactory in Sesame. It only does the transformation for
> >> > the given concrete type.
> >> >
> >> > yours,
> >> > junyue
> >> >
> >> > On Sun, May 17, 2015 at 9:09 AM, Peter Ansell <[email protected]> wrote:
> >> >
> >> >> Hi Junyue,
> >> >>
> >> >> It will be simplest to track if you fork the Marmotta repository at
> >> >> GitHub and create a branch named "MARMOTTA-593".
> >> >>
> >> >> Add me as a collaborator to the GitHub repository. My GitHub id is
> >> >> "ansell".
> >> >>
> >> >> The collaborators list for my fork is at:
> >> >>
> >> >> https://github.com/ansell/marmotta/settings/collaboration
> >> >>
> >> >> When you fork it, you can replace "ansell" with your GitHub id and use
> >> >> that page to add me to the list of collaborators.
> >> >>
> >> >> Yes, the code will be merged to Marmotta in the end.
> >> >>
> >> >> You should create a new module inside of marmotta-sesame-tools named
> >> >> "marmotta-rio-rdfht"
> >> >>
> >> >> https://github.com/apache/marmotta/tree/master/commons/marmotta-sesame-tools
> >> >>
> >> >> You will also need to add a format constant into marmotta-rio-api as a
> >> >> new folder in the following directory, similar to the current 3
> >> >> folders there:
> >> >>
> >> >> https://github.com/apache/marmotta/tree/master/commons/marmotta-sesame-tools/marmotta-rio-api/src/main/java/org/apache/marmotta/commons/sesame/rio
> >> >>
> >> >> Cheers,
> >> >>
> >> >> Peter
> >> >>
> >> >> On 16 May 2015 at 22:19, Junyue Wang <[email protected]> wrote:
> >> >> > Hello Sergio, Peter,
> >> >> >
> >> >> > It's my honor to be a GSoC student. I appreciate your help for the comments
> >> >> > of the project proposal.
> >> >> > I read the proposed methodology you pointed out. But it seems my project is
> >> >> > only related to Sesame and RDF HDT, without touching the code base of
> >> >> > Marmotta. Should I fork Marmotta in github, or start a new repository there?
> >> >> > Will my code be merged into Marmotta in the end? If so, which module of
> >> >> > Marmotta?
> >> >> >
> >> >> > yours,
> >> >> > junyue
> >> >> >
> >> >> > On Thu, Apr 30, 2015 at 2:41 PM, Sergio Fernández <[email protected]> wrote:
> >> >> >
> >> >> >> Hi Peter,
> >> >> >>
> >> >> >> On Wed, Apr 29, 2015 at 1:12 AM, Peter Ansell <[email protected]> wrote:
> >> >> >>>
> >> >> >>> Those guidelines look great to me, especially the suggestion about the
> >> >> >>> branch name including the Jira issue, which I have found very useful
> >> >> >>> in all of my git-based projects. In the RDF/HDT case, and possibly in
> >> >> >>> the GeoSPARQL case, the contributed code could be in the form of a new
> >> >> >>> module, so there won't be much interference with the rest of the
> >> >> >>> codebase during that time. However, it is still useful to regularly
> >> >> >>> merge the "develop" branch into each of the branches to keep up to
> >> >> >>> date and reduce the number of merge conflicts occurring near the end
> >> >> >>> when the students will be rushing to complete the project.
> >> >> >>
> >> >> >> Great you like it, Peter :-)
> >> >> >>
> >> >> >> I expect fewer merge conflicts, nevertheless it's a more concrete library;
> >> >> >> with the GeoSPARQL project that workflow is much more important.
> >> >> >>
> >> >> >> I just have one concern about the documentation. Last year I had
> >> >> >> formatting issues bringing that documentation into the wiki (MoinMoin
> >> >> >> syntax is not markdown, unfortunately). Do you think it is better to do it
> >> >> >> directly in the wiki?
> >> >> >>
> >> >> >> I'd love to hear comments from our students, after all you're the ones who
> >> >> >> need to follow that proposed methodology.
> >> >> >>
> >> >> >> Cheers,
> >> >> >>
> >> >> >> --
> >> >> >> Sergio Fernández
> >> >> >> Partner Technology Manager
> >> >> >> Redlink GmbH
> >> >> >> m: +43 6602747925
> >> >> >> e: [email protected]
> >> >> >> w: http://redlink.co
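
On the question in the 8 June message about turning the three strings of a
triple into Sesame Values without knowing the concrete type in advance, here
is a minimal sketch. It assumes the strings follow N-Triples-like conventions:
a literal starts with a double quote, a blank node starts with "_:", and
anything else is treated as an IRI. That convention is an assumption, not
something confirmed in this thread, and the names TermKinds, Kind and classify
are hypothetical; they are not part of Sesame or of the Java HDT library.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: guesses the RDF term kind from its string form. */
final class TermKinds {

    /** The three kinds of RDF term a subject, predicate or object can be. */
    enum Kind { IRI, BLANK_NODE, LITERAL }

    private TermKinds() {
        // static helpers only
    }

    /**
     * Classifies a term string under the assumed N-Triples-like conventions:
     * leading double quote means literal, leading "_:" means blank node,
     * everything else is treated as an IRI.
     */
    static Kind classify(String term) {
        if (term == null || term.isEmpty()) {
            throw new IllegalArgumentException("term must not be empty");
        }
        if (term.charAt(0) == '"') {
            return Kind.LITERAL;
        }
        if (term.startsWith("_:")) {
            return Kind.BLANK_NODE;
        }
        return Kind.IRI;
    }

    public static void main(String[] args) {
        // Example inputs are made up for illustration.
        List<String> terms = Arrays.asList(
                "http://example.org/subject",
                "_:b1",
                "\"hello\"@en");
        for (String term : terms) {
            System.out.println(classify(term) + "  " + term);
        }
    }
}
```

Once the kind is known, the matching ValueFactory method (createURI,
createBNode or createLiteral) can be called. A literal would still need its
label split from any language tag or datatype before that call.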
