Hello Peter,

I went through the W3C document. I think coding a parser from scratch is too
difficult for me. In the project proposal I submitted, the Java HDT library [1]
is to be reused for parsing and writing HDT files. The Jena integration [2] is
built on top of the Java HDT library as well. I reviewed the source code of
the Java HDT library, and it does not strictly conform to the W3C document. If
we follow the specification precisely, the new sesame-rio-rdfhdt module may
not be able to deal with the HDT files generated by the Java HDT library.

I hope it's OK to stick to the original idea in the proposal. Otherwise we may
have trouble completing the project within the 3-month period.

[1] http://www.rdfhdt.org/manual-of-the-java-hdt-library/
[2] http://www.rdfhdt.org/manual-of-hdt-integration-with-jena/

yours,
junyue


On Mon, Jun 8, 2015 at 8:48 AM, Peter Ansell <[email protected]> wrote:

> Hi Junyue,
>
> You are not going to be using or linking to the existing RDF/HDT
> implementations so their use of TripleString internally should not be
> an issue for you and you do not need to look at the RDF/HDT Java
> source code for this project.
>
> The sole reference for your implementation is the following document
> that the RDF/HDT team submitted to the W3C:
>
> http://www.w3.org/Submission/2011/SUBM-HDT-20110330/
>
> Specifically, you need to implement a binary parser from scratch based
> on the specification given in section 3:
>
> http://www.w3.org/Submission/2011/SUBM-HDT-20110330/#syntax
>
> Cheers,
>
> Peter
>
> On 8 June 2015 at 01:43, Junyue Wang <[email protected]> wrote:
> > Hello Peter,
> >
> > I'm done creating the new module and the new format. Now I'm
> > implementing the RDFHDTParser.
> > One question: If I search RDF HDT, it provides a TripleString for each
> > triple. A TripleString contains 3 Strings, for the subject, predicate and
> > object respectively. I need to transform the Strings into Sesame Values,
> > which may be URI, Resource, Literal or BlankNode. But I don't know
> > beforehand which concrete types of Value they are. Is there a neat way to
> > do this?
> >
> > I checked out ValueFactory in Sesame. It only does the transformation for
> > the given concrete type.
> >
> > yours,
> > junyue
> >
> > On Sun, May 17, 2015 at 9:09 AM, Peter Ansell <[email protected]>
> > wrote:
> >
> >> Hi Junyue,
> >>
> >> It will be simplest to track if you fork the Marmotta repository at
> >> GitHub and create a branch named "MARMOTTA-593".
> >>
> >> Add me as a collaborator to the GitHub repository. My GitHub id is
> >> "ansell".
> >>
> >> The collaborators list for my fork is at:
> >>
> >> https://github.com/ansell/marmotta/settings/collaboration
> >>
> >> When you fork it, you can replace "ansell" with your GitHub id and use
> >> that page to add me to the list of collaborators.
> >>
> >> Yes, the code will be merged to Marmotta in the end.
> >>
> >> You should create a new module inside of marmotta-sesame-tools named
> >> "marmotta-rio-rdfht"
> >>
> >>
> >> https://github.com/apache/marmotta/tree/master/commons/marmotta-sesame-tools
> >>
> >> You will also need to add a format constant into marmotta-rio-api as a
> >> new folder in the following directory, similar to the current 3
> >> folders there:
> >>
> >>
> >> https://github.com/apache/marmotta/tree/master/commons/marmotta-sesame-tools/marmotta-rio-api/src/main/java/org/apache/marmotta/commons/sesame/rio
> >>
> >> Cheers,
> >>
> >> Peter
> >>
> >>
> >> On 16 May 2015 at 22:19, Junyue Wang <[email protected]> wrote:
> >> > Hello Sergio, Peter,
> >> >
> >> > It's my honor to be a GSoC student. I appreciate your comments on the
> >> > project proposal.
> >> > I read the proposed methodology you pointed out. But it seems my project
> >> > is only related to Sesame and RDF HDT, without touching the code base of
> >> > Marmotta. Should I fork Marmotta on GitHub, or start a new repository
> >> > there? Will my code be merged into Marmotta in the end? If so, into which
> >> > module of Marmotta?
> >> >
> >> > yours,
> >> > junyue
> >> >
> >> > On Thu, Apr 30, 2015 at 2:41 PM, Sergio Fernández <[email protected]>
> >> > wrote:
> >> >
> >> >> Hi Peter,
> >> >>
> >> >> On Wed, Apr 29, 2015 at 1:12 AM, Peter Ansell <[email protected]>
> >> >> wrote:
> >> >>>
> >> >>> Those guidelines look great to me, especially the suggestion about the
> >> >>> branch name including the Jira issue, which I have found very useful
> >> >>> in all of my git-based projects. In the RDF/HDT case, and possibly in
> >> >>> the GeoSPARQL case, the contributed code could be in the form of a new
> >> >>> module, so there won't be much interference with the rest of the
> >> >>> codebase during that time. However, it is still useful to regularly
> >> >>> merge the "develop" branch into each of the branches to keep up to
> >> >>> date and reduce the number of merge conflicts occurring near the end
> >> >>> when the students will be rushing to complete the project.
> >> >>
> >> >>
> >> >> Great you like it, Peter :-)
> >> >>
> >> >> I expect fewer merge conflicts, since it's a more concrete library;
> >> >> with the GeoSPARQL project that workflow is much more important.
> >> >>
> >> >> I have just one concern, about the documentation. Last year I had
> >> >> formatting issues bringing that documentation into the wiki (MoinMoin
> >> >> syntax is not markdown, unfortunately). Do you think it is better to
> >> >> do it directly in the wiki?
> >> >>
> >> >> I'd love to hear comments from our students; after all, you're the
> >> >> ones who need to follow that proposed methodology.
> >> >>
> >> >> Cheers,
> >> >>
> >> >> --
> >> >> Sergio Fernández
> >> >> Partner Technology Manager
> >> >> Redlink GmbH
> >> >> m: +43 6602747925
> >> >> e: [email protected]
> >> >> w: http://redlink.co
> >> >>
> >>
>
