Hi Junyue,

Sorry for any confusion we caused by not emphasising that licensing is
the main factor in this project. Because of it, the project requires
an actual parser to be written, and you can't look at the GPL/LGPL
parser for inspiration.

We are still early in the project. I think you should try writing a
binary parser from scratch, following the W3C submission, for a week
or two. That will show how difficult parsing a binary format is, and
you can then decide whether you want to continue. Don't focus on the
writer at this point if you think the parser will be enough for you.

Once the RDF-HDT people release a newer version of the specification,
you can switch to using that, but it would be great to see if you can
get a basic parser up and running based on the older W3C submission.
To start off with, you could try parsing just the header, and see how
difficult that turns out to be before deciding how to spend the rest
of the time.
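
As a rough illustration of that first step, here is a minimal Java
sketch of checking a magic cookie at the start of a binary stream. The
cookie value "DEMO" and the single version byte after it are
placeholders made up for the example; they are not taken from the W3C
submission, so substitute whatever section 3 actually defines. The
class and method names are hypothetical too.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Sketch only: MAGIC and the one-byte version field are placeholders, not the HDT layout. */
class BinaryHeaderSketch {

    // Hypothetical magic cookie; replace with whatever the specification defines.
    static final byte[] MAGIC = "DEMO".getBytes(StandardCharsets.US_ASCII);

    /** Reads the placeholder header and returns the version byte that follows the cookie. */
    static int readHeader(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] cookie = new byte[MAGIC.length];
        data.readFully(cookie); // throws EOFException if the stream is too short
        if (!Arrays.equals(cookie, MAGIC)) {
            throw new IOException("Unexpected magic cookie: "
                    + new String(cookie, StandardCharsets.US_ASCII));
        }
        return data.readUnsignedByte();
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample: the placeholder cookie followed by version byte 1.
        byte[] sample = {'D', 'E', 'M', 'O', 1};
        System.out.println(readHeader(new ByteArrayInputStream(sample)));
    }
}
```

Reading through DataInputStream keeps the byte-level handling
explicit, which is most of what a from-scratch binary parser has to
get right.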

Sorry in advance, by the way: this is my first time as a GSoC mentor,
and I may get things wrong.

Cheers,

Peter



On 8 June 2015 at 17:09, Junyue Wang <[email protected]> wrote:
> Hello Peter,
>
> I went through the W3C document. I think coding from scratch is too
> difficult for me. In the project proposal I submitted, the Java HDT
> library [1] is to be reused for parsing and writing HDT files. The Jena
> integration [2] is built on top of the Java HDT library as well. I reviewed
> the source code of the Java HDT library, which does not strictly conform to
> the W3C document. If we follow the specification precisely, the new
> sesame-rio-rdfhdt module may not be able to deal with the HDT files
> generated by the Java HDT library.
>
> I hope it's OK to stick to the original idea in the proposal. Otherwise we
> may have trouble completing the project within the 3-month period.
>
> [1] http://www.rdfhdt.org/manual-of-the-java-hdt-library/
> [2] http://www.rdfhdt.org/manual-of-hdt-integration-with-jena/
>
> yours,
> junyue
>
>
> On Mon, Jun 8, 2015 at 8:48 AM, Peter Ansell <[email protected]> wrote:
>
>> Hi Junyue,
>>
>> You are not going to be using or linking to the existing RDF/HDT
>> implementations so their use of TripleString internally should not be
>> an issue for you and you do not need to look at the RDF/HDT Java
>> source code for this project.
>>
>> The sole reference for your implementation is the following document
>> that the RDF/HDT team submitted to the W3C:
>>
>> http://www.w3.org/Submission/2011/SUBM-HDT-20110330/
>>
>> Specifically, you need to implement a binary parser from scratch based
>> on the specification given in section 3:
>>
>> http://www.w3.org/Submission/2011/SUBM-HDT-20110330/#syntax
>>
>> Cheers,
>>
>> Peter
>>
>> On 8 June 2015 at 01:43, Junyue Wang <[email protected]> wrote:
>> > Hello Peter,
>> >
>> > I'm done creating the new module and the new format. Now I'm
>> > implementing the RDFHDTParser.
>> > One question: If I search RDF HDT, it provides TripleString for each
>> > triple. TripleString contains 3 Strings for subject, predicate and object
>> > respectively. I need to transform the Strings into Sesame Values, which
>> > may be URI, Resource, Literal or BlankNode. But I don't know beforehand
>> > which concrete types of Value they are. Is there a neat way to do this?
>> >
>> > I checked out ValueFactory in Sesame. It only does the transformation for
>> > the given concrete type.
>> >
>> > yours,
>> > junyue
>> >
>> > On Sun, May 17, 2015 at 9:09 AM, Peter Ansell <[email protected]>
>> > wrote:
>> >
>> >> Hi Junyue,
>> >>
>> >> It will be simplest to track if you fork the Marmotta repository at
>> >> GitHub and create a branch named "MARMOTTA-593".
>> >>
>> >> Add me as a collaborator to the GitHub repository. My GitHub id is
>> >> "ansell".
>> >>
>> >> The collaborators list for my fork is at:
>> >>
>> >> https://github.com/ansell/marmotta/settings/collaboration
>> >>
>> >> When you fork it, you can replace "ansell" with your GitHub id and use
>> >> that page to add me to the list of collaborators.
>> >>
>> >> Yes, the code will be merged to Marmotta in the end.
>> >>
>> >> You should create a new module inside of marmotta-sesame-tools named
>> >> "marmotta-rio-rdfht"
>> >>
>> >> https://github.com/apache/marmotta/tree/master/commons/marmotta-sesame-tools
>> >>
>> >> You will also need to add a format constant into marmotta-rio-api as a
>> >> new folder in the following directory, similar to the current 3
>> >> folders there:
>> >>
>> >> https://github.com/apache/marmotta/tree/master/commons/marmotta-sesame-tools/marmotta-rio-api/src/main/java/org/apache/marmotta/commons/sesame/rio
>> >>
>> >> Cheers,
>> >>
>> >> Peter
>> >>
>> >> On 16 May 2015 at 22:19, Junyue Wang <[email protected]> wrote:
>> >> > Hello Sergio, Peter,
>> >> >
>> >> > It's my honor to be a GSoC student. I appreciate your comments on
>> >> > the project proposal.
>> >> > I read the proposed methodology you pointed out. But it seems my
>> >> > project is only related to Sesame and RDF HDT, without touching the
>> >> > code base of Marmotta. Should I fork Marmotta on GitHub, or start a
>> >> > new repository there? Will my code be merged into Marmotta in the
>> >> > end? If so, into which module of Marmotta?
>> >> >
>> >> > yours,
>> >> > junyue
>> >> >
>> >> > On Thu, Apr 30, 2015 at 2:41 PM, Sergio Fernández <[email protected]> wrote:
>> >> >
>> >> >> Hi Peter,
>> >> >>
>> >> >> On Wed, Apr 29, 2015 at 1:12 AM, Peter Ansell <[email protected]> wrote:
>> >> >>>
>> >> >>> Those guidelines look great to me, especially the suggestion about
>> >> >>> the branch name including the Jira issue, which I have found very
>> >> >>> useful in all of my git-based projects. In the RDF/HDT case, and
>> >> >>> possibly in the GeoSPARQL case, the contributed code could be in the
>> >> >>> form of a new module, so there won't be much interference with the
>> >> >>> rest of the codebase during that time. However, it is still useful
>> >> >>> to regularly merge the "develop" branch into each of the branches to
>> >> >>> keep up to date and reduce the number of merge conflicts occurring
>> >> >>> near the end when the students will be rushing to complete the
>> >> >>> project.
>> >> >>
>> >> >>
>> >> >> Great you like it, Peter :-)
>> >> >>
>> >> >> I expect fewer merge conflicts here, since it's a more self-contained
>> >> >> library; with the GeoSPARQL project that workflow is much more
>> >> >> important.
>> >> >>
>> >> >> I just have one concern about the documentation. Last year I had
>> >> >> formatting issues bringing that documentation into the wiki (MoinMoin
>> >> >> syntax is not markdown, unfortunately). Do you think it is better to
>> >> >> do it directly in the wiki?
>> >> >>
>> >> >> I'd love to hear comments from our students; after all, you're the
>> >> >> ones who need to follow the proposed methodology.
>> >> >>
>> >> >> Cheers,
>> >> >>
>> >> >> --
>> >> >> Sergio Fernández
>> >> >> Partner Technology Manager
>> >> >> Redlink GmbH
>> >> >> m: +43 6602747925
>> >> >> e: [email protected]
>> >> >> w: http://redlink.co
>> >> >>
>> >>
>>
