Hello Peter,

The problem is that there is no up-to-date, complete and detailed
specification of RDF HDT. The W3C submission [1] from 2011 is out of date.
The documentation [2] is new, but it contains just a general description
without much detail. For example, the first few bytes of an RDF HDT file
are the "Global ControlInformation", but neither of the two documents
above describes its layout. For the "Global ControlInformation", the
format information should be "<http://purl.org/HDT/hdt#HDTv1>", but there
is no such information in either document.
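
To make the point concrete, here is a minimal sketch of the kind of check
a parser can do on the start of a file. It only assumes that the format
string appears as plain ASCII somewhere in the first few hundred bytes; the
class name, method name and the 256-byte scan limit are hypothetical, and
this is not the code from the HDT Java implementation or from my parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class HdtFormatCheck {
    // Format string quoted above for the "Global ControlInformation".
    static final String EXPECTED_FORMAT = "<http://purl.org/HDT/hdt#HDTv1>";

    // Hypothetical limit: how many leading bytes to scan (an assumption,
    // not something taken from any HDT document).
    static final int SCAN_LIMIT = 256;

    // Returns true if the expected format string occurs as ASCII
    // within the first SCAN_LIMIT bytes of the given data.
    static boolean hasExpectedFormat(byte[] data) {
        byte[] needle = EXPECTED_FORMAT.getBytes(StandardCharsets.US_ASCII);
        int lastStart = Math.min(data.length, SCAN_LIMIT) - needle.length;
        for (int i = 0; i <= lastStart; i++) {
            byte[] window = Arrays.copyOfRange(data, i, i + needle.length);
            if (Arrays.equals(window, needle)) {
                return true;
            }
        }
        return false;
    }
}
```

It could be called on the leading bytes of one of the small test files,
e.g. HdtFormatCheck.hasExpectedFormat(bytes). A real parser needs the exact
byte layout, which is the part the documents do not give.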

I've tried to ask the authors of RDF HDT for the up-to-date
specification. I've also asked about the licence issue on legal-discuss@,
but I have not received any useful reply so far.

In order to code the parser from scratch, I had to study the source code of
the HDT Java implementation (LGPL licence), specifically HDTImpl.java
[3]. Then I rewrote the code in my own way with the same functionality.
For example, ControlInformation in the HDT Java implementation is coded in
an object-oriented way, but I implemented it with just a few
functions/methods, with much of the idea inspired by BinaryRDFParser [4]
in Sesame (BSD License?). However, I borrowed some low-level byte
processing code from the HDT Java implementation. Is this approach OK
with respect to the licence?

yours,
Junyue

[1] http://www.w3.org/Submission/2011/SUBM-HDT-20110330/
[2] http://www.rdfhdt.org/hdt-internals/
[3]
https://github.com/rdfhdt/hdt-java/blob/master/hdt-java-core/src/main/java/org/rdfhdt/hdt/hdt/impl/HDTImpl.java
[4]
http://grepcode.com/file/repo1.maven.org/maven2/org.openrdf.sesame/sesame-rio-binary/2.7.14/org/openrdf/rio/binary/BinaryRDFParser.java/



On Sun, Jun 28, 2015 at 3:34 PM, Peter Ansell <[email protected]>
wrote:

> Hi Junyue,
>
> Thanks for the update. See some comments inline below.
>
> On 28 June 2015 at 00:17, Junyue Wang <[email protected]> wrote:
> > Hi Peter, Sergio,
> >
> > I'm here to summarize the status for the first half of the GSoC
> > project:
> >
> > 1. Test data preparation
> > It's useful to have small HDT files prepared for testing the new
> > parser, but the datasets from [1] are too big for small tests. So I
> > borrowed some examples from the W3C RDF documentation [2]. I used the
> > HDT Java implementation to transform example02.rdf through
> > example20.rdf into test02.hdt through test20.hdt in the code base [3].
>
> Having small tight examples is vital for unit testing, so that sounds
> good to me, as long as the current spec is backwards compatible with
> it.
>
> > 2. HDT RDF parser based on the HDT Java implementation
> > I'm sorry that the project goal was misunderstood during the project
> > proposal period. In the first few weeks of the project, I focused on
> > coding the HDT RDF parser based on the HDT Java implementation. I also
> > sent an email to legal-discuss@ to clarify the licence issue, but no
> > response has arrived so far. Anyway, I committed the code [4], in case
> > it may be useful in future.
>
> We can always rebase that commit out when contributing the final patch
> back, if it is an issue.
>
> > 3. HDT RDF parser from scratch
> > I've begun to code the HDT RDF parser from scratch. The new parser can
> > now parse the Global Information of the HDT files [5]. I'll continue
> > this way for the second half of the project.
>
> That looks like a good start. See how you go after that parsing the
> other two sections and do let us know if you have any issues or
> queries.
>
> Thanks,
>
> Peter
>
> > yours,
> > Junyue
> >
> > [1] http://www.rdfhdt.org/datasets/
> > [2] https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-xml/index.html
> > [3]
> > https://github.com/junyuew/marmotta/tree/MARMOTTA-593/commons/marmotta-sesame-tools/marmotta-rio-rdfhdt/src/test/resources/org/apache/marmotta/commons/sesame/rio/rdfhdt
> > [4]
> > https://github.com/junyuew/marmotta/commit/e4b5d7492f102711c1227f592a36e26353f33812
> > [5]
> > https://github.com/junyuew/marmotta/commit/a7711b8338aafda9d812f0f2bb98cbde53a7cefa
> >
> >
>
