Re: GSoC: CSV Property Tables.

Ying Jiang, Sun, 03 Aug 2014 03:42:07 -0700

Hi Andy,

Thanks for your instructions!


Last week, I wrote the documentation for this GSoC project (jena-csv):
- http://jena.staging.apache.org/documentation/csv/
- http://jena.staging.apache.org/documentation/csv/get_started.html
- http://jena.staging.apache.org/documentation/csv/design.html
- http://jena.staging.apache.org/documentation/csv/implementation.html

I'll add more tests and improve the documentation in the coming weeks.

Best regards,
Ying Jiang


On Tue, Jul 29, 2014 at 6:59 PM, Andy Seaborne <a...@apache.org> wrote:
> On 27/07/14 16:20, Ying Jiang wrote:
>>
>> Hi Andy,
>>
>> Thanks for your comments!
>>
>> I just submitted the code for the csv2rdf tool [1]. It's based on
>> CmdLangParse, because parsing is the first step of the transformation.
>> csv2rdf inherits all of the command-line arguments (for parsing) from
>> CmdLangParse, and adds a new "-dest=file" argument for the
>> destination output file. You can try it out with:
>> java -cp ... riotcmd.csv2rdf -dest=test.ntriples src/test/resources/test.csv
>>
>> The warnings you pointed out have already been fixed, so jena-csv now
>> packages cleanly.
>>
>> As to the release, do you mean releasing jena-csv by itself, or the whole
>> Jena distribution
>
>
> jena-csv on its own.  That means it can be released more frequently and
> out of step with the rest of Jena.  As we are (trying) to release the main
> distribution only every 6 months (!!), coupling them now does not work.
>
>
>> (i.e. the recent [VOTE] Release Jena 2.12.0 and Fuseki 1.1.0)?
>> Actually, some of my code is in jena-arq (e.g. LangCSV in RIOT),
>> while the rest resides in jena-csv (e.g. PropertyTable, the csv2rdf
>> tool). jena-csv depends on jena-arq. Shall I integrate jena-csv into
>> jena-arq, or leave jena-csv as a separate module to be released on its own?
>
>
> For now, I'd leave it separate.
>
> The changes to jena-arq should have been picked up for 2.12.0.
>
> Hopefully, you can switch to using a released Jena - the POM is using
> 2.12.0-SNAPSHOT so it'll be good for the 2.12.0 release.
>
>
>> PropertyTable and its implementations now have good test coverage.
>> However, other tests are still under development, including some unit
>> tests and tests on real-world CSV data. I'll make the test suite more
>> complete. In the remaining weeks, I can also write the documentation
>> you mentioned. After that, I think it will be OK to release it to the
>> world. In short, I believe I can go with the plan. Thanks a lot for
>> your help during the project!
>
>
> Great!
>
>         Andy
>
>
>>
>> Best,
>> Ying Jiang
>>
>> [1]
>> https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/riotcmd/csv2rdf.java
>>
>> On Thu, Jul 24, 2014 at 8:47 PM, Andy Seaborne <a...@apache.org> wrote:
>>>
>>> Ying,
>>>
>>> jena-csv is looking good.  How's the csv2rdf tool coming along?
>>>
>>> Using the StageGenerator route is OK; with an OpExecutor you could do
>>> some filtering as well, but without a value-based index (a general comment
>>> about Jena stores, not CSV-specific) it will not make a lot of performance
>>> difference.  There is material in the join engine rewrite "quack", but
>>> it's not ready.  I don't see anything major that you've done that would
>>> not be applicable at some later (post-summer) date, for someone
>>> interested in moving it on.
>>>
>>> Looking at the time left, there are a couple of weeks and then some room
>>> for manoeuvre.
>>>
>>> I think that the most important thing to do is to release it on the
>>> world, to get this out to people to use!
>>>
>>> That means:
>>>
>>> 1/ Documentation for use
>>>
>>> "Getting started page"
>>> A page for full details.
>>> A page about the code (?)
>>>
>>> 2/ An Apache release.
>>>
>>> If everyone is OK with this, I suggest that this is a release in its own
>>> right with its own VOTE, etc. etc.  It's good experience to
>>>
>>> Take a look at our release process documentation to see what's involved:
>>>
>>> https://cwiki.apache.org/confluence/display/JENA/Release+Process
>>>
>>> and reply quite soon about whether I've missed anything that needs doing
>>> before a release and whether this plan works for you.  I'm very open to
>>> doing something different if you suggest something else.  The goal is to
>>> get stuff to other people; there can be various different ways to do that.
>>>
>>>          Andy
>>>
>>> Comments:
>>>
>>> I checked out a clean copy
>>>
>>> 1/ I ran "mvn clean test" and got a test failure.
>>>
>>> ------------
>>> Running org.apache.jena.propertytable.TS_PropertyTable
>>> log4j:ERROR Could not read configuration file from URL [file:src/test/resources/log4j.properties].
>>> java.io.FileNotFoundException: src/test/resources/log4j.properties (No such file or directory)
>>> ------------
>>> Missing file?
>>>
>>>
>>> 2/ After faking that file, I then got some output from the tests:
>>>
>>>> 1(?x <file:///home/afs/Projects/jena-csv/src/test/resources/test.csv#Town> ?townName)
>>> (?x <file:///home/afs/Projects/jena-csv/src/test/resources/test.csv#Population> ?pop)
>>> <=1 (?x <file:///home/afs/Projects/jena-csv/src/test/resources/test.csv#Predicate%20With%20Space> "PredicateWithSpace2")
>>>
>>> Are there by any chance some stray System.out.println calls in the code? :-)
>>>
>>> 3/ I also got some javadoc warnings when packaging.
>>>
>>> e.g.
>>> [WARNING] /home/afs/Projects/jena-csv/src/main/java/org/apache/jena/propertytable/PropertyTable.java:45: warning - @param argument "column," is not a parameter name.
>>>
>>> 4/ Is the test coverage sufficiently complete?
>
>
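Regarding the javadoc warning in point 3/: javadoc treats the first whitespace-delimited word after @param as the parameter name, so a comment such as "@param column, the column ..." makes it look for a parameter literally named "column," (comma included), which yields that warning. The sketch below illustrates the cause and the fix; the class ParamDocExample and its addColumn method are hypothetical stand-ins, not the jena-csv PropertyTable API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in used only to illustrate javadoc @param syntax.
 * This is not the jena-csv PropertyTable API.
 */
public class ParamDocExample {

    private final List<String> columns = new ArrayList<>();

    // The form that triggers the warning (shown in a line comment so it is
    // not parsed as javadoc here). javadoc reads "column," as the name:
    //
    //   /** @param column, the column header to add */

    /**
     * Adds a column header to the table.
     *
     * @param column the column header to add (no comma after the name)
     * @return the zero-based index of the new column
     */
    public int addColumn(String column) {
        columns.add(column);
        return columns.size() - 1;
    }

    public static void main(String[] args) {
        ParamDocExample table = new ParamDocExample();
        System.out.println(table.addColumn("Town"));        // prints 0
        System.out.println(table.addColumn("Population"));  // prints 1
    }
}
```

Dropping the comma after the parameter name is all the fix needs; the description text after the name is free-form.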