[jira] [Issue Comment Deleted] (JENA-959) riot: gzip output option

Stian Soiland-Reyes (JIRA) Mon, 08 Jun 2015 06:59:45 -0700

     [ 
https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Stian Soiland-Reyes updated JENA-959:
-------------------------------------
    Comment: was deleted

(was: Yeah, either should work. It might also be worth having explicit 
compression support for input formats. For instance, it currently works with:

{code}
    riot --syntax=turtle chembl_20.0_target_targetcmpt_ls.ttl.gz

<http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022> <http://www.w3.org/2004/02/skos/core#relatedMatch> <http://rdf.ebi.ac.uk/resource/chembl/targetcomponent/CHEMBL_TC_7619> .
<http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022> <http://www.w3.org/2004/02/skos/core#relatedMatch> <http://rdf.ebi.ac.uk/resource/chembl/targetcomponent/CHEMBL_TC_7612> .
<http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022> <http://www.w3.org/2004/02/skos/core#relatedMatch> <http://rdf.ebi.ac.uk/resource/chembl/targetcomponent/CHEMBL_TC_7611> .

{code}

But it still guesses the compression from the .gz filename extension, so I 
can't do the same if I pipe in a gzipped stream or the file lacks a valid 
extension:

{code}
stain@biggie-utopic:~/Downloads$ riot --syntax=nquads fred
stain@biggie-utopic:~/Downloads$ riot --syntax=turtle fred
Exception in thread "main" org.apache.jena.atlas.RuntimeIOException: java.nio.charset.MalformedInputException: Input length = 1
        at org.apache.jena.atlas.io.IO.exception(IO.java:222)
        at org.apache.jena.atlas.io.CharStreamBuffered$SourceReader.fill(CharStreamBuffered.java:77)
        at org.apache.jena.atlas.io.CharStreamBuffered.fillArray(CharStreamBuffered.java:154)
        at org.apache.jena.atlas.io.CharStreamBuffered.advance(CharStreamBuffered.java:137)
        at org.apache.jena.atlas.io.PeekReader.advanceAndSet(PeekReader.java:241)
        at org.apache.jena.atlas.io.PeekReader.init(PeekReader.java:235)
        at org.apache.jena.atlas.io.PeekReader.peekChar(PeekReader.java:157)
        at org.apache.jena.atlas.io.PeekReader.makeUTF8(PeekReader.java:98)
        at org.apache.jena.riot.tokens.TokenizerFactory.makeTokenizerUTF8(TokenizerFactory.java:41)
        at org.apache.jena.riot.RiotReader.createParser(RiotReader.java:138)
        at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:180)
        at riotcmd.CmdLangParse.parseRIOT(CmdLangParse.java:267)
        at riotcmd.CmdLangParse.parseFile(CmdLangParse.java:185)
        at riotcmd.CmdLangParse.parseFile(CmdLangParse.java:175)
        at riotcmd.CmdLangParse.exec(CmdLangParse.java:148)
        at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102)
        at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
        at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
        at riotcmd.riot.main(riot.java:35)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
        at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.Read
{code}


So for this I would appreciate it if --syntax supported the same compression 
option:

{code}
stain@biggie-utopic:~/Downloads$ riot --syntax=turtle.gz fred
Can not detemine the synatx from 'turtle.gz'
{code})
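
The content-based detection asked for in the deleted comment could be 
sketched roughly as follows. This is only an illustration, not how riot or 
Jena handles compression: the class GzipSniff and its methods maybeGunzip and 
readAll are hypothetical names. It relies on the fact that every gzip stream 
begins with the two magic bytes 0x1f 0x8b, so the decision no longer depends 
on the filename.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration; not part of Jena or riot.
final class GzipSniff {

    /**
     * Peeks at the first two bytes. If they are the gzip magic number
     * (0x1f 0x8b), returns a decompressing stream; otherwise returns a
     * stream that yields the original bytes unchanged.
     */
    static InputStream maybeGunzip(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 2);
        byte[] magic = new byte[2];
        int n = 0;
        while (n < 2) {
            int r = pb.read(magic, n, 2 - n);
            if (r < 0) {
                break; // fewer than two bytes available
            }
            n += r;
        }
        if (n > 0) {
            pb.unread(magic, 0, n); // put the peeked bytes back
        }
        boolean gzip = n == 2
                && (magic[0] & 0xff) == 0x1f
                && (magic[1] & 0xff) == 0x8b;
        return gzip ? new GZIPInputStream(pb) : pb;
    }

    /** Reads a stream to the end; only here to make the sketch easy to try. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int r;
        while ((r = in.read(buf)) != -1) {
            out.write(buf, 0, r);
        }
        return out.toByteArray();
    }
}
```

With something like this, GzipSniff.maybeGunzip(System.in) would give a 
parser plain bytes whether or not the piped data was gzipped, at the cost of 
treating any input that happens to start with 0x1f 0x8b as compressed.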

> riot: gzip output option
> ------------------------
>
>                 Key: JENA-959
>                 URL: https://issues.apache.org/jira/browse/JENA-959
>             Project: Apache Jena
>          Issue Type: New Feature
>          Components: RIOT
>            Reporter: Stian Soiland-Reyes
>            Priority: Trivial
>
> The riot command line tool supports compressed input files such as 
> *.ttl.gz, but there is no (obvious) way to also write compressed output.
> This can of course also be achieved by piping through gzip, but that easily 
> becomes platform-specific. Writing *.format.gz is probably as much within 
> the remit of someone using riot on the command line as reading such files.
> So my suggestion is to support the extension .gz in the various --output 
> options to enable output via a GZIPOutputStream -- 
> http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html
> For example:
> {code}
> stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz chembl_20.0_target_targetcmpt_ls.ttl.gz 
> Not recognized as an RDF language : 'nquads.gz'
> {code}
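
The suggestion in the issue description could be sketched along these lines. 
This is a minimal sketch under assumptions, not riot's implementation: the 
class GzipOutputOption and its methods are hypothetical, and the only real 
API used is java.util.zip.GZIPOutputStream. The idea is to split a value such 
as nquads.gz into the base syntax name and a compression flag, then wrap the 
output stream accordingly.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Locale;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not part of Jena or riot.
final class GzipOutputOption {

    /** True if the requested output name carries a ".gz" suffix. */
    static boolean wantsGzip(String outputName) {
        return outputName.toLowerCase(Locale.ROOT).endsWith(".gz");
    }

    /** Strips a trailing ".gz" so the remainder can be looked up as a syntax name. */
    static String baseSyntax(String outputName) {
        return wantsGzip(outputName)
                ? outputName.substring(0, outputName.length() - 3)
                : outputName;
    }

    /** Wraps the stream in a GZIPOutputStream when compression was requested. */
    static OutputStream wrap(OutputStream out, String outputName) throws IOException {
        return wantsGzip(outputName) ? new GZIPOutputStream(out) : out;
    }
}
```

A caller using such a wrapper would need to close (or finish) the returned 
stream after writing, since GZIPOutputStream only writes the gzip trailer at 
that point.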



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
