[ https://issues.apache.org/jira/browse/JENA-641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901365#comment-13901365 ]

Damian Steer edited comment on JENA-641 at 2/14/14 12:02 PM:
-------------------------------------------------------------

Iconv agrees that there's an issue:

{noformat}
$ iconv /tmp/getty-codes.ttl
  rdfs:comment "We doniconv: illegal input sequence at position 4039
{noformat}

Byte \222 (octal, i.e. 0x92): the Windows-1252 right single quotation mark, used here as an apostrophe.
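
A quick way to confirm the diagnosis from Java, sketched with plain java.nio (ApostropheCheck and isValidUtf8 are hypothetical names, not Jena code): on its own, byte 0x92 is a UTF-8 continuation byte with no lead byte, so a strict UTF-8 decoder rejects it. Decoded as Windows-1252, the same byte becomes U+2019.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ApostropheCheck {
    // True if the bytes decode cleanly as UTF-8 (malformed input is reported, not replaced)
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Same exception family as the MalformedInputException in the reported stack trace
            return false;
        }
    }

    public static void main(String[] args) {
        // "don't" as a Windows-1252 editor saves it: byte 0x92 (octal 222) is the apostrophe
        byte[] bytes = {'d', 'o', 'n', (byte) 0x92, 't'};
        System.out.println("valid UTF-8? " + isValidUtf8(bytes));

        // Read as Windows-1252, the same byte is U+2019 RIGHT SINGLE QUOTATION MARK
        String s = new String(bytes, Charset.forName("windows-1252"));
        System.out.printf("U+%04X%n", (int) s.charAt(3));
    }
}
```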

You can fix it with:

{noformat}
$ iconv -f CP1252 -t utf8 /tmp/getty-codes.ttl > getty-codes-fixed.ttl
{noformat}
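
If iconv isn't to hand, the same re-encoding can be done in plain Java. This is a hypothetical helper, not part of Jena. Like the iconv command, it assumes the whole file is Windows-1252: any parts that are already UTF-8 would be turned into mojibake. One difference is that new String(byte[], Charset) silently substitutes a replacement character for any byte it cannot map, whereas iconv stops with an error.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Cp1252ToUtf8 {
    // Re-encode a whole file from Windows-1252 to UTF-8,
    // roughly what "iconv -f CP1252 -t utf8 in > out" does
    static void transcode(Path in, Path out) throws IOException {
        byte[] raw = Files.readAllBytes(in);
        String text = new String(raw, Charset.forName("windows-1252"));
        Files.write(out, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Cp1252ToUtf8 <input-cp1252> <output-utf8>
        transcode(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

For example: java Cp1252ToUtf8 /tmp/getty-codes.ttl getty-codes-fixed.ttl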

It would be nice if we could issue a warning and carry on in this situation.
I'm not sure that's possible:
http://stackoverflow.com/questions/7280956/how-to-skip-invalid-characters-in-stream-in-java-scala
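
On the "warn and carry on" question, one approach at the java.nio level is this: a CharsetDecoder set to CodingErrorAction.REPORT returns a CoderResult at each bad sequence instead of throwing, so the caller can log it, skip CoderResult.length() bytes and resume. What follows is a sketch in plain java.nio, not anything RIOT exposes. LenientUtf8 and decode are hypothetical names, and for simplicity it decodes a whole byte array rather than a stream.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.function.IntConsumer;

public class LenientUtf8 {
    // Decode UTF-8, calling warn.accept(byteOffset) for each malformed sequence,
    // then skipping that sequence and carrying on instead of throwing.
    static String decode(byte[] bytes, IntConsumer warn) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never yields more chars than input bytes, and skipped bytes yield none
        CharBuffer out = CharBuffer.allocate(bytes.length);
        while (true) {
            CoderResult r = dec.decode(in, out, true);
            if (r.isUnderflow()) {
                break; // all input consumed
            }
            if (r.isError()) {
                warn.accept(in.position());               // byte offset of the bad sequence
                in.position(in.position() + r.length());  // skip it and resume
            } else {
                // Overflow cannot happen given the buffer size above
                throw new IllegalStateException(r.toString());
            }
        }
        dec.flush(out);
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {'d', 'o', 'n', (byte) 0x92, 't'};
        String s = decode(bytes, pos -> System.err.println("warning: bad UTF-8 at byte " + pos));
        System.out.println(s);
    }
}
```

If no warning is needed, passing a decoder configured with CodingErrorAction.REPLACE to new InputStreamReader(stream, decoder) gives the same recovery with less code, substituting U+FFFD for each bad sequence.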


was (Author: shellac): the same comment without its final paragraph.

> org.apache.jena.atlas.AtlasException on particular Turtle file
> --------------------------------------------------------------
>
>                 Key: JENA-641
>                 URL: https://issues.apache.org/jira/browse/JENA-641
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: RIOT
>    Affects Versions: Jena 2.11.1
>            Reporter: Vladimir Alexiev
>         Attachments: getty-codes.ttl
>
>
> {noformat}
> > riot --validate getty-codes.ttl
> Exception in thread "main" org.apache.jena.atlas.AtlasException: java.nio.charset.MalformedInputException: Input length = 1
>         at org.apache.jena.atlas.io.IO.exception(IO.java:206)
>         at org.apache.jena.atlas.io.CharStreamBuffered$SourceReader.fill(CharStreamBuffered.java:77)
>         at org.apache.jena.atlas.io.CharStreamBuffered.fillArray(CharStreamBuffered.java:154)
>         at org.apache.jena.atlas.io.CharStreamBuffered.advance(CharStreamBuffered.java:137)
>         at org.apache.jena.atlas.io.PeekReader.advanceAndSet(PeekReader.java:243)
>         at org.apache.jena.atlas.io.PeekReader.init(PeekReader.java:237)
>         at org.apache.jena.atlas.io.PeekReader.peekChar(PeekReader.java:159)
>         at org.apache.jena.atlas.io.PeekReader.makeUTF8(PeekReader.java:100)
>         at org.apache.jena.riot.tokens.TokenizerFactory.makeTokenizerUTF8(TokenizerFactory.java:41)
>         at org.apache.jena.riot.RiotReader.createParser(RiotReader.java:131)
>         at riotcmd.CmdLangParse.parseRIOT(CmdLangParse.java:253)
>         at riotcmd.CmdLangParse.parseFile(CmdLangParse.java:182)
>         at riotcmd.CmdLangParse.parseFile(CmdLangParse.java:172)
>         at riotcmd.CmdLangParse.exec(CmdLangParse.java:148)
>         at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102)
>         at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
>         at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
>         at riotcmd.riot.main(riot.java:35)
> Caused by: java.nio.charset.MalformedInputException: Input length = 1
>         at java.nio.charset.CoderResult.throwException(Unknown Source)
>         at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
>         at sun.nio.cs.StreamDecoder.read(Unknown Source)
>         at java.io.InputStreamReader.read(Unknown Source)
>         at java.io.Reader.read(Unknown Source)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
