[ https://issues.apache.org/jira/browse/JENA-641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903119#comment-13903119 ]
Andy Seaborne commented on JENA-641:
------------------------------------
Damian's idea looks viable for parsing and the best way to take some degree of
control.
This isn't the only place this happens. The SPARQL parser is susceptible as
well, since there the bytes->chars conversion is controlled by javacc, not
directly by Jena. Arguably, it's more serious on the data side (bigger,
produced by someone else).
Even if the UTF_8.Decoder developers (that would be the OpenJDK team, and
similarly for other VMs; it's part of the standard runtime) make a change, it
then needs to be known about and to roll through parts of the Java ecosystem.
(Yes, we could replace that part of javacc as well; javacc is modular.)
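To make "taking control" of the bytes->chars step concrete, here is a minimal sketch using only the standard java.nio.charset API. It is not Jena's code and not the proposed patch; the class name Utf8DecodeSketch and the helper decode are made up for illustration. It shows that the behaviour on malformed UTF-8 depends on the CodingErrorAction the decoder is configured with: REPORT raises MalformedInputException (as in the trace quoted below), while REPLACE substitutes U+FFFD and carries on.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

public class Utf8DecodeSketch {

    // Hypothetical helper: decode bytes as UTF-8 with an explicit policy
    // for malformed input, instead of relying on a library default.
    public static String decode(byte[] bytes, CodingErrorAction action) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), decoder)) {
            int ch;
            while ((ch = reader.read()) != -1) {
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 bytes: the lone 0xE9 byte is not valid UTF-8 here.
        byte[] latin1 = "caf\u00e9 au lait".getBytes(StandardCharsets.ISO_8859_1);

        // REPLACE: the bad byte becomes U+FFFD and decoding continues.
        System.out.println(decode(latin1, CodingErrorAction.REPLACE));

        // REPORT: the bad byte aborts decoding with an exception.
        try {
            decode(latin1, CodingErrorAction.REPORT);
        } catch (MalformedInputException e) {
            System.out.println("malformed input: " + e.getMessage());
        }
    }
}
```

Whether a parser should report or replace is a policy decision; the point of the sketch is only that the choice is made where the decoder is created, which is why code that hides that step (such as a generated parser's own stream handling) is harder to influence.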
> org.apache.jena.atlas.AtlasException on particular Turtle file
> --------------------------------------------------------------
>
> Key: JENA-641
> URL: https://issues.apache.org/jira/browse/JENA-641
> Project: Apache Jena
> Issue Type: Bug
> Components: RIOT
> Affects Versions: Jena 2.11.1
> Reporter: Vladimir Alexiev
> Priority: Minor
> Attachments: getty-codes.ttl
>
>
> {noformat}
> > riot --validate getty-codes.ttl
> Exception in thread "main" org.apache.jena.atlas.AtlasException: java.nio.charset.MalformedInputException: Input length = 1
> at org.apache.jena.atlas.io.IO.exception(IO.java:206)
> at org.apache.jena.atlas.io.CharStreamBuffered$SourceReader.fill(CharStreamBuffered.java:77)
> at org.apache.jena.atlas.io.CharStreamBuffered.fillArray(CharStreamBuffered.java:154)
> at org.apache.jena.atlas.io.CharStreamBuffered.advance(CharStreamBuffered.java:137)
> at org.apache.jena.atlas.io.PeekReader.advanceAndSet(PeekReader.java:243)
> at org.apache.jena.atlas.io.PeekReader.init(PeekReader.java:237)
> at org.apache.jena.atlas.io.PeekReader.peekChar(PeekReader.java:159)
> at org.apache.jena.atlas.io.PeekReader.makeUTF8(PeekReader.java:100)
> at org.apache.jena.riot.tokens.TokenizerFactory.makeTokenizerUTF8(TokenizerFactory.java:41)
> at org.apache.jena.riot.RiotReader.createParser(RiotReader.java:131)
> at riotcmd.CmdLangParse.parseRIOT(CmdLangParse.java:253)
> at riotcmd.CmdLangParse.parseFile(CmdLangParse.java:182)
> at riotcmd.CmdLangParse.parseFile(CmdLangParse.java:172)
> at riotcmd.CmdLangParse.exec(CmdLangParse.java:148)
> at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102)
> at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
> at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
> at riotcmd.riot.main(riot.java:35)
> Caused by: java.nio.charset.MalformedInputException: Input length = 1
> at java.nio.charset.CoderResult.throwException(Unknown Source)
> at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
> at sun.nio.cs.StreamDecoder.read(Unknown Source)
> at java.io.InputStreamReader.read(Unknown Source)
> at java.io.Reader.read(Unknown Source)
> {noformat}