Nick Lothian created JENA-806:
---------------------------------

             Summary: illegal escape sequence value exception on legal characters
                 Key: JENA-806
                 URL: https://issues.apache.org/jira/browse/JENA-806
             Project: Apache Jena
          Issue Type: Bug
          Components: Cmd line tools
    Affects Versions: Jena 2.12.1
         Environment: Ubuntu 14.04, Java 8
            Reporter: Nick Lothian


When loading the Wikidata data dump using tdbloader2, I received the following 
error:

{{ERROR [line: 142128, col: 121] illegal escape sequence value: " (0x22)
org.apache.jena.riot.RiotException: [line: 142128, col: 121] illegal escape sequence value: " (0x22)
        at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:136)
        at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:163)
        at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:106)
        at org.apache.jena.riot.lang.LangNTriples.parseOne(LangNTriples.java:67)
        at org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:54)
        at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:42)
        at org.apache.jena.riot.RiotReader.parse(RiotReader.java:119)
        at org.apache.jena.riot.RiotReader.parse(RiotReader.java:96)
        at org.apache.jena.riot.RiotReader.parse(RiotReader.java:69)
        at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.exec(CmdNodeTableBuilder.java:162)
        at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102)
        at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
        at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
        at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.main(CmdNodeTableBuilder.java:80)
}}

Looking at that line:

{{sed '142128!d' uncompressed/wikidata-simple-statements.nt}}

{{<http://www.wikidata.org/entity/Q16873> <http://www.wikidata.org/entity/P18c> <http://commons.wikimedia.org/wiki/File:\"Retrat_de_l'escriptor_Juan_Carlos_Onetti_(1909-1994)\".png> .}}

Column 121 is the "R" after the ". 

Looking at http://www.w3.org/TR/n-triples/#n-triples-grammar, it appears that 
the " character is allowed.

Should tdbloader2 load this or am I missing something?
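
One way to double-check this reading of the grammar is to test the object term against the IRIREF production directly. The Python sketch below assumes the production as written in the W3C N-Triples recommendation (any character except #x00-#x20, <, >, ", {, }, |, ^, `, \, or else a \uXXXX / \UXXXXXXXX escape). The function name is_valid_iriref and the percent-encoded variant are illustrative only; they are not part of Jena or of the Wikidata dump.

```python
import re

# IRIREF per the N-Triples grammar (as transcribed here, an assumption):
#   '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
#   UCHAR ::= '\u' HEX{4} | '\U' HEX{8}
IRIREF = re.compile(
    r'<(?:[^\x00-\x20<>"{}|^`\\]|\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})*>'
)


def is_valid_iriref(term: str) -> bool:
    """Return True if `term` (angle brackets included) matches IRIREF."""
    return IRIREF.fullmatch(term) is not None


# Object term copied from line 142128 of the dump (raw string keeps the backslashes).
raw = r'''<http://commons.wikimedia.org/wiki/File:\"Retrat_de_l'escriptor_Juan_Carlos_Onetti_(1909-1994)\".png>'''

# Hypothetical variant with the quotes percent-encoded instead of backslash-escaped.
encoded = "<http://commons.wikimedia.org/wiki/File:%22Retrat_de_l'escriptor_Juan_Carlos_Onetti_(1909-1994)%22.png>"

print(is_valid_iriref(raw))      # False
print(is_valid_iriref(encoded))  # True
```

If this transcription of the grammar is right, \" is an ECHAR escape that is only legal inside string literals, not inside <...>, which would be consistent with the parser's complaint.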









--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
