Hi Andy

Andy Seaborne wrote:
> On 21/03/12 09:38, Paolo Castagna wrote:
>> Hi,
>> I am sorry if this is a silly question, but I need some clarity (or
>> another coffee already).
> 
> Have you had that coffee yet?

:-)

>>
>> The following are Java strings, therefore \n is the new line character...
>>
>> Java strings                 Turtle literals   N-Triples literals
>> -----------------------------------------------------------------
>> \"\"\"Hello \n World\"\"\"   legal             illegal
> 
> Yes - a triple quoted string can contain a raw newline.
> 
>> \"Hello \n World\"           illegal           illegal
>> \"\"\"Hello \\n World\"\"\"  legal             legal
>> \"Hello \\n World\"          legal             legal
>> \"Hello \\u000A World\"      legal             legal
>> -----------------------------------------------------------------
> 
> Yes - it's layering.
> 
> Don't forget about using ' for " (for this exact reason).

Yep. Thanks.
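
To make the layering concrete, here is a minimal sketch of what the parser
actually receives for two of the Java strings in the table. The class and
method names are hypothetical (they are not part of Jena or RIOT), and the
check only looks for raw line breaks, nothing else.

```java
// Hypothetical helper, not a Jena/RIOT API: it only reports whether the
// text handed to a parser contains a real line break character.
final class LiteralLayers {

    /** True if the text contains a raw LF or CR character. */
    static boolean hasRawLineBreak(String text) {
        return text.indexOf('\n') >= 0 || text.indexOf('\r') >= 0;
    }

    public static void main(String[] args) {
        // Java turns the escape into a real line break before the parser runs.
        String raw = "\"Hello \n World\"";
        // Java keeps a backslash followed by the letter n; the parser decodes it.
        String escaped = "\"Hello \\n World\"";

        System.out.println(hasRawLineBreak(raw));      // prints true
        System.out.println(hasRawLineBreak(escaped));  // prints false
    }
}
```

Only the first string hands the tokenizer a line break inside a plain quoted
literal, which is the case reported as "Broken token (newline)".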

>> If someone tries to parse a Turtle | N-Triples file with a plain (not
>> triple-quoted) literal containing a raw newline (Java "\n"), we have a
>> RiotException:
>>
>> org.openjena.riot.RiotException: [line: 1, col: 68] Broken token (newline): Hello
>>     at org.openjena.riot.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:125)
>>     at org.openjena.riot.lang.LangEngine.raiseException(LangEngine.java:169)
>>     at org.openjena.riot.lang.LangEngine.nextToken(LangEngine.java:116)
>>     at org.openjena.riot.lang.LangTurtleBase.predicateObjectItem(LangTurtleBase.java:307)
>>     at org.openjena.riot.lang.LangTurtleBase.predicateObjectList(LangTurtleBase.java:289)
>>     at org.openjena.riot.lang.LangTurtleBase.triples(LangTurtleBase.java:280)
>>     at org.openjena.riot.lang.LangTurtleBase.triplesSameSubject(LangTurtleBase.java:219)
>>     at org.openjena.riot.lang.LangTurtle.oneTopLevelElement(LangTurtle.java:46)
>>     at org.openjena.riot.lang.LangTurtleBase.runParser(LangTurtleBase.java:144)
>>     at org.openjena.riot.lang.LangBase.parse(LangBase.java:43)
>>     at org.openjena.riot.RiotLoader.datasetFromString(RiotLoader.java:79)
>>     at dev.Run2.main(Run2.java:47)
>>
>> Example here:
>> https://raw.github.com/castagna/jena-examples/master/src/main/java/dev/Run2.java
>>
>>
>> I think this is the right behavior, since the new line character is
>> not legal in a string literal in N-Triples | N-Quads files.
>> It must be escaped as '\n' or '\u000A' (in a Java string: "\\n" or
>> "\\u000A"; newline is U+000A, i.e. decimal 10, not U+0010).
>>
>> Right?
> 
> Looks right to me.

Ok, thanks for the sanity check.
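
For whoever generates the data, a rough sketch of escaping a Java string
value before writing it as a double-quoted literal. This is an assumption
about what a producer could do, not how Jena or RIOT writes literals: the
class name is made up, and non-ASCII characters are passed through untouched.

```java
// Hypothetical producer-side helper, not a Jena/RIOT API.
final class NTriplesEscape {

    /** Escape a value so it can sit between double quotes on one line. */
    static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length() + 8);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '"':  out.append("\\\""); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Other control characters: four-hex-digit escape.
                        out.append(String.format("\\u%04X", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    /** Wrap the escaped value in double quotes. */
    static String literal(String value) {
        return "\"" + escape(value) + "\"";
    }

    public static void main(String[] args) {
        // The Java string holds a real line break; the output stays on one line.
        System.out.println(literal("Hello \n World"));
    }
}
```

With this, the value from the failing case comes out as a single-line literal
containing a backslash and the letter n instead of a raw newline.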

Investigation continues...

We have some data coming in as N-Triples and/or Turtle, and there must be
something weird with it. Data moves between different "systems" and, as usual,
people use all sorts of tools to generate it.
Something must be wrong with the data, but it is passing our checks and causing
problems further on (where we assume we have legal Turtle or N-Triples in our
hands).
So I am trying to understand whether there is a problem somewhere or, simply,
the data is illegal.

Sam contributed this test case:
https://github.com/castagna/jena-examples/blob/master/src/main/resources/data/single-bad-triple.nt
https://github.com/castagna/jena-examples/blob/master/src/main/java/dev/Run3.java
I am looking at this right now.

Cheers,
Paolo

> 
>>
>> Paolo
> 
