[jira] Updated: (JENA-12) Turtle Files with a UTF-8 BOM fail to parse

Rob Vesse (JIRA) Sat, 18 Dec 2010 05:43:31 -0800

     [ 
https://issues.apache.org/jira/browse/JENA-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Rob Vesse updated JENA-12:
--------------------------

    Attachment: ttl-with-bom.ttl

Sample Turtle file with a UTF-8 BOM which fails to parse

> Turtle Files with a UTF-8 BOM fail to parse
> -------------------------------------------
>
>                 Key: JENA-12
>                 URL: https://issues.apache.org/jira/browse/JENA-12
>             Project: Jena
>          Issue Type: Bug
>          Components: RIOT
>         Environment: Windows 7, latest Sun Java Runtime, Jena 2.6.4
>            Reporter: Rob Vesse
>         Attachments: ttl-with-bom.ttl
>
>
> If a Turtle file starts with a UTF-8 BOM, Jena refuses to parse it and 
> gives the following error:
> Exception in thread "main" com.hp.hpl.jena.n3.turtle.TurtleParseException: 
> Lexical error at line 1, column 2.  Encountered: "@" (64), after : "\ufeff"
>     at com.hp.hpl.jena.n3.turtle.ParserTurtle.parse(ParserTurtle.java:44)
>     at com.hp.hpl.jena.n3.turtle.TurtleReader.readWorker(TurtleReader.java:21)
>     at com.hp.hpl.jena.n3.JenaReaderBase.readImpl(JenaReaderBase.java:101)
>     at com.hp.hpl.jena.n3.JenaReaderBase.read(JenaReaderBase.java:68)
>     at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:226)
>     at TurtleWithBOM.main(TurtleWithBOM.java:31)
> The code I used to produce this error was as follows:
> import com.hp.hpl.jena.rdf.model.*;
> import com.hp.hpl.jena.util.FileManager;
> import java.io.*;
> public class TurtleWithBOM
> {
>     public static void main(String[] args)
>     {
>         // create an empty model
>         Model model = ModelFactory.createDefaultModel();
>         InputStream in = FileManager.get().open( "ttl-with-bom.ttl" );
>         if (in == null)
>         {
>             throw new IllegalArgumentException( "File: ttl-with-bom.ttl not found");
>         }
>         // read the Turtle file
>         model.read(in, "", "TTL");
>         // write it to standard out
>         model.write(System.out);
>     }
> }
> A sample Turtle file used with the above code can be found attached to the 
> original report to the Jena Users mailing list here - 
> http://mail-archives.apache.org/mod_mbox/incubator-jena-users/201012.mbox/%3CEMEW3|b0e33a3dc6849ef75f49c8891480853dmBGBgv06rav08r|ecs.soton.ac.uk|[email protected]%3e
> The data files come from my software, which is written entirely in .NET; 
> when writing UTF-8, the default behaviour of .NET is to include the BOM at 
> the start of the file. The BOM is not required for UTF-8 but it is not 
> forbidden, so I think this should be fixed (if possible) in a future release. 
> I will be modifying my software so that my users can disable output of the 
> BOM if desired.
> Judging by the error message, I expect the same problem also affects N3 
> files, since as far as I can tell from the stack trace they use the same 
> reader.
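
Until the parser handles this itself, one possible client-side workaround for the failure described above is to drop the three BOM bytes (EF BB BF) before the stream reaches the reader. The following is only a minimal sketch: the class name BomSkipper and the method skipUtf8Bom are hypothetical helpers invented for illustration, not part of Jena, and the sketch uses only java.io.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper (not part of Jena): hides a leading UTF-8 BOM from the consumer.
class BomSkipper {

    /**
     * Wraps the given stream so that a leading UTF-8 BOM (bytes EF BB BF),
     * if present, is consumed. Any other leading bytes are pushed back untouched.
     */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;

        // A single read() may return fewer bytes than requested, so loop
        // until three bytes are available or the stream ends.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }

        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // No BOM: return the bytes we peeked at so the consumer sees them.
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }
}
```

With the reproduction code above, the read call would then become model.read(BomSkipper.skipUtf8Bom(in), "", "TTL"). This only sidesteps the problem on the caller's side; it does not change how the Turtle reader treats U+FEFF.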

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
