Turtle Files with a UTF-8 BOM fail to parse
-------------------------------------------
Key: JENA-12
URL: https://issues.apache.org/jira/browse/JENA-12
Project: Jena
Issue Type: Bug
Components: RIOT
Environment: Windows 7, latest Sun Java Runtime, Jena 2.6.4
Reporter: Rob Vesse
If a Turtle file starts with a UTF-8 BOM, Jena refuses to parse it and
gives the following error:
Exception in thread "main" com.hp.hpl.jena.n3.turtle.TurtleParseException:
Lexical error at line 1, column 2. Encountered: "@" (64), after : "\ufeff"
    at com.hp.hpl.jena.n3.turtle.ParserTurtle.parse(ParserTurtle.java:44)
    at com.hp.hpl.jena.n3.turtle.TurtleReader.readWorker(TurtleReader.java:21)
    at com.hp.hpl.jena.n3.JenaReaderBase.readImpl(JenaReaderBase.java:101)
    at com.hp.hpl.jena.n3.JenaReaderBase.read(JenaReaderBase.java:68)
    at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:226)
    at TurtleWithBOM.main(TurtleWithBOM.java:31)
The code I used to produce this error was as follows:
import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.util.FileManager;
import java.io.*;

public class TurtleWithBOM
{
    public static void main(String[] args)
    {
        // create an empty model
        Model model = ModelFactory.createDefaultModel();

        // open the Turtle file (it starts with a UTF-8 BOM)
        InputStream in = FileManager.get().open( "ttl-with-bom.ttl" );
        if (in == null)
        {
            throw new IllegalArgumentException( "File: ttl-with-bom.ttl not found" );
        }

        // read the Turtle file; this is the call that throws
        model.read(in, "", "TTL");

        // write it to standard out
        model.write(System.out);
    }
}
A sample Turtle file used with the above code can be found attached to the
original report to the Jena Users mailing list here -
http://mail-archives.apache.org/mod_mbox/incubator-jena-users/201012.mbox/%3CEMEW3|b0e33a3dc6849ef75f49c8891480853dmBGBgv06rav08r|ecs.soton.ac.uk|[email protected]%3e
The data files come from my software, which is written entirely in .NET.
When writing UTF-8, the default behaviour of .NET is to include the BOM at
the start of the file. The BOM is not required for UTF-8, but it is not
forbidden either, so I think this should be fixed (if possible) in a future
release. I will be modifying my software so that my users can disable output
of the BOM if desired.
Judging by the stack trace, I expect the same problem also affects N3 files,
since as far as I can tell they go through the same reader.
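Until this is fixed, one possible workaround on the Java side is to strip
the BOM before the stream reaches the parser. The following is only a
minimal sketch and has not been tested against Jena 2.6.4. The class name
BomSkipper and the method skipUtf8Bom are hypothetical names of my own, not
part of Jena. It relies on the fact that U+FEFF is encoded in UTF-8 as the
three bytes EF BB BF, and uses only java.io.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper (not part of Jena): drops a leading UTF-8 BOM, if any.
final class BomSkipper {
    // U+FEFF encoded as UTF-8
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback =
            new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];

        // Read up to three bytes, tolerating short reads and short streams.
        int n = 0;
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }

        // The prefix is a BOM only if all three bytes are present and match.
        boolean isBom = (n == UTF8_BOM.length);
        for (int i = 0; isBom && i < UTF8_BOM.length; i++) {
            isBom = (head[i] & 0xFF) == UTF8_BOM[i];
        }

        // Not a BOM: put the bytes back so the caller sees the stream unchanged.
        if (!isBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '@', 'p'};
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(data));
        System.out.println((char) in.read()); // prints "@"
    }
}
```

In the test program above, the stream returned by FileManager.get().open(...)
would be wrapped with BomSkipper.skipUtf8Bom(in) before the call to
model.read(in, "", "TTL"). This only hides the problem for callers who know
about it, so it is not a substitute for a fix in the reader itself.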