Hi Rob,

Thanks for the minimal, complete example.

The parsers should cope with a UTF-8 BOM even if it's not recommended.
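For illustration only, one way a parser could cope is to sniff the first three bytes before decoding and drop them if they are the UTF-8 encoding of the BOM (EF BB BF). This is a minimal sketch, not the fix actually made in Jena or RIOT; the class and method names (`BomSkipper`, `skipUtf8Bom`) are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.Arrays;

// Hypothetical helper: not part of the Jena or RIOT API.
public class BomSkipper {
    // The UTF-8 encoding of U+FEFF, the byte order mark.
    private static final byte[] UTF8_BOM = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };

    // Returns a stream positioned just after a leading UTF-8 BOM, if there is one.
    // If the first bytes are not a BOM they are pushed back, so nothing is lost.
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF.
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean isBom = n == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
        if (!isBom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: put the bytes back for the parser
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '@', 'p' };
        InputStream s = skipUtf8Bom(new ByteArrayInputStream(withBom));
        System.out.println((char) s.read()); // prints @
    }
}
```

Working at the byte level means the check happens before any charset decoding, so the rest of the parser never sees the BOM at all.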

Could you please raise a JIRA issue for this? (It's the new process!) It'll need fixing in both Jena and RIOT.

        Andy

On 17/12/10 11:42, Rob Vesse wrote:
Hi all

I had this issue reported to me recently and have been able to confirm
it myself (example data file attached). Essentially, if a Turtle file
starts with a BOM, Jena refuses to parse it and gives the following
error:

Exception in thread "main" com.hp.hpl.jena.n3.turtle.TurtleParseException: Lexical error at line 1, column 2. Encountered: "@" (64), after : "\ufeff"
    at com.hp.hpl.jena.n3.turtle.ParserTurtle.parse(ParserTurtle.java:44)
    at com.hp.hpl.jena.n3.turtle.TurtleReader.readWorker(TurtleReader.java:21)
    at com.hp.hpl.jena.n3.JenaReaderBase.readImpl(JenaReaderBase.java:101)
    at com.hp.hpl.jena.n3.JenaReaderBase.read(JenaReaderBase.java:68)
    at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:226)
    at TurtleWithBOM.main(TurtleWithBOM.java:31)
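The "\ufeff" in the message is presumably the BOM itself: Java's standard UTF-8 decoder does not strip a leading BOM but passes it through as the character U+FEFF, so the lexer sees an unexpected character before "@". A small stdlib-only demonstration (the class name is illustrative):

```java
import java.nio.charset.StandardCharsets;

public class BomDecodeDemo {
    public static void main(String[] args) {
        // A file saved as "UTF-8 with BOM" starts with the bytes EF BB BF.
        byte[] fileStart = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF,
                             '@', 'p', 'r', 'e', 'f', 'i', 'x' };
        // Decoding keeps the BOM as the first character, U+FEFF.
        String text = new String(fileStart, StandardCharsets.UTF_8);
        System.out.println(text.charAt(0) == '\uFEFF'); // prints true
        System.out.println(text.length());              // prints 8
        // "@" is therefore the second character, matching "column 2" in the error.
        System.out.println(text.indexOf('@') + 1);      // prints 2
    }
}
```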

The code I used to produce this error was as follows:

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.util.FileManager;

import java.io.*;

public class TurtleWithBOM
{
    public static void main(String[] args)
    {
        // create an empty model
        Model model = ModelFactory.createDefaultModel();

        InputStream in = FileManager.get().open( "ttl-with-bom.ttl" );
        if (in == null)
        {
            throw new IllegalArgumentException( "File: ttl-with-bom.ttl not found");
        }

        // read the Turtle file
        model.read(in, "", "TTL");

        // write it to standard out
        model.write(System.out);
    }
}
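Until the parsers handle this, a caller-side workaround might be to decode the file as UTF-8 yourself and drop a leading U+FEFF before handing a Reader to Jena. This is a sketch under assumptions: the class and method names are hypothetical, and it relies on the `Model.read(Reader, String base, String lang)` overload rather than the InputStream one used above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical caller-side helper, not part of Jena.
public class BomStrippingReader {
    // Decodes the stream as UTF-8 and skips a leading U+FEFF if present.
    public static Reader utf8ReaderWithoutBom(InputStream in) throws IOException {
        BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        r.mark(1);              // remember the start so we can rewind one character
        int first = r.read();
        if (first != '\uFEFF') {
            r.reset();          // not a BOM (or empty input): rewind so nothing is lost
        }
        return r;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '@', 'p' };
        Reader r = utf8ReaderWithoutBom(new ByteArrayInputStream(data));
        System.out.println((char) r.read()); // prints @
    }
}
```

In the program above, the read call would then become `model.read(BomStrippingReader.utf8ReaderWithoutBom(in), "", "TTL");`. Decoding explicitly as UTF-8 avoids depending on the platform default charset when passing a Reader.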

A sample data file used with the above code to reproduce the error is
attached.

The data files come from my software, which is written entirely in .NET,
and when writing UTF-8 the default behaviour of .NET is to include the
BOM at the start of the file. The BOM is not required for UTF-8, but it
is not forbidden either, so I think this should be fixed (if possible)
in future releases. I will be modifying my software so that users can
disable output of the BOM if desired.

Judging by the stack trace, I expect the same problem also affects N3
files, since as far as I can tell they go through the same reader.

Regards,

Rob Vesse

--
PhD Student
IAM Group
Bay 20, Room 4027, Building 32
Electronics & Computer Science
University of Southampton
