Hi Andy

I've created a JIRA issue for this - https://issues.apache.org/jira/browse/JENA-12

I appreciate the need for minimal, complete examples as I have enough trouble getting those out of users on my own support lists.

Thanks,

Rob

On Fri, 17 Dec 2010 14:10:09 +0000, Andy Seaborne <[email protected]> wrote:

> Hi Rob,
>
> Thanks for the minimal, complete, example.
>
> The parsers should cope with a UTF-8 BOM even if it's not recommended.
>
> Could you raise a JIRA issue for this please (it's the new process!).
> It'll need fixing in Jena and RIOT.
>
> Andy
>
> On 17/12/10 11:42, Rob Vesse wrote:
>> Hi all
>>
>> I had this issue reported to me recently and have been able to confirm
>> it myself (example data file attached). Essentially the issue is that if
>> a Turtle file has a BOM at the start then Jena will refuse to parse it
>> giving the following error:
>>
>> Exception in thread "main"
>> com.hp.hpl.jena.n3.turtle.TurtleParseException: Lexical error at line 1,
>> column 2. Encountered: "@" (64), after : "\ufeff"
>>     at com.hp.hpl.jena.n3.turtle.ParserTurtle.parse(ParserTurtle.java:44)
>>     at
>> com.hp.hpl.jena.n3.turtle.TurtleReader.readWorker(TurtleReader.java:21)
>>     at com.hp.hpl.jena.n3.JenaReaderBase.readImpl(JenaReaderBase.java:101)
>>     at com.hp.hpl.jena.n3.JenaReaderBase.read(JenaReaderBase.java:68)
>>     at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:226)
>>     at TurtleWithBOM.main(TurtleWithBOM.java:31)
>>
>> The code I used to produce this error was as follows:
>>
>> import com.hp.hpl.jena.rdf.model.*;
>> import com.hp.hpl.jena.util.FileManager;
>>
>> import java.io.*;
>>
>> public class TurtleWithBOM
>> {
>>
>>     public static void main(String[] args)
>>     {
>>
>>         // create an empty model
>>         Model model = ModelFactory.createDefaultModel();
>>
>>         InputStream in = FileManager.get().open( "ttl-with-bom.ttl" );
>>         if (in == null)
>>         {
>>             throw new IllegalArgumentException( "File: ttl-with-bom.ttl not found");
>>         }
>>
>>         // read the Turtle file
>>         model.read(in, "", "TTL");
>>
>>         // write it to standard out
>>         model.write(System.out);
>>     }
>> }
>>
>> A sample data file used with the above code to reproduce the error is
>> attached.
>>
>> The data files are coming from my software which is all written in .Net
>> and when outputting in UTF-8 the default behaviour of .Net is to include
>> the BOM at the start of the file. The BOM is not required for UTF-8 but
>> it is not forbidden so I think this should be fixed (if possible) for
>> future releases. I will be modifying my software so that output of the
>> BOM can be disabled by my users if desired
>>
>> Looking at the error message given I expect that the same problem would
>> also affect N3 files since they are using the same reader afaict from
>> the error trace.
>>
>> Regards,
>>
>> Rob Vesse
>>
>> --
>> PhD Student
>> IAM Group
>> Bay 20, Room 4027, Building 32
>> Electronics & Computer Science
>> University of Southampton
>>

--
PhD Student
IAM Group
Bay 20, Room 4027, Building 32
Electronics & Computer Science
University of Southampton
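
Until the parsers themselves cope with the BOM, one possible client-side workaround is to strip a leading UTF-8 BOM (bytes EF BB BF) from the stream before handing it to the reader. The following is only a minimal sketch using plain java.io. The class name BomSkipper and the method skipUtf8Bom are hypothetical names made up for illustration. They are not part of Jena, and this is not how the issue is fixed in Jena or RIOT.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper (not part of Jena): drops a leading UTF-8 BOM, if present.
public class BomSkipper {

    // The UTF-8 encoding of U+FEFF
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        // Pushback buffer large enough to put back everything we peek at
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);

        // Read up to three bytes; a single read() call may return fewer
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r < 0) {
                break; // stream ended before three bytes were available
            }
            n += r;
        }

        // Compare the bytes read against the BOM
        boolean isBom = (n == UTF8_BOM.length);
        for (int i = 0; isBom && i < UTF8_BOM.length; i++) {
            isBom = (head[i] & 0xFF) == UTF8_BOM[i];
        }

        // Not a BOM: put the bytes back so the caller sees the stream unchanged
        if (!isBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }
}
```

With a helper like this, the read call in the example above would become model.read(BomSkipper.skipUtf8Bom(in), "", "TTL"). It only handles the UTF-8 BOM; other encodings would need their own checks.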
