[jira] Updated: (JENA-12) Turtle Files with a UTF-8 BOM fail to parse

Rob Vesse (JIRA) Sat, 18 Dec 2010 05:43:46 -0800

     [ https://issues.apache.org/jira/browse/JENA-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Rob Vesse updated JENA-12:
--------------------------

    Description: 
If a Turtle file starts with a UTF-8 byte order mark (BOM), Jena refuses to 
parse it and gives the following error:

Exception in thread "main" com.hp.hpl.jena.n3.turtle.TurtleParseException: Lexical error at line 1, column 2.  Encountered: "@" (64), after : "\ufeff"
    at com.hp.hpl.jena.n3.turtle.ParserTurtle.parse(ParserTurtle.java:44)
    at com.hp.hpl.jena.n3.turtle.TurtleReader.readWorker(TurtleReader.java:21)
    at com.hp.hpl.jena.n3.JenaReaderBase.readImpl(JenaReaderBase.java:101)
    at com.hp.hpl.jena.n3.JenaReaderBase.read(JenaReaderBase.java:68)
    at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:226)
    at TurtleWithBOM.main(TurtleWithBOM.java:31)
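The `\ufeff` in the message is the BOM itself. Java's standard UTF-8 decoder does not strip a leading byte order mark, so if the parser reads its input through an ordinary `Reader` (an assumption; the stack trace does not show how the stream is decoded), U+FEFF arrives as the first character and the `@` of the first directive lands in column 2, which matches the error. The JDK-only sketch below demonstrates this decoding behaviour; the class and method names (`BomDecodeDemo`, `withBom`, `decode`) and the sample `@prefix` line are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BomDecodeDemo {
    // Prepends the UTF-8 encoding of U+FEFF (EF BB BF) to the given bytes.
    static byte[] withBom(byte[] body) {
        byte[] data = new byte[3 + body.length];
        data[0] = (byte) 0xEF;
        data[1] = (byte) 0xBB;
        data[2] = (byte) 0xBF;
        System.arraycopy(body, 0, data, 3, body.length);
        return data;
    }

    // Decodes the bytes through an InputStreamReader, one character at a time.
    static String decode(byte[] data) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = decode(withBom("@prefix ex: <http://example.org/> .\n".getBytes(StandardCharsets.UTF_8)));
        // The BOM survives decoding as U+FEFF, so '@' is only the second character.
        System.out.printf("first=U+%04X second=%c%n", (int) text.charAt(0), text.charAt(1));
    }
}
```

Running it prints `first=U+FEFF second=@`.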

The code I used to produce this error was as follows:

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.util.FileManager;

import java.io.*;

public class TurtleWithBOM
{

    public static void main(String[] args)
    {

        // create an empty model
        Model model = ModelFactory.createDefaultModel();

        InputStream in = FileManager.get().open( "ttl-with-bom.ttl" );
        if (in == null)
        {
            throw new IllegalArgumentException( "File: ttl-with-bom.ttl not found");
        }

        // read the Turtle file
        model.read(in, "", "TTL");

        // write it to standard out
        model.write(System.out);
    }
}

A sample Turtle file used with the above code is attached to this issue.
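If the attachment is not at hand, a file of the same shape can be produced locally. This is a minimal sketch, assuming Java 7 or later; the class name `MakeTurtleWithBom`, the helper `encodeWithBom` and the two Turtle statements are hypothetical and need not match the attached file. It writes the three UTF-8 BOM bytes (EF BB BF) followed by the UTF-8 encoded Turtle text to `ttl-with-bom.ttl`, the filename the reproduction code opens.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class MakeTurtleWithBom {
    // Returns the UTF-8 BOM bytes (EF BB BF) followed by the UTF-8 encoding of the text.
    static byte[] encodeWithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] data = new byte[3 + body.length];
        data[0] = (byte) 0xEF;
        data[1] = (byte) 0xBB;
        data[2] = (byte) 0xBF;
        System.arraycopy(body, 0, data, 3, body.length);
        return data;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample content; the attached file may differ.
        String turtle = "@prefix ex: <http://example.org/> .\n"
                      + "ex:subject ex:predicate \"object\" .\n";
        // Write to the filename that the reproduction code opens.
        Files.write(Paths.get("ttl-with-bom.ttl"), encodeWithBom(turtle));
    }
}
```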

The data files come from my software, which is written entirely in .NET. When 
writing UTF-8, .NET by default includes the BOM at the start of the file. The 
BOM is not required for UTF-8, but it is not forbidden either, so I think this 
should be fixed (if possible) in future releases. I will also modify my 
software so that users can disable output of the BOM if desired.
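Until the reader handles this itself, one possible client-side workaround is to strip a leading UTF-8 BOM from the stream before handing it to Jena. The sketch below uses only JDK classes (`PushbackInputStream`); the names `BomSkipper` and `skipUtf8Bom` are hypothetical, and it assumes the input is UTF-8, so UTF-16 BOMs are not handled.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

public class BomSkipper {
    // Returns a stream that yields the same bytes as `in`, minus a leading
    // UTF-8 BOM (EF BB BF) if one is present.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF.
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            // Not a BOM: push the peeked bytes back so the caller still sees them.
            pin.unread(head, 0, n);
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '@', 'p'};
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(data));
        System.out.println((char) in.read()); // prints @
    }
}
```

In the reproduction code above, the call would become `model.read(BomSkipper.skipUtf8Bom(in), "", "TTL");`, with `main` declaring or handling the `IOException` the helper can throw.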

Judging from the error message, I expect the same problem also affects N3 
files, since as far as I can tell from the stack trace they use the same 
reader.


  was:
If a Turtle file starts with a UTF-8 byte order mark (BOM), Jena refuses to 
parse it and gives the following error:

Exception in thread "main" com.hp.hpl.jena.n3.turtle.TurtleParseException: Lexical error at line 1, column 2.  Encountered: "@" (64), after : "\ufeff"
    at com.hp.hpl.jena.n3.turtle.ParserTurtle.parse(ParserTurtle.java:44)
    at com.hp.hpl.jena.n3.turtle.TurtleReader.readWorker(TurtleReader.java:21)
    at com.hp.hpl.jena.n3.JenaReaderBase.readImpl(JenaReaderBase.java:101)
    at com.hp.hpl.jena.n3.JenaReaderBase.read(JenaReaderBase.java:68)
    at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:226)
    at TurtleWithBOM.main(TurtleWithBOM.java:31)

The code I used to produce this error was as follows:

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.util.FileManager;

import java.io.*;

public class TurtleWithBOM
{

    public static void main(String[] args)
    {

        // create an empty model
        Model model = ModelFactory.createDefaultModel();

        InputStream in = FileManager.get().open( "ttl-with-bom.ttl" );
        if (in == null)
        {
            throw new IllegalArgumentException( "File: ttl-with-bom.ttl not found");
        }

        // read the Turtle file
        model.read(in, "", "TTL");

        // write it to standard out
        model.write(System.out);
    }
}

A sample Turtle file used with the above code can be found attached to the 
original report on the Jena Users mailing list: 
http://mail-archives.apache.org/mod_mbox/incubator-jena-users/201012.mbox/%3CEMEW3|b0e33a3dc6849ef75f49c8891480853dmBGBgv06rav08r|ecs.soton.ac.uk|[email protected]%3e

The data files come from my software, which is written entirely in .NET. When 
writing UTF-8, .NET by default includes the BOM at the start of the file. The 
BOM is not required for UTF-8, but it is not forbidden either, so I think this 
should be fixed (if possible) in future releases. I will also modify my 
software so that users can disable output of the BOM if desired.

Judging from the error message, I expect the same problem also affects N3 
files, since as far as I can tell from the stack trace they use the same 
reader.



> Turtle Files with a UTF-8 BOM fail to parse
> -------------------------------------------
>
>                 Key: JENA-12
>                 URL: https://issues.apache.org/jira/browse/JENA-12
>             Project: Jena
>          Issue Type: Bug
>          Components: RIOT
>         Environment: Windows 7, latest Sun Java Runtime, Jena 2.6.4
>            Reporter: Rob Vesse
>         Attachments: ttl-with-bom.ttl
>

-- 
This message is automatically generated by JIRA.