Hi Jona,

I have tried loading labels_en_uris_de.nt.bz2 from the DBpedia 3.8 release
using both Jena 2.7.4 and 2.10.0, but both fail with the following error:

andread@build04:~/tools/apache-jena-2.10.0/bin$ ./tdbloader2 --loc . /media/HD2/data/dbpedia-3.8-archive/source_data/labels_en_uris_de.nt.bz2
 19:48:02 -- TDB Bulk Loader Start
 19:48:02 Data phase
INFO  Load: /media/HD2/data/dbpedia-3.8-archive/source_data/labels_en_uris_de.nt.bz2 -- 2013/03/20 19:48:03 CET
Exception in thread "main" org.apache.jena.atlas.AtlasException: java.nio.charset.MalformedInputException: Input length = 1
    at org.apache.jena.atlas.io.IO.exception(IO.java:154)
    at org.apache.jena.atlas.io.CharStreamBuffered$SourceReader.fill(CharStreamBuffered.java:79)
    at org.apache.jena.atlas.io.CharStreamBuffered.fillArray(CharStreamBuffered.java:156)
    at org.apache.jena.atlas.io.CharStreamBuffered.advance(CharStreamBuffered.java:139)
    at org.apache.jena.atlas.io.PeekReader.advanceAndSet(PeekReader.java:251)
    at org.apache.jena.atlas.io.PeekReader.init(PeekReader.java:244)
    at org.apache.jena.atlas.io.PeekReader.peekChar(PeekReader.java:169)
    at org.apache.jena.atlas.io.PeekReader.makeUTF8(PeekReader.java:108)
    at org.apache.jena.riot.tokens.TokenizerFactory.makeTokenizerUTF8(TokenizerFactory.java:41)
    at org.apache.jena.riot.RiotReader.createParser(RiotReader.java:130)
    at org.apache.jena.riot.RiotReader.parse(RiotReader.java:115)
    at org.apache.jena.riot.RiotReader.parse(RiotReader.java:93)
    at org.apache.jena.riot.RiotReader.parse(RiotReader.java:66)
    at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.exec(CmdNodeTableBuilder.java:162)
    at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
    at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
    at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
    at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.main(CmdNodeTableBuilder.java:80)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:277)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:338)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.Reader.read(Reader.java:140)
    ... 17 more

Anyway, I have now tried the following:

1) Download the German labels file
2) Run tdbloader2 on the bz2-compressed .nt file -> failure
3) Uncompress the bz2 file and run tdbloader2 on the plain .nt file -> SUCCESS
4) Compress the .nt file again and run tdbloader2 -> failure

Looks like Jena is having some problems with bz2 files then.
Would you mind giving it a try?
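
In case it helps to rule out the data itself, here is a minimal sketch of a
check that is independent of Jena: it decompresses the bz2 file with Python's
standard bz2 module and reports the byte offset of the first sequence that is
not valid UTF-8. The function names and the chunk size are my own choices for
illustration, and this does not reproduce how Jena reads the file; it only
tells whether the decompressed bytes decode cleanly.

```python
import bz2
import codecs
import itertools


def first_utf8_error(chunks):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    fed = 0  # number of bytes handed to the decoder in earlier steps
    # Feed every chunk, then one final empty chunk to flag a truncated sequence.
    steps = itertools.chain(((c, False) for c in chunks), [(b"", True)])
    for chunk, final in steps:
        # Bytes of an incomplete sequence held back from the previous chunk.
        pending = len(decoder.getstate()[0])
        try:
            decoder.decode(chunk, final)
        except UnicodeDecodeError as err:
            # err.start is relative to the held-back bytes plus this chunk.
            return fed - pending + err.start
        fed += len(chunk)
    return None


def check_bz2_utf8(path, chunk_size=1 << 20):
    """Decompress a .bz2 file in chunks and check that it is valid UTF-8."""
    with bz2.open(path, "rb") as fh:
        return first_utf8_error(iter(lambda: fh.read(chunk_size), b""))
```

Calling check_bz2_utf8("labels_en_uris_de.nt.bz2") should return None if the
decompressed content is well-formed UTF-8. In that case the file contents are
probably fine and the loader's handling of the compressed stream would be the
next thing to look at; a number instead points at the offending byte.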

But anyway please check this JIRA issue out
https://issues.apache.org/jira/browse/STANBOL-804

Cheers
Andrea

2013/3/20 Jona Christopher Sahnwaldt <j...@sahnwaldt.de>

> Hi Andrea,
>
> there used to be encoding problems, but I think they are all fixed
> since the 3.8 release. I tried very hard to make TurtleEscaper do the
> right thing - I checked the relevant standards etc. Could you give an
> example where Jena complains about a DBpedia 3.8 file?
>
> Cheers,
> JC
>
> On Wed, Mar 20, 2013 at 6:16 PM, Andrea Di Menna <ninn...@gmail.com>
> wrote:
> > Hi,
> >
> > I have been using Stanbol [1] to process DBpedia data files and build a
> > DBpedia Solr index.
> > Stanbol uses Jena TDB to load DBpedia files into a triple store.
> > Unfortunately, almost all the DBpedia N-Triples files must be pre-processed
> > before they can be imported with Jena [2].
> >
> > The following sed command is launched:
> >
> > sed 's/\\\\/\\u005c\\u005c/g;s/\\\([^u"]\)/\\u005c\1/g'
> >
> > Basically, every backslash that is not followed by u or " is replaced
> > with its Unicode escape sequence, \u005c.
> >
> > Do you think this should/could be fixed in
> > org.dbpedia.extraction.util.TurtleEscaper#escapeTurtle ?
> >
> > Cheers
> > Andrea
> >
> > [1] http://stanbol.apache.org/
> > [2] http://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/dbpedia/dbpedia-3.8/fetch_data_en_int.sh
> >
> >
> ------------------------------------------------------------------------------
> > Everyone hates slow websites. So do we.
> > Make your web apps faster with AppDynamics
> > Download AppDynamics Lite for free today:
> > http://p.sf.net/sfu/appdyn_d2d_mar
> > _______________________________________________
> > Dbpedia-discussion mailing list
> > Dbpedia-discussion@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
> >
>
