[ https://issues.apache.org/jira/browse/JENA-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonas Sourlier updated JENA-1909:
---------------------------------
    Description: 
This might be related to JENA-1908, but since the stack trace is different, I 
opened a second ticket.

I tried to import the latest Wikidata dump into Apache Jena, using the following 
setup:
 * Ubuntu 20.04 on Windows 10 Subsystem for Linux
 * Apache Jena 3.15.0
 * Intel i7 4770K, 32GB RAM
 * 
{code:java}
openjdk 11.0.7 2020-04-14
OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-3ubuntu1)
OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-3ubuntu1, mixed mode, sharing)
{code}

These are the commands I have run:
{code:java}
wget -c http://mirror.easyname.ch/apache/jena/binaries/apache-jena-3.15.0.tar.gz
tar -xvzf apache-jena-3.15.0.tar.gz
mkdir data
apache-jena-3.15.0/bin/tdbloader2 --phase data --loc data/ ../latest-all.ttl > tdb1.log 2> tdb2.log &
apache-jena-3.15.0/bin/tdbloader2 --phase index --loc data/ > tdb1.log 2> tdb2.log &

{code}
The data phase ran fine, but the index phase crashed after about 10 hours. This 
is the stack trace which appears in the error output:
{code:java}
{code}
Here's the standard output:
{code:java}
 08:47:57 INFO -- TDB Bulk Loader Start
 08:47:57 INFO Index Building Phase
 08:47:57 INFO Creating Index SPO
 08:47:58 INFO Sort SPO
 18:26:19 INFO Sort SPO Completed
 18:26:19 INFO Build SPO
{code}
 

  was:
This might be related to JENA-1908, but since the stack trace is different, I 
opened a second ticket.

I tried to import the latest Wikidata dump into Apache Jena, using the following 
setup:
 * Ubuntu 20.04 on Windows 10 Subsystem for Linux
 * Apache Jena 3.15.0
 * Intel i7 4770K, 32GB RAM
 * 
{code:java}
openjdk 11.0.7 2020-04-14
OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-3ubuntu1)
OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-3ubuntu1, mixed mode, sharing)
{code}

These are the commands I have run:
{code:java}
wget -c http://mirror.easyname.ch/apache/jena/binaries/apache-jena-3.15.0.tar.gz
tar -xvzf apache-jena-3.15.0.tar.gz
mkdir data
apache-jena-3.15.0/bin/tdbloader2 --phase data --loc data/ ../latest-all.ttl > tdb1.log 2> tdb2.log &
apache-jena-3.15.0/bin/tdbloader2 --phase index --loc data/ > tdb1.log 2> tdb2.log &

{code}
The data phase ran fine, but the index phase crashed after about 10 hours. This 
is the stack trace which appears in the error output:
{code:java}
Exception in thread "main" java.lang.IllegalArgumentException: Bad index char : 112
 at org.apache.jena.atlas.lib.Hex.hexByteToInt(Hex.java:76)
 at org.apache.jena.atlas.lib.Hex.getLong(Hex.java:61)
 at org.apache.jena.tdb.store.bulkloader2.RecordsFromInput.hasNext(RecordsFromInput.java:85)
 at org.apache.jena.tdb.index.bplustree.RecordBufferPagePacker.hasNext(RecordBufferPagePacker.java:69)
 at org.apache.jena.atlas.iterator.PeekIterator.fill(PeekIterator.java:50)
 at org.apache.jena.atlas.iterator.PeekIterator.next(PeekIterator.java:92)
 at org.apache.jena.tdb.index.bplustree.RecordBufferPageLinker.hasNext(RecordBufferPageLinker.java:64)
 at org.apache.jena.atlas.iterator.Iter$2.hasNext(Iter.java:347)
 at org.apache.jena.atlas.iterator.IteratorWithBuffer.next(IteratorWithBuffer.java:69)
 at org.apache.jena.tdb.index.bplustree.BPTreeNodeBuilder.hasNext(BPTreeNodeBuilder.java:92)
 at org.apache.jena.atlas.iterator.IteratorWithBuffer.next(IteratorWithBuffer.java:69)
 at org.apache.jena.atlas.iterator.IteratorWithBuffer.next(IteratorWithBuffer.java:76)
 at org.apache.jena.tdb.index.bplustree.BPTreeNodeBuilder.hasNext(BPTreeNodeBuilder.java:92)
 at org.apache.jena.atlas.iterator.IteratorWithBuffer.next(IteratorWithBuffer.java:69)
 at org.apache.jena.atlas.iterator.IteratorWithBuffer.next(IteratorWithBuffer.java:76)
 at org.apache.jena.tdb.index.bplustree.BPTreeNodeBuilder.hasNext(BPTreeNodeBuilder.java:92)
 at org.apache.jena.atlas.iterator.IteratorWithBuffer.<init>(IteratorWithBuffer.java:49)
 at org.apache.jena.tdb.index.bplustree.BPlusTreeRewriter$RebalenceBase.<init>(BPlusTreeRewriter.java:276)
 at org.apache.jena.tdb.index.bplustree.BPlusTreeRewriter$RebalenceIndexEnd.<init>(BPlusTreeRewriter.java:309)
 at org.apache.jena.tdb.index.bplustree.BPlusTreeRewriter.genTreeLevel(BPlusTreeRewriter.java:258)
 at org.apache.jena.tdb.index.bplustree.BPlusTreeRewriter.packIntoBPlusTree(BPlusTreeRewriter.java:102)
 at org.apache.jena.tdb.store.bulkloader2.ProcIndexBuild.exec(ProcIndexBuild.java:101)
 at tdb.bulkloader2.CmdIndexBuild.main(CmdIndexBuild.java:50)
18:31:37 ERROR Failed during index phase
{code}
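A note on reading the message in the trace above: 112 is the decimal code of the ASCII character 'p', which is not a hexadecimal digit. Assuming the number in "Bad index char : 112" is the code of the offending character, the index builder hit a 'p' where it expected a hex digit in its sorted input. The sketch below only illustrates that reading. `HexDigitCheck` and `hexDigitToInt` are hypothetical names, not Jena's `Hex` class, and the real implementation may differ.

```java
class HexDigitCheck {
    // Hypothetical stand-in for a hex-digit decoder (not Jena's Hex.hexByteToInt):
    // maps one ASCII hex digit to its numeric value, rejects anything else.
    static int hexDigitToInt(int c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw new IllegalArgumentException("Bad index char : " + c);
    }

    public static void main(String[] args) {
        // Show which character has code 112.
        System.out.println((char) 112);
        // A valid hex digit decodes normally.
        System.out.println(hexDigitToInt('f'));
        // Code 112 is outside 0-9, a-f, A-F, so the decoder rejects it.
        try {
            hexDigitToInt(112);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this reading is right, the failure points at unexpected non-hex text in the data handed to the index phase rather than at the B+Tree code itself, but that is an inference from the trace, not something confirmed here.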
Here's the standard output:
{code:java}
 08:47:57 INFO -- TDB Bulk Loader Start
 08:47:57 INFO Index Building Phase
 08:47:57 INFO Creating Index SPO
 08:47:58 INFO Sort SPO
 18:26:19 INFO Sort SPO Completed
 18:26:19 INFO Build SPO
{code}
 


> tdb2.tdbloader crashes
> ----------------------
>
>                 Key: JENA-1909
>                 URL: https://issues.apache.org/jira/browse/JENA-1909
>             Project: Apache Jena
>          Issue Type: Bug
>    Affects Versions: Jena 3.15.0
>            Reporter: Jonas Sourlier
>            Priority: Major
>
> This might be related to JENA-1908, but since the stack trace is different, I 
> opened a second ticket.
> I tried to import the latest Wikidata dump into Apache Jena, using the 
> following setup:
>  * Ubuntu 20.04 on Windows 10 Subsystem for Linux
>  * Apache Jena 3.15.0
>  * Intel i7 4770K, 32GB RAM
>  * 
> {code:java}
> openjdk 11.0.7 2020-04-14
> OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-3ubuntu1)
> OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-3ubuntu1, mixed mode, sharing)
> {code}
> These are the commands I have run:
> {code:java}
> wget -c http://mirror.easyname.ch/apache/jena/binaries/apache-jena-3.15.0.tar.gz
> tar -xvzf apache-jena-3.15.0.tar.gz
> mkdir data
> apache-jena-3.15.0/bin/tdbloader2 --phase data --loc data/ ../latest-all.ttl > tdb1.log 2> tdb2.log &
> apache-jena-3.15.0/bin/tdbloader2 --phase index --loc data/ > tdb1.log 2> tdb2.log &
> {code}
> The data phase ran fine, but the index phase crashed after about 10 hours. 
> This is the stack trace which appears in the error output:
> {code:java}
> {code}
> Here's the standard output:
> {code:java}
>  08:47:57 INFO -- TDB Bulk Loader Start
>  08:47:57 INFO Index Building Phase
>  08:47:57 INFO Creating Index SPO
>  08:47:58 INFO Sort SPO
>  18:26:19 INFO Sort SPO Completed
>  18:26:19 INFO Build SPO
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
