[ 
https://issues.apache.org/jira/browse/JENA-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134184#comment-17134184
 ] 

Jonas Sourlier commented on JENA-1909:
--------------------------------------

It was Apache Jena 3.15.0.

After the import, the files in the /data directory amounted to 1.77 TB (for a 
539 GB input file). So, the output files are considerably larger (due to index 
files etc.), and the total disk space at the end of the import was about 2.3 TB.

However, during my first run with the old tdbloader2, I noticed that at some 
point during the import, 600 GB of temporary storage was in use (in the 
directory specified by the TMPDIR environment variable). The peak might have 
been even higher; it was at 600 GB when I checked at some random point in time. 
I'm not sure about tdb2.tdbloader, but it may well be that you need more 
disk space than those 2.3 TB.
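
To catch the temporary-storage peak rather than a random sample, the temp 
directory can be polled while the loader runs. This is only a rough sketch: 
the helper names used_kb and free_kb are made up for illustration, and the 
fallback to /tmp when TMPDIR is unset is an assumption, not something 
confirmed for tdbloader2.

```shell
# Hypothetical helpers, not part of Apache Jena.

# Print the space used under a directory, in kilobytes.
used_kb() {
    du -sk "$1" 2>/dev/null | cut -f1
}

# Print the free space on the filesystem holding a directory, in kilobytes.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Assumption: temporary files land under TMPDIR, or /tmp when it is unset.
tmp_dir="${TMPDIR:-/tmp}"
echo "temp usage: $(used_kb "$tmp_dir") KB, free: $(free_kb "$tmp_dir") KB"
```

Running the last two lines in a loop with a sleep in between (or under 
watch) gives a rough log of how the temporary usage develops over the import.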

> TDB1: tdbloader2 crashes
> ------------------------
>
>                 Key: JENA-1909
>                 URL: https://issues.apache.org/jira/browse/JENA-1909
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: TDB
>    Affects Versions: Jena 3.15.0
>            Reporter: Jonas Sourlier
>            Priority: Major
>         Attachments: signature.asc, signature.asc, tdb2.log
>
>
> This might be related to JENA-1908, but since the stack trace is different, I 
> opened a second ticket.
> Tried to import the latest Wikidata dump into Apache Jena, using the 
> following setup:
>  * Ubuntu 20.04 on Windows 10 Subsystem for Linux
>  * Apache Jena 3.15.0
>  * Intel i7 4770K, 32GB RAM
>  * 
> {code:java}
> openjdk 11.0.7 2020-04-14
> OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-3ubuntu1)
> OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-3ubuntu1, mixed mode, 
> sharing){code}
> These are the commands I have run:
> {code:java}
> wget -c 
> http://mirror.easyname.ch/apache/jena/binaries/apache-jena-3.15.0.tar.gz
> tar -xvzf apache-jena-3.15.0.tar.gz
> mkdir data
> apache-jena-3.15.0/bin/tdbloader2 --phase data --loc data/ ../latest-all.ttl 
> > tdb1.log 2> tdb2.log &
> apache-jena-3.15.0/bin/tdbloader2 --phase index --loc data/  > tdb1.log 2> 
> tdb2.log &
> {code}
> The data phase ran fine, but the index phase crashed after about 10 hours. 
> The stack trace is attached to this ticket (tdb2.log).
> Here's the standard output:
> {code:java}
>  08:47:57 INFO -- TDB Bulk Loader Start
>  08:47:57 INFO Index Building Phase
>  08:47:57 INFO Creating Index SPO
>  08:47:58 INFO Sort SPO
>  18:26:19 INFO Sort SPO Completed
>  18:26:19 INFO Build SPO
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
