[
https://issues.apache.org/jira/browse/JENA-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134156#comment-17134156
]
Jonas Sourlier commented on JENA-1909:
--------------------------------------
Hi Wolfgang
Of course, here we go:
* Precise source: the Wikidata dump latest-all.ttl, downloaded on 28 April 2020.
* At the end of the import, the loader wrote "INFO loader :: Time = 577,390.218 seconds : Triples = 12,910,421,722 : Rate = 22,360 /s", so 12,910,421,722 triples in total.
* Intel i7-4770K CPU
* 32GB DDR3 1600MHz RAM
* One Samsung 860 Evo Basic 4TB SSD. Both the .ttl source file and the TDB output directory were on this disk.
* OS: Ubuntu 20.04 on the Windows Subsystem for Linux (Windows 10)
* Java: openjdk 11.0.7 2020-04-14
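As a quick sanity check on the loader's summary line in the list above, the reported rate is consistent with the total triple count divided by the elapsed seconds. A minimal POSIX shell sketch of that arithmetic, assuming awk is available; `avg_rate` and `to_days` are illustrative helper names, not part of Jena:

```shell
#!/bin/sh
# Cross-check of the loader's final summary line (figures copied from it).
# Assumes a POSIX shell and awk; awk does the division in double precision.

# avg_rate TRIPLES SECONDS: print TRIPLES / SECONDS rounded to a whole number.
avg_rate() {
  awk -v t="$1" -v s="$2" 'BEGIN { printf "%.0f\n", t / s }'
}

# to_days SECONDS: print SECONDS converted to days, with two decimals.
to_days() {
  awk -v s="$1" 'BEGIN { printf "%.2f\n", s / 86400 }'
}

triples=12910421722
seconds=577390.218

echo "rate: $(avg_rate "$triples" "$seconds") triples/s"
echo "time: $(to_days "$seconds") days"
```

With these figures the script prints a rate of 22360 triples/s, matching the reported "Rate = 22,360 /s", and an elapsed time of about 6.68 days for the whole load.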
I'm not sure whether this matters, but my first attempt was in an Ubuntu VM under VirtualBox. The VM ran on the same SSD and had 24GB RAM assigned, but the load was much slower than when I ran it "natively" in the Windows Subsystem for Linux (which is not virtualization).
So if anyone sees a slow load inside a VM, it may be worth trying to run it natively.
Best regards
Jonas
> TDB1: tdbloader2 crashes
> ------------------------
>
> Key: JENA-1909
> URL: https://issues.apache.org/jira/browse/JENA-1909
> Project: Apache Jena
> Issue Type: Bug
> Components: TDB
> Affects Versions: Jena 3.15.0
> Reporter: Jonas Sourlier
> Priority: Major
> Attachments: signature.asc, signature.asc, tdb2.log
>
>
> This might be related to JENA-1908, but since the stack trace is different, I
> opened a second ticket.
> I tried to import the latest Wikidata dump into Apache Jena with the
> following setup:
> * Ubuntu 20.04 on Windows 10 Subsystem for Linux
> * Apache Jena 3.15.0
> * Intel i7-4770K, 32GB RAM
> *
> {code:java}
> openjdk 11.0.7 2020-04-14
> OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-3ubuntu1)
> OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-3ubuntu1, mixed mode, sharing)
> {code}
> These are the commands I have run:
> {code:java}
> wget -c http://mirror.easyname.ch/apache/jena/binaries/apache-jena-3.15.0.tar.gz
> tar -xvzf apache-jena-3.15.0.tar.gz
> mkdir data
> apache-jena-3.15.0/bin/tdbloader2 --phase data --loc data/ ../latest-all.ttl > tdb1.log 2> tdb2.log &
> apache-jena-3.15.0/bin/tdbloader2 --phase index --loc data/ > tdb1.log 2> tdb2.log &
> {code}
> The data phase ran fine, but the index phase crashed after about 10 hours.
> The stack trace is attached to this ticket (tdb2.log).
> Here's the standard output:
> {code:java}
> 08:47:57 INFO -- TDB Bulk Loader Start
> 08:47:57 INFO Index Building Phase
> 08:47:57 INFO Creating Index SPO
> 08:47:58 INFO Sort SPO
> 18:26:19 INFO Sort SPO Completed
> 18:26:19 INFO Build SPO
> {code}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)