Alexander Bigerl created JENA-2044:
--------------------------------------
Summary: tdb2.tdbloader crashes loading wikidata
Key: JENA-2044
URL: https://issues.apache.org/jira/browse/JENA-2044
Project: Apache Jena
Issue Type: Bug
Components: Cmd line tools, Jena, TDB2
Affects Versions: Jena 3.17.0, Jena 3.14.0
Environment: {code:bash}
$ java --version
openjdk 11.0.9.1 2020-11-04
OpenJDK Runtime Environment (build 11.0.9.1+1-post-Debian-1deb10u2)
OpenJDK 64-Bit Server VM (build 11.0.9.1+1-post-Debian-1deb10u2, mixed mode, sharing)
$ uname -r
4.19.0-14-amd64
$ lsb_release -da
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster
{code}
Reporter: Alexander Bigerl
Attachments: hs_err_pid28709.log
Apache Jena crashes when loading the Wikidata truthy dump of 2020-11-11. The
dump is no longer available from Wikidata, but a backup can be found here:
[https://hobbitdata.informatik.uni-leipzig.de/wikidata-20201111-truthy-BETA.nt.bz2]
The command run was:
{code:bash}
cgmemtime \
  /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/triplestores/jena/apache-jena-3.17.0/bin/tdb2.tdbloader \
  --loader=sequential --loc \
  /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/databases/fuseki/wikidata-2020-11-11/ \
  /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/datasets/wikidata-2020-11-11/wikidata-20201111-truthy-BETA.nt \
  2>&1 | tee \
  /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/logs/load/fuseki-wikidata-2020-11-11.log
{code}
The end of the logfile is:
{code:bash}
06:56:52 INFO loader :: Add: 729,000,000 SPO->OSP (Batch: 237,642 / Avg: 300,050)
06:56:57 INFO loader :: Add: 730,000,000 SPO->OSP (Batch: 234,962 / Avg: 299,936)
06:56:57 INFO loader :: Elapsed: 2,433.85 seconds [2021/02/12 06:56:57 CET]
06:57:01 INFO loader :: Add: 731,000,000 SPO->OSP (Batch: 233,863 / Avg: 299,820)
06:57:05 INFO loader :: Add: 732,000,000 SPO->OSP (Batch: 269,978 / Avg: 299,775)
06:57:08 INFO loader :: Add: 733,000,000 SPO->OSP (Batch: 281,373 / Avg: 299,748)
06:57:12 INFO loader :: Add: 734,000,000 SPO->OSP (Batch: 285,143 / Avg: 299,727)
06:57:15 INFO loader :: Add: 735,000,000 SPO->OSP (Batch: 290,023 / Avg: 299,714)
06:57:19 INFO loader :: Add: 736,000,000 SPO->OSP (Batch: 290,951 / Avg: 299,701)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 2097152 bytes for AllocateHeap
# An error report file with more information is saved as:
# /home/d/dice-gr/profiles/unix/cs/triplestore-benchmark/hs_err_pid28709.log
Child user: 70961.063 s
Child sys : 5817.787 s
Child wall: 76243.666 s
Child high-water RSS : 534037652 KiB
Recursive and acc. high-water RSS+CACHE : 585081976 KiB
{code}
The machine has an AMD EPYC 7742 64-Core Processor, 1 TB of RAM and 2 TB of
free SSD storage on /home, so plenty of RAM should still have been available.
At the time of the crash, the loc folder was 543 GB in size.
I also tried with -loader=parallel and -loader=phased, with the same result.
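
To make the cgmemtime figures easier to compare against the machine's RAM, the unit conversion can be sketched as below. This is only an illustrative calculation on the numbers from the log above. The helper name kib_to_gib is hypothetical, and reading "1TB RAM" as 1 TiB (1024 GiB) is an assumption, not something measured on the machine.

```python
# Illustrative unit conversion for the figures reported in the log above.
# All names here are hypothetical; the numbers are copied from the report.

KIB_PER_GIB = 1024 * 1024


def kib_to_gib(kib: int) -> float:
    """Convert a size in KiB to GiB."""
    return kib / KIB_PER_GIB


HIGH_WATER_RSS_KIB = 534_037_652        # "Child high-water RSS"
HIGH_WATER_RSS_CACHE_KIB = 585_081_976  # "Recursive and acc. high-water RSS+CACHE"
FAILED_MALLOC_BYTES = 2_097_152         # size of the failed native allocation
WALL_SECONDS = 76_243.666               # "Child wall"

# Assumption: "1TB RAM" is taken to mean 1 TiB = 1024 GiB.
ASSUMED_RAM_GIB = 1024

rss_gib = kib_to_gib(HIGH_WATER_RSS_KIB)
rss_cache_gib = kib_to_gib(HIGH_WATER_RSS_CACHE_KIB)
failed_malloc_mib = FAILED_MALLOC_BYTES / (1024 * 1024)
wall_hours = WALL_SECONDS / 3600

print(f"High-water RSS:       {rss_gib:.1f} GiB")
print(f"High-water RSS+CACHE: {rss_cache_gib:.1f} GiB")
print(f"Share of assumed RAM: {rss_gib / ASSUMED_RAM_GIB:.0%}")
print(f"Failed allocation:    {failed_malloc_mib:.0f} MiB")
print(f"Wall time:            {wall_hours:.1f} h")
```

Under that assumption, the peak RSS of roughly 509 GiB is about half of the installed RAM, while the allocation that failed was only 2 MiB, after about 21 hours of wall time. The calculation does not by itself identify what limit was hit.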
--
This message was sent by Atlassian Jira
(v8.3.4#803005)