Alexander Bigerl created JENA-2044:
--------------------------------------

             Summary: tdb2.tdbloader crashes loading wikidata
                 Key: JENA-2044
                 URL: https://issues.apache.org/jira/browse/JENA-2044
             Project: Apache Jena
          Issue Type: Bug
          Components: Cmd line tools, Jena, TDB2
    Affects Versions: Jena 3.17.0, Jena 3.14.0
         Environment: {code:bash}
$ java --version
openjdk 11.0.9.1 2020-11-04
OpenJDK Runtime Environment (build 11.0.9.1+1-post-Debian-1deb10u2)
OpenJDK 64-Bit Server VM (build 11.0.9.1+1-post-Debian-1deb10u2, mixed mode, sharing)

$ uname -r
4.19.0-14-amd64

$ lsb_release -da
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 10 (buster)
Release:        10
Codename:       buster
{code}
            Reporter: Alexander Bigerl
         Attachments: hs_err_pid28709.log

Apache Jena crashes when loading the Wikidata truthy dump of 2020-11-11 (it is no longer available from Wikidata, but a backup can be found here: [https://hobbitdata.informatik.uni-leipzig.de/wikidata-20201111-truthy-BETA.nt.bz2]).

The command run was:
{code:bash}
cgmemtime \
  /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/triplestores/jena/apache-jena-3.17.0/bin/tdb2.tdbloader \
  --loader=sequential \
  --loc /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/databases/fuseki/wikidata-2020-11-11/ \
  /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/datasets/wikidata-2020-11-11/wikidata-20201111-truthy-BETA.nt \
  2>&1 | tee /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/logs/load/fuseki-wikidata-2020-11-11.log
{code}

The end of the logfile is:
{code:bash}
06:56:52 INFO  loader          :: Add: 729,000,000 SPO->OSP (Batch: 237,642 / Avg: 300,050)
06:56:57 INFO  loader          :: Add: 730,000,000 SPO->OSP (Batch: 234,962 / Avg: 299,936)
06:56:57 INFO  loader          ::   Elapsed: 2,433.85 seconds [2021/02/12 06:56:57 CET]
06:57:01 INFO  loader          :: Add: 731,000,000 SPO->OSP (Batch: 233,863 / Avg: 299,820)
06:57:05 INFO  loader          :: Add: 732,000,000 SPO->OSP (Batch: 269,978 / Avg: 299,775)
06:57:08 INFO  loader          :: Add: 733,000,000 SPO->OSP (Batch: 281,373 / Avg: 299,748)
06:57:12 INFO  loader          :: Add: 734,000,000 SPO->OSP (Batch: 285,143 / Avg: 299,727)
06:57:15 INFO  loader          :: Add: 735,000,000 SPO->OSP (Batch: 290,023 / Avg: 299,714)
06:57:19 INFO  loader          :: Add: 736,000,000 SPO->OSP (Batch: 290,951 / Avg: 299,701)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 2097152 bytes for AllocateHeap
# An error report file with more information is saved as:
# /home/d/dice-gr/profiles/unix/cs/triplestore-benchmark/hs_err_pid28709.log
Child user: 70961.063 s
Child sys : 5817.787 s
Child wall: 76243.666 s
Child high-water RSS                    :  534037652 KiB
Recursive and acc. high-water RSS+CACHE :  585081976 KiB
{code}
The machine has an AMD EPYC 7742 64-core processor, 1 TB of RAM and 2 TB of free SSD storage on /home, so plenty of RAM should still have been available.
At the time of the crash, the --loc directory was 543 GB.
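To put the cgmemtime figures at the end of the log in perspective, they can be converted from KiB to GiB. This is a minimal sketch, assuming awk is available; the helper name kib_to_gib is hypothetical and not part of cgmemtime or Jena.

{code:bash}
#!/usr/bin/env bash
# Hypothetical helper: convert a KiB figure (as printed by cgmemtime) to GiB,
# rounded to one decimal place. 1 GiB = 1024 * 1024 KiB.
kib_to_gib() {
  awk -v kib="$1" 'BEGIN { printf "%.1f\n", kib / (1024 * 1024) }'
}

# Values copied from the end of the log in this report.
echo "Child high-water RSS:                    $(kib_to_gib 534037652) GiB"
echo "Recursive and acc. high-water RSS+CACHE: $(kib_to_gib 585081976) GiB"
{code}

For the reported values this gives about 509.3 GiB and 558.0 GiB respectively, both well below the 1 TB of installed RAM.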

I also tried --loader=parallel and --loader=phased, with the same result.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
