[https://issues.apache.org/jira/browse/JENA-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284645#comment-17284645]
Alexander Bigerl commented on JENA-2044:
----------------------------------------
[~andy] Thank you for your suggestions.
I will try to find appropriate environment settings and post my results here.
The machine I ran the load on is currently occupied with other workloads, so I will probably get to this around the middle or end of this week.
> tdb2.tdbloader crashes loading wikidata
> ---------------------------------------
>
> Key: JENA-2044
> URL: https://issues.apache.org/jira/browse/JENA-2044
> Project: Apache Jena
> Issue Type: Bug
> Components: Cmd line tools, Jena, TDB2
> Affects Versions: Jena 3.14.0, Jena 3.17.0
> Environment: {code:bash}
> $ java --version
> openjdk 11.0.9.1 2020-11-04
> OpenJDK Runtime Environment (build 11.0.9.1+1-post-Debian-1deb10u2)
> OpenJDK 64-Bit Server VM (build 11.0.9.1+1-post-Debian-1deb10u2, mixed mode, sharing)
> $ uname -r
> 4.19.0-14-amd64
> $ lsb_release -da
> No LSB modules are available.
> Distributor ID: Debian
> Description: Debian GNU/Linux 10 (buster)
> Release: 10
> Codename: buster
> {code}
> Reporter: Alexander Bigerl
> Priority: Major
> Attachments: hs_err_pid28709.log
>
>
> Apache Jena crashes when loading Wikidata truthy 2020-11-11. That dump is no longer available from Wikidata, but a backup can be found here:
> [https://hobbitdata.informatik.uni-leipzig.de/wikidata-20201111-truthy-BETA.nt.bz2]
> The command I ran was:
> {code:bash}
> cgmemtime \
>   /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/triplestores/jena/apache-jena-3.17.0/bin/tdb2.tdbloader \
>   --loader=sequential \
>   --loc /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/databases/fuseki/wikidata-2020-11-11/ \
>   /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/datasets/wikidata-2020-11-11/wikidata-20201111-truthy-BETA.nt \
>   2>&1 | tee /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/logs/load/fuseki-wikidata-2020-11-11.log
> {code}
>
> The end of the logfile is:
> {code:bash}
> 06:56:52 INFO loader :: Add: 729,000,000 SPO->OSP (Batch: 237,642 / Avg: 300,050)
> 06:56:57 INFO loader :: Add: 730,000,000 SPO->OSP (Batch: 234,962 / Avg: 299,936)
> 06:56:57 INFO loader :: Elapsed: 2,433.85 seconds [2021/02/12 06:56:57 CET]
> 06:57:01 INFO loader :: Add: 731,000,000 SPO->OSP (Batch: 233,863 / Avg: 299,820)
> 06:57:05 INFO loader :: Add: 732,000,000 SPO->OSP (Batch: 269,978 / Avg: 299,775)
> 06:57:08 INFO loader :: Add: 733,000,000 SPO->OSP (Batch: 281,373 / Avg: 299,748)
> 06:57:12 INFO loader :: Add: 734,000,000 SPO->OSP (Batch: 285,143 / Avg: 299,727)
> 06:57:15 INFO loader :: Add: 735,000,000 SPO->OSP (Batch: 290,023 / Avg: 299,714)
> 06:57:19 INFO loader :: Add: 736,000,000 SPO->OSP (Batch: 290,951 / Avg: 299,701)
> #
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Native memory allocation (malloc) failed to allocate 2097152 bytes for AllocateHeap
> # An error report file with more information is saved as:
> # /home/d/dice-gr/profiles/unix/cs/triplestore-benchmark/hs_err_pid28709.log
> Child user: 70961.063 s
> Child sys : 5817.787 s
> Child wall: 76243.666 s
> Child high-water RSS : 534037652 KiB
> Recursive and acc. high-water RSS+CACHE : 585081976 KiB
> {code}
> The machine has an AMD EPYC 7742 64-core processor, 1 TB of RAM and 2 TB of free SSD storage on /home, so plenty of RAM should still have been available.
> At the time of the crash, the database folder (--loc) was 543 GB.
> I also tried -loader=parallel and -loader=phased, with the same result.
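As a rough check on the "plenty of RAM" statement, the cgmemtime high-water figures in the log (reported in KiB) can be converted to GiB. The bash sketch below is only arithmetic on the numbers quoted above; it assumes "1TB RAM" means 1024 GiB (if it means 10^12 bytes, the total is about 931 GiB instead). The helper name `kib_to_gib` is hypothetical, and the result quantifies memory use without by itself explaining why the native malloc failed.

```bash
#!/usr/bin/env bash
# Convert the cgmemtime high-water figures (reported in KiB) to GiB
# and compare them with the machine's RAM.
# Assumption: "1TB RAM" is read as 1024 GiB.

# Hypothetical helper: KiB -> GiB with one decimal place (1 GiB = 1024 * 1024 KiB).
kib_to_gib() {
  awk -v k="$1" 'BEGIN { printf "%.1f", k / (1024 * 1024) }'
}

rss_kib=534037652        # "Child high-water RSS" from the log
rss_cache_kib=585081976  # "Recursive and acc. high-water RSS+CACHE" from the log
total_gib=1024           # assumed size of "1TB RAM"

rss_gib=$(kib_to_gib "$rss_kib")
rss_cache_gib=$(kib_to_gib "$rss_cache_kib")

echo "high-water RSS:       ${rss_gib} GiB"
echo "high-water RSS+CACHE: ${rss_cache_gib} GiB"

# Share of the assumed total RAM taken by the RSS+CACHE peak.
awk -v used="$rss_cache_gib" -v total="$total_gib" \
  'BEGIN { printf "share of %d GiB:      %.0f%%\n", total, 100 * used / total }'
```

Run as-is, it prints about 509.3 GiB for RSS and 558.0 GiB for RSS+CACHE, roughly 54% of the assumed 1024 GiB.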
--
This message was sent by Atlassian Jira
(v8.3.4#803005)