[ https://issues.apache.org/jira/browse/JENA-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284645#comment-17284645 ]

Alexander Bigerl commented on JENA-2044:
----------------------------------------

[~andy] Thank you for your suggestions.
I will try to find appropriate environment settings and post my results here.
The machine I ran the load on is currently occupied with other workloads, so I will probably get to this around the middle or end of this week.

> tdb2.tdbloader crashes loading wikidata
> ---------------------------------------
>
>                 Key: JENA-2044
>                 URL: https://issues.apache.org/jira/browse/JENA-2044
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Cmd line tools, Jena, TDB2
>    Affects Versions: Jena 3.14.0, Jena 3.17.0
>         Environment: {code:bash}
> $ java --version
> openjdk 11.0.9.1 2020-11-04
> OpenJDK Runtime Environment (build 11.0.9.1+1-post-Debian-1deb10u2)
> OpenJDK 64-Bit Server VM (build 11.0.9.1+1-post-Debian-1deb10u2, mixed mode, sharing)
> $ uname -r
> 4.19.0-14-amd64
> $ lsb_release -da
> No LSB modules are available.
> Distributor ID:       Debian
> Description:  Debian GNU/Linux 10 (buster)
> Release:      10
> Codename:     buster
> {code}
>            Reporter: Alexander Bigerl
>            Priority: Major
>         Attachments: hs_err_pid28709.log
>
>
> Apache Jena crashes when loading the Wikidata truthy dump of 2020-11-11. That dump is no longer available from Wikidata, but a backup can be found here:
> [https://hobbitdata.informatik.uni-leipzig.de/wikidata-20201111-truthy-BETA.nt.bz2]
> The command run was:
> {code:bash}
> cgmemtime \
>   /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/triplestores/jena/apache-jena-3.17.0/bin/tdb2.tdbloader \
>   --loader=sequential \
>   --loc /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/databases/fuseki/wikidata-2020-11-11/ \
>   /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/datasets/wikidata-2020-11-11/wikidata-20201111-truthy-BETA.nt \
>   2>&1 | tee /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/logs/load/fuseki-wikidata-2020-11-11.log
> {code}
>  
> The end of the logfile is:
> {code:bash}
> 06:56:52 INFO  loader          :: Add: 729,000,000 SPO->OSP (Batch: 237,642 / Avg: 300,050)
> 06:56:57 INFO  loader          :: Add: 730,000,000 SPO->OSP (Batch: 234,962 / Avg: 299,936)
> 06:56:57 INFO  loader          ::   Elapsed: 2,433.85 seconds [2021/02/12 06:56:57 CET]
> 06:57:01 INFO  loader          :: Add: 731,000,000 SPO->OSP (Batch: 233,863 / Avg: 299,820)
> 06:57:05 INFO  loader          :: Add: 732,000,000 SPO->OSP (Batch: 269,978 / Avg: 299,775)
> 06:57:08 INFO  loader          :: Add: 733,000,000 SPO->OSP (Batch: 281,373 / Avg: 299,748)
> 06:57:12 INFO  loader          :: Add: 734,000,000 SPO->OSP (Batch: 285,143 / Avg: 299,727)
> 06:57:15 INFO  loader          :: Add: 735,000,000 SPO->OSP (Batch: 290,023 / Avg: 299,714)
> 06:57:19 INFO  loader          :: Add: 736,000,000 SPO->OSP (Batch: 290,951 / Avg: 299,701)
> #
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Native memory allocation (malloc) failed to allocate 2097152 bytes for AllocateHeap
> # An error report file with more information is saved as:
> # /home/d/dice-gr/profiles/unix/cs/triplestore-benchmark/hs_err_pid28709.log
> Child user: 70961.063 s
> Child sys : 5817.787 s
> Child wall: 76243.666 s
> Child high-water RSS                    :  534037652 KiB
> Recursive and acc. high-water RSS+CACHE :  585081976 KiB
> {code}
> The machine has an AMD EPYC 7742 64-core processor, 1 TB of RAM and 2 TB of free SSD storage on /home, so there should still have been plenty of RAM available.
> At the time of the crash, the --loc folder was 543 GB.
> I also tried --loader=parallel and --loader=phased, with the same result.
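To compare the cgmemtime high-water figures with the 1 TB of installed RAM, they can be converted from KiB to GiB. This is a minimal sketch; `kib_to_gib` is a hypothetical helper, not part of Jena or cgmemtime, and it assumes cgmemtime's KiB means 1024 bytes.

```shell
#!/bin/sh
# Hypothetical helper (not part of Jena or cgmemtime): convert a KiB value,
# as printed by cgmemtime, to GiB with one decimal place (1 GiB = 1024 * 1024 KiB).
kib_to_gib() {
  awk -v kib="$1" 'BEGIN { printf "%.1f\n", kib / (1024 * 1024) }'
}

# High-water figures copied from the cgmemtime output in the log above.
echo "Child high-water RSS: $(kib_to_gib 534037652) GiB"
echo "High-water RSS+CACHE: $(kib_to_gib 585081976) GiB"
```

This yields roughly 509 GiB of peak RSS and 558 GiB of RSS plus cache. Both are well below the installed 1 TB, which is consistent with the report that physical RAM was not exhausted when the 2 MiB (2097152-byte) malloc failed.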



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
