[ 
https://issues.apache.org/jira/browse/JENA-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Bigerl updated JENA-2044:
-----------------------------------
    Attachment: 
fuseki-wikidata-2020-11-11_(docker_8c_J16g_R256g_K256g_phased).log

> tdb2.tdbloader crashses loading wikidata
> ----------------------------------------
>
>                 Key: JENA-2044
>                 URL: https://issues.apache.org/jira/browse/JENA-2044
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Cmd line tools, Jena, TDB2
>    Affects Versions: Jena 3.14.0, Jena 3.17.0
>         Environment: {code:bash}
> $ java --version
> openjdk 11.0.9.1 2020-11-04
> OpenJDK Runtime Environment (build 11.0.9.1+1-post-Debian-1deb10u2)
> OpenJDK 64-Bit Server VM (build 11.0.9.1+1-post-Debian-1deb10u2, mixed mode, 
> sharing)
> $ uname -r
> 4.19.0-14-amd64
> $ lsb_release -da
> No LSB modules are available.
> Distributor ID:       Debian
> Description:  Debian GNU/Linux 10 (buster)
> Release:      10
> Codename:     buster
> {code}
>            Reporter: Alexander Bigerl
>            Priority: Major
>         Attachments: FusekiWikidataLoaderDockerfile, 
> fuseki-wikidata-2020-11-11_(docker_8c_J16g_R256g_K256g_phased).log, 
> hs_err_pid28709.log
>
>
> Apache jena crashes when loading wikidata truthy 2020-11-11 (it is not 
> available any more via wikidata, but a backup can be found here: 
> [https://hobbitdata.informatik.uni-leipzig.de/wikidata-20201111-truthy-BETA.nt.bz2]
> command run was:
> {code:bash}
> cgmemtime 
> /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/triplestores/jena/apache-jena-3.17.0/bin/tdb2.tdbloader
>  --loader=sequential --loc 
> /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/databases/fuseki/wikidata-2020-11-11/
>  
> /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/datasets/wikidata-2020-11-11/wikidata-20201111-truthy-BETA.nt
>  2>&1 | tee 
> /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/logs/load/fuseki-wikidata-2020-11-11.log
> {code}
>  
> The end of the logfile is:
> {code:bash}
> 06:56:52 INFO  loader          :: Add: 729,000,000 SPO->OSP (Batch: 237,642 / 
> Avg: 300,050)
> 06:56:57 INFO  loader          :: Add: 730,000,000 SPO->OSP (Batch: 234,962 / 
> Avg: 299,936)
> 06:56:57 INFO  loader          ::   Elapsed: 2,433.85 seconds [2021/02/12 
> 06:56:57 CET]
> 06:57:01 INFO  loader          :: Add: 731,000,000 SPO->OSP (Batch: 233,863 / 
> Avg: 299,820)
> 06:57:05 INFO  loader          :: Add: 732,000,000 SPO->OSP (Batch: 269,978 / 
> Avg: 299,775)
> 06:57:08 INFO  loader          :: Add: 733,000,000 SPO->OSP (Batch: 281,373 / 
> Avg: 299,748)
> 06:57:12 INFO  loader          :: Add: 734,000,000 SPO->OSP (Batch: 285,143 / 
> Avg: 299,727)
> 06:57:15 INFO  loader          :: Add: 735,000,000 SPO->OSP (Batch: 290,023 / 
> Avg: 299,714)
> 06:57:19 INFO  loader          :: Add: 736,000,000 SPO->OSP (Batch: 290,951 / 
> Avg: 299,701)
> #
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Native memory allocation (malloc) failed to allocate 2097152 bytes for 
> AllocateHeap
> # An error report file with more information is saved as:
> # /home/d/dice-gr/profiles/unix/cs/triplestore-benchmark/hs_err_pid28709.log
> Child user: 70961.063 s
> Child sys : 5817.787 s
> Child wall: 76243.666 s
> Child high-water RSS                    :  534037652 KiB
> Recursive and acc. high-water RSS+CACHE :  585081976 KiB
> {code}
>  The machine has a AMD EPYC 7742 64-Core Processor, 1TB RAM and 2 TB free ssd 
> storage on /home. So there should still be plenty of RAM have been available.
>  The loc folder has at the time of crash 543GB.
> I also tried with -loader=parallel and -loader=phased . Same result.
> edit 23/Feb/2021:
> ulimit -a
> {code:bash}
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 4110480
> max locked memory       (kbytes, -l) 65536
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1024
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 4110480
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> {code}
> output from prlimit --pid=<PID of tdb.loader2>:
> {code:bash}
> RESOURCE   DESCRIPTION                             SOFT      HARD UNITS
> AS         address space limit                unlimited unlimited bytes
> CORE       max core file size                         0 unlimited bytes
> CPU        CPU time                           unlimited unlimited seconds
> DATA       max data size                      unlimited unlimited bytes
> FSIZE      max file size                      unlimited unlimited bytes
> LOCKS      max number of file locks held      unlimited unlimited locks
> MEMLOCK    max locked-in-memory address space  67108864  67108864 bytes
> MSGQUEUE   max bytes in POSIX mqueues            819200    819200 bytes
> NICE       max nice prio allowed to raise             0         0 
> NOFILE     max number of open files             1048576   1048576 files
> NPROC      max number of processes              4110480   4110480 processes
> RSS        max resident set size              unlimited unlimited bytes
> RTPRIO     max real-time priority                     0         0 
> RTTIME     timeout for real-time tasks        unlimited unlimited microsecs
> SIGPENDING max number of pending signals        4110480   4110480 signals
> STACK      max stack size                       8388608 unlimited bytes
> {code}
> I tried several things. Nothing worked so far. Here is what I tried:
> * Setting several JVM_ARGS to avoid resource over-consumption: 
> {code:bash}
> -Xmx16g -XX:MaxDirectMemorySize=16g -Djdk.nio.maxCachedBufferSize=17179869184 
> -XX:ActiveProcessorCount=4
> {code}
> * Limiting : 
> {code:bash}
> -Xmx16g -XX:MaxDirectMemorySize=16g -Djdk.nio.maxCachedBufferSize=17179869184 
> -XX:ActiveProcessorCount=4
> {code}
> * limiting kernel shared memory to (set in /etc/sysctl.conf + reboot)
> {code:bash}
> kernel.shmmax=17179869185 # 16 GB
> kernel.shmall=4194304 # *4096 pagesize = 16 GB
> {code}
> * putting jena into a dockerfile and run it with the commands below. The idea 
> was to limit the resources and have a default environment. I first tried with 
> 16g memory but it within a couple 100M triples dropping below 10k triples/s 
> loading speed. So I increased the reseources because I didn't have that much 
> time on the server. Dockerfile is provided as attachment as well as the 
> output. Here the output might be interesting. Because tdb2.loader did not 
> crash completely but simply logged an Exception and idled ever since.
> {code:bash}
> docker build -f FusekiWikidataLoaderDockerfile -t fuseki-wikidata_01 
> empty_folder/
> dockerun --user $(id -u):$(id -g) --kernel-memory=256g --memory=256g 
> --cpuset-cpus=0-7 --name fuseki_wikidata_8c_J16g_R256g_K256g_phased -v 
> "$(pwd)"/databases:/databases -v "$(pwd)"/datasets:/datasets -v 
> "$(pwd)"/logs:/logs fuseki-wikidata_01:latest
> {code}
> Side observation:
> * Fusekis loading performance seems to be influenced by linux' cache 
> significantly. If there is 1TB available it had had a read rate of ~2.5GB/s 
> (measured with htop). With only 16GB it dropped to at max 250MB/s. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to