GitHub user maxx-ukoo added a comment to the discussion: How to load big 
dataset to new database

Currently, I have a database with 35 files loaded.
The database is more than 400 GB in size and contains 395,085,113 triples.
The parallel loader runs very slowly against this database:
```
2026-01-19T16:37:26+00:00

Picked up JAVA_TOOL_OPTIONS: -Xmx8g -Xms8g
Java maximum memory: 8589934592
symbol:http://jena.apache.org/ARQ#regexImpl = symbol:http://jena.apache.org/ARQ#javaRegex
symbol:http://jena.apache.org/ARQ#registryFunctions = org.apache.jena.sparql.function.FunctionRegistry@60641ec8
symbol:http://jena.apache.org/ARQ#constantBNodeLabels = true
symbol:http://jena.apache.org/ARQ#registryPropertyFunctions = org.apache.jena.sparql.pfunction.PropertyFunctionRegistry@75f65e45
symbol:http://jena.apache.org/ARQ#stageGenerator = org.apache.jena.tdb2.solver.StageGeneratorDirectTDB@6eeade6c
symbol:http://jena.apache.org/ARQ#enablePropertyFunctions = true
symbol:http://jena.apache.org/ARQ#registryServiceExecutors = org.apache.jena.sparql.service.ServiceExecutorRegistry@4a891c97
symbol:http://jena.apache.org/ARQ#strictSPARQL = false
16:37:26 INFO  loader          :: Loader = LoaderParallel
16:37:26 INFO  loader          :: Start: /data/source/compound/general/pc_compound2component.ttl.gz
16:37:29 INFO  loader          :: Add: 1,000,000 pc_compound2component.ttl.gz (Batch: 344,234 / Avg: 344,234)
16:47:34 INFO  loader          :: Add: 2,000,000 pc_compound2component.ttl.gz (Batch: 1,652 / Avg: 3,288)
17:00:16 INFO  loader          :: Add: 3,000,000 pc_compound2component.ttl.gz (Batch: 1,313 / Avg: 2,190)
17:24:05 INFO  loader          :: Add: 4,000,000 pc_compound2component.ttl.gz (Batch: 699 / Avg: 1,429)
18:21:36 INFO  loader          :: Add: 5,000,000 pc_compound2component.ttl.gz (Batch: 289 / Avg: 800)
19:21:23 INFO  loader          :: Add: 6,000,000 pc_compound2component.ttl.gz (Batch: 278 / Avg: 609)
20:31:19 INFO  loader          :: Add: 7,000,000 pc_compound2component.ttl.gz (Batch: 238 / Avg: 498)
21:59:10 INFO  loader          :: Add: 8,000,000 pc_compound2component.ttl.gz (Batch: 189 / Avg: 414)
23:34:39 INFO  loader          :: Add: 9,000,000 pc_compound2component.ttl.gz (Batch: 174 / Avg: 359)
02:07:46 INFO  loader          :: Add: 10,000,000 pc_compound2component.ttl.gz (Batch: 108 / Avg: 292)
02:07:46 INFO  loader          ::   Elapsed: 34,219.43 seconds [2026/01/20 02:07:46 UTC]
05:36:22 INFO  loader          :: Add: 11,000,000 pc_compound2component.ttl.gz (Batch: 79 / Avg: 235)
^C
```
However, when I run the loader on a clean (empty) database folder on the same 
hardware, it works much faster:
```
Picked up JAVA_TOOL_OPTIONS: -Xmx8g -Xms8g
Java maximum memory: 8589934592
symbol:http://jena.apache.org/ARQ#regexImpl = symbol:http://jena.apache.org/ARQ#javaRegex
symbol:http://jena.apache.org/ARQ#registryFunctions = org.apache.jena.sparql.function.FunctionRegistry@60641ec8
symbol:http://jena.apache.org/ARQ#constantBNodeLabels = true
symbol:http://jena.apache.org/ARQ#registryPropertyFunctions = org.apache.jena.sparql.pfunction.PropertyFunctionRegistry@75f65e45
symbol:http://jena.apache.org/ARQ#stageGenerator = org.apache.jena.tdb2.solver.StageGeneratorDirectTDB@6eeade6c
symbol:http://jena.apache.org/ARQ#enablePropertyFunctions = true
symbol:http://jena.apache.org/ARQ#registryServiceExecutors = org.apache.jena.sparql.service.ServiceExecutorRegistry@4a891c97
symbol:http://jena.apache.org/ARQ#strictSPARQL = false
07:54:19 INFO  loader          :: Loader = LoaderParallel
07:54:19 INFO  loader          :: Start: /data/source/compound/general/pc_compound2component.ttl.gz
07:54:22 INFO  loader          :: Add: 1,000,000 pc_compound2component.ttl.gz (Batch: 339,097 / Avg: 339,097)
07:54:28 INFO  loader          :: Add: 2,000,000 pc_compound2component.ttl.gz (Batch: 171,438 / Avg: 227,738)
07:54:38 INFO  loader          :: Add: 3,000,000 pc_compound2component.ttl.gz (Batch: 104,657 / Avg: 163,603)
07:54:48 INFO  loader          :: Add: 4,000,000 pc_compound2component.ttl.gz (Batch: 101,719 / Avg: 142,005)
07:54:58 INFO  loader          :: Add: 5,000,000 pc_compound2component.ttl.gz (Batch: 91,996 / Avg: 128,080)
07:55:10 INFO  loader          :: Add: 6,000,000 pc_compound2component.ttl.gz (Batch: 85,251 / Avg: 118,184)
07:55:22 INFO  loader          :: Add: 7,000,000 pc_compound2component.ttl.gz (Batch: 84,709 / Avg: 111,869)
07:55:32 INFO  loader          :: Add: 8,000,000 pc_compound2component.ttl.gz (Batch: 98,911 / Avg: 110,067)
07:55:43 INFO  loader          :: Add: 9,000,000 pc_compound2component.ttl.gz (Batch: 90,009 / Avg: 107,407)
07:55:55 INFO  loader          :: Add: 10,000,000 pc_compound2component.ttl.gz (Batch: 86,550 / Avg: 104,880)
07:55:55 INFO  loader          ::   Elapsed: 95.35 seconds [2026/01/20 07:55:55 UTC]
07:56:05 INFO  loader          :: Add: 11,000,000 pc_compound2component.ttl.gz (Batch: 96,413 / Avg: 104,049)
07:56:15 INFO  loader          :: Add: 12,000,000 pc_compound2component.ttl.gz (Batch: 103,241 / Avg: 103,981)
07:56:24 INFO  loader          :: Add: 13,000,000 pc_compound2component.ttl.gz (Batch: 109,301 / Avg: 104,372)
07:56:33 INFO  loader          :: Add: 14,000,000 pc_compound2component.ttl.gz (Batch: 110,132 / Avg: 104,763)
07:56:41 INFO  loader          :: Add: 15,000,000 pc_compound2component.ttl.gz (Batch: 118,245 / Avg: 105,566)
07:56:50 INFO  loader          :: Add: 16,000,000 pc_compound2component.ttl.gz (Batch: 116,604 / Avg: 106,194)
07:56:58 INFO  loader          :: Add: 17,000,000 pc_compound2component.ttl.gz (Batch: 122,070 / Avg: 107,013)
07:57:07 INFO  loader          :: Add: 18,000,000 pc_compound2component.ttl.gz (Batch: 120,279 / Avg: 107,672)
07:57:15 INFO  loader          :: Add: 19,000,000 pc_compound2component.ttl.gz (Batch: 123,046 / Avg: 108,385)
07:57:23 INFO  loader          :: Add: 20,000,000 pc_compound2component.ttl.gz (Batch: 120,729 / Avg: 108,942)
07:57:23 INFO  loader          ::   Elapsed: 183.58 seconds [2026/01/20 07:57:23 UTC]
07:57:31 INFO  loader          :: Add: 21,000,000 pc_compound2component.ttl.gz (Batch: 118,891 / Avg: 109,378)
07:57:39 INFO  loader          :: Add: 22,000,000 pc_compound2component.ttl.gz (Batch: 123,777 / Avg: 109,959)
07:57:48 INFO  loader          :: Add: 23,000,000 pc_compound2component.ttl.gz (Batch: 122,204 / Avg: 110,440)
07:57:55 INFO  loader          :: Add: 24,000,000 pc_compound2component.ttl.gz (Batch: 130,633 / Avg: 111,156)
07:58:05 INFO  loader          :: Add: 25,000,000 pc_compound2component.ttl.gz (Batch: 108,201 / Avg: 111,035)
07:58:13 INFO  loader          :: Add: 26,000,000 pc_compound2component.ttl.gz (Batch: 113,778 / Avg: 111,138)
07:58:23 INFO  loader          :: Add: 27,000,000 pc_compound2component.ttl.gz (Batch: 107,781 / Avg: 111,010)
07:58:29 INFO  loader          :: Finished: /data/source/compound/general/pc_compound2component.ttl.gz: 27,762,031 tuples in 249.16s (Avg: 111,421)
07:58:44 INFO  loader          :: Finish - index SPOG
07:58:44 INFO  loader          :: Finish - index GSPO
07:58:53 INFO  loader          :: Finish - index GOSP
07:58:54 INFO  loader          :: Finish - index GPOS
07:58:54 INFO  loader          :: Finish - index POSG
07:58:54 INFO  loader          :: Finish - index OSPG
07:58:54 INFO  loader          :: Time = 274.513 seconds : Quads = 27,762,031 : Rate = 101,132 /s
2026-01-20T07:58:55+00:00
``` 
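
To put numbers on the difference between the two runs, here is a minimal Python sketch. It is not part of Jena: the `parse_progress` helper and its regular expression are hypothetical and only assume the `Add: ... (Batch: ... / Avg: ...)` progress-line format visible in the logs above. The two sample strings are excerpts copied from those logs.

```python
import re

# Assumed progress-line shape, taken from the loader output above:
#   "... Add: 2,000,000 <file> (Batch: 1,652 / Avg: 3,288)"
PROGRESS = re.compile(
    r"Add:\s*([\d,]+)\s+\S+\s*\(Batch:\s*([\d,]+)\s*/\s*Avg:\s*([\d,]+)\)"
)

def parse_progress(log_text):
    """Return a list of (total_added, batch_rate, avg_rate) integer tuples."""
    # Mail clients may wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [
        tuple(int(group.replace(",", "")) for group in match.groups())
        for match in PROGRESS.finditer(flat)
    ]

# Excerpts from the run against the existing 400+ GB database.
existing_db = """
16:47:34 INFO  loader :: Add: 2,000,000 pc_compound2component.ttl.gz
(Batch: 1,652 / Avg: 3,288)
02:07:46 INFO  loader :: Add: 10,000,000 pc_compound2component.ttl.gz
(Batch: 108 / Avg: 292)
"""

# Excerpts from the run against the empty database.
empty_db = """
07:54:28 INFO  loader :: Add: 2,000,000 pc_compound2component.ttl.gz
(Batch: 171,438 / Avg: 227,738)
07:55:55 INFO  loader :: Add: 10,000,000 pc_compound2component.ttl.gz
(Batch: 86,550 / Avg: 104,880)
"""

slow = parse_progress(existing_db)
fast = parse_progress(empty_db)

# Compare the average rates at the same number of triples added.
for (added, _, slow_avg), (_, _, fast_avg) in zip(slow, fast):
    ratio = fast_avg / slow_avg
    print(f"{added:,} triples: empty database is about {ratio:.0f}x faster")
```

At 10,000,000 triples the average rate is 292/s on the existing database versus 104,880/s on the empty one, roughly a 359-fold difference.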

The performance degradation becomes noticeable when the database size reaches 
around 300–400 GB.
I see high disk usage in this case:
<img width="3118" height="216" alt="image" src="https://github.com/user-attachments/assets/6cd0798f-5583-4725-926e-4d068473e3d3" />



GitHub link: 
https://github.com/apache/jena/discussions/3701#discussioncomment-15548009
