GitHub user afs added a comment to the discussion: How to load big dataset to 
new database

> `10:57:31 INFO  loader          :: Loader = LoaderBasic`

That's the "basic" loader.
To see whether there is an I/O issue, I tried a small load. On your setup, do 
you get similar results with the parallel loader?

There is also `tdb2.xloader`, which loads the contents of multiple files into 
an empty dataset. This loader has been used to load datasets larger than 
"compounds" (which is 6B triples). It starts more slowly, but it is less I/O 
intensive.
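
A minimal sketch of what an xloader run could look like, assuming the same database and data directories as in the script further down. The `DB_DIR`, `DATA_DIR` and `RUN` variables and the `*.ttl.gz` glob are illustrative assumptions, not something from the discussion. The database directory needs to be new or empty. By default the script only prints the command it would run.

```bash
#!/usr/bin/bash
# Sketch: bulk-load several files into a new, empty TDB2 database with tdb2.xloader.
# DB_DIR and DATA_DIR default to the paths used in this comment; adjust as needed.
DB_DIR="${DB_DIR:-$HOME/DatasetsSSD/DB}"
DATA_DIR="${DATA_DIR:-$HOME/Datasets/PubChem/Data-2026-01/compound/general}"

# Assumption: every *.ttl.gz file in DATA_DIR should be loaded in one run.
cmd=(tdb2.xloader --loc "$DB_DIR" "$DATA_DIR"/*.ttl.gz)

# Dry run by default: print the command. Set RUN=1 to actually execute it.
if [ "${RUN:-0}" = "1" ]; then
  time "${cmd[@]}"
else
  printf '%s\n' "${cmd[*]}"
fi
```

Run it once as-is to check the expanded file list, then again with `RUN=1` to start the load.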

---

Loading one of the compound files (`pc_compound2component.ttl.gz`, 27M 
triples) with the parallel loader on a machine with a local SSD (Ubuntu 25.10, 
NVMe SSD, i7-12700 : 4 P cores, 16 E cores).


Script:

```
#!/usr/bin/bash

( 
  date -Iseconds
  echo
  time tdb2.tdbloader --loader=parallel --loc ~/DatasetsSSD/DB \
    ~/Datasets/PubChem/Data-2026-01/compound/general/pc_compound2component.ttl.gz
  echo
  date -Iseconds
)
```

```
2026-01-18T14:54:38+00:00

14:54:38 INFO  loader          :: Loader = LoaderParallel
14:54:38 INFO  loader          :: Start: /home/afs/Datasets/PubChem/Data-2026-01/compound/general/pc_compound2component.ttl.gz
14:54:40 INFO  loader          :: Add: 1,000,000 pc_compound2component.ttl.gz (Batch: 596,658 / Avg: 596,658)
14:54:42 INFO  loader          :: Add: 2,000,000 pc_compound2component.ttl.gz (Batch: 508,388 / Avg: 548,998)
14:54:46 INFO  loader          :: Add: 3,000,000 pc_compound2component.ttl.gz (Batch: 265,463 / Avg: 404,858)
14:54:50 INFO  loader          :: Add: 4,000,000 pc_compound2component.ttl.gz (Batch: 248,077 / Avg: 349,619)
14:54:54 INFO  loader          :: Add: 5,000,000 pc_compound2component.ttl.gz (Batch: 234,466 / Avg: 318,349)
14:54:59 INFO  loader          :: Add: 6,000,000 pc_compound2component.ttl.gz (Batch: 221,778 / Avg: 296,809)
14:55:03 INFO  loader          :: Add: 7,000,000 pc_compound2component.ttl.gz (Batch: 230,574 / Avg: 285,109)
14:55:07 INFO  loader          :: Add: 8,000,000 pc_compound2component.ttl.gz (Batch: 258,464 / Avg: 281,482)
14:55:11 INFO  loader          :: Add: 9,000,000 pc_compound2component.ttl.gz (Batch: 247,524 / Avg: 277,255)
14:55:15 INFO  loader          :: Add: 10,000,000 pc_compound2component.ttl.gz (Batch: 242,541 / Avg: 273,343)
14:55:15 INFO  loader          ::   Elapsed: 36.58 seconds [2026/01/18 14:55:15 GMT]
14:55:19 INFO  loader          :: Add: 11,000,000 pc_compound2component.ttl.gz (Batch: 247,402 / Avg: 270,762)
14:55:23 INFO  loader          :: Add: 12,000,000 pc_compound2component.ttl.gz (Batch: 247,831 / Avg: 268,690)
14:55:27 INFO  loader          :: Add: 13,000,000 pc_compound2component.ttl.gz (Batch: 255,232 / Avg: 267,605)
14:55:31 INFO  loader          :: Add: 14,000,000 pc_compound2component.ttl.gz (Batch: 276,472 / Avg: 268,219)
14:55:34 INFO  loader          :: Add: 15,000,000 pc_compound2component.ttl.gz (Batch: 277,623 / Avg: 268,826)
14:55:38 INFO  loader          :: Add: 16,000,000 pc_compound2component.ttl.gz (Batch: 284,333 / Avg: 269,746)
14:55:41 INFO  loader          :: Add: 17,000,000 pc_compound2component.ttl.gz (Batch: 282,485 / Avg: 270,463)
14:55:45 INFO  loader          :: Add: 18,000,000 pc_compound2component.ttl.gz (Batch: 268,817 / Avg: 270,371)
14:55:48 INFO  loader          :: Add: 19,000,000 pc_compound2component.ttl.gz (Batch: 298,240 / Avg: 271,708)
14:55:52 INFO  loader          :: Add: 20,000,000 pc_compound2component.ttl.gz (Batch: 302,388 / Avg: 273,093)
14:55:52 INFO  loader          ::   Elapsed: 73.24 seconds [2026/01/18 14:55:52 GMT]
14:55:55 INFO  loader          :: Add: 21,000,000 pc_compound2component.ttl.gz (Batch: 291,715 / Avg: 273,926)
14:55:58 INFO  loader          :: Add: 22,000,000 pc_compound2component.ttl.gz (Batch: 300,300 / Avg: 275,024)
14:56:02 INFO  loader          :: Add: 23,000,000 pc_compound2component.ttl.gz (Batch: 304,136 / Avg: 276,173)
14:56:05 INFO  loader          :: Add: 24,000,000 pc_compound2component.ttl.gz (Batch: 316,155 / Avg: 277,636)
14:56:08 INFO  loader          :: Add: 25,000,000 pc_compound2component.ttl.gz (Batch: 288,184 / Avg: 278,043)
14:56:12 INFO  loader          :: Add: 26,000,000 pc_compound2component.ttl.gz (Batch: 298,507 / Avg: 278,778)
14:56:15 INFO  loader          :: Add: 27,000,000 pc_compound2component.ttl.gz (Batch: 276,778 / Avg: 278,703)
14:56:18 INFO  loader          :: Finished: /home/afs/Datasets/PubChem/Data-2026-01/compound/general/pc_compound2component.ttl.gz: 27,762,031 tuples in 99.10s (Avg: 280,152)
14:56:22 INFO  loader          :: Finish - index SPO
14:56:22 INFO  loader          :: Finish - index OSP
14:56:22 INFO  loader          :: Finish - index POS
14:56:22 INFO  loader          :: Time = 103.625 seconds : Triples = 27,762,031 : Rate = 267,909 /s

real    1m44.283s
user    4m12.957s
sys     0m20.461s

2026-01-18T14:56:22+00:00
```
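
For comparing against your own run, here is a small sketch that recomputes two figures from the output above: the overall load rate, and the ratio of CPU time to wall-clock time, which is a rough indication of how many cores were busy on average. The variable names are only illustrative; the numbers are copied from the final `Time = ...` line and the `time` output.

```bash
#!/usr/bin/bash
# Cross-check the figures reported in the log and `time` output above.
export LC_ALL=C                   # keep "." as the decimal separator

triples=27762031                  # Triples = 27,762,031
load_secs=103.625                 # Time = 103.625 seconds
real_secs=104.283                 # real 1m44.283s
user_secs=252.957                 # user 4m12.957s
sys_secs=20.461                   # sys  0m20.461s

# Overall rate: triples loaded per second of loader time.
rate=$(awk -v t="$triples" -v s="$load_secs" 'BEGIN { printf "%.0f", t / s }')

# (user + sys) / real: approximate average number of busy cores.
busy=$(awk -v u="$user_secs" -v y="$sys_secs" -v r="$real_secs" \
  'BEGIN { printf "%.2f", (u + y) / r }')

echo "Rate: $rate triples/s"      # prints "Rate: 267909 triples/s"
echo "Average busy cores: $busy"  # prints "Average busy cores: 2.62"
```

If the same calculation on your machine gives a ratio close to 1 with the parallel loader, or a much lower rate, that would point towards the load waiting on I/O rather than on CPU.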


GitHub link: 
https://github.com/apache/jena/discussions/3701#discussioncomment-15540511
