GitHub user afs added a comment to the discussion: How to load big dataset to new database
> `10:57:31 INFO loader :: Loader = LoaderBasic`

That's the "basic" loader.

To see if there is an I/O issue, I tried a small load. In your setups, do you get similar results with the parallel loader?

There is `tdb2.xloader`, which loads an empty dataset with the contents of multiple files. This loader has been used to load datasets larger than "compounds" (which is 6B triples). While it starts slower, it is less I/O intensive.

---

Loading one of the compound files (`pc_compound2component.ttl.gz`, 27M triples) with the parallel loader on a machine with a local SSD.

(Ubuntu 25.10, NVMe SSD, i7-12700 : 4 P cores, 16 E cores).

Script:

```
#!/usr/bin/bash

(
date -Iseconds
echo
time tdb2.tdbloader --loader=parallel --loc ~/DatasetsSSD/DB ~/Datasets/PubChem/Data-2026-01/compound/general/pc_compound2component.ttl.gz
echo
date -Iseconds
)
```

```
2026-01-18T14:54:38+00:00

14:54:38 INFO loader :: Loader = LoaderParallel
14:54:38 INFO loader :: Start: /home/afs/Datasets/PubChem/Data-2026-01/compound/general/pc_compound2component.ttl.gz
14:54:40 INFO loader :: Add: 1,000,000 pc_compound2component.ttl.gz (Batch: 596,658 / Avg: 596,658)
14:54:42 INFO loader :: Add: 2,000,000 pc_compound2component.ttl.gz (Batch: 508,388 / Avg: 548,998)
14:54:46 INFO loader :: Add: 3,000,000 pc_compound2component.ttl.gz (Batch: 265,463 / Avg: 404,858)
14:54:50 INFO loader :: Add: 4,000,000 pc_compound2component.ttl.gz (Batch: 248,077 / Avg: 349,619)
14:54:54 INFO loader :: Add: 5,000,000 pc_compound2component.ttl.gz (Batch: 234,466 / Avg: 318,349)
14:54:59 INFO loader :: Add: 6,000,000 pc_compound2component.ttl.gz (Batch: 221,778 / Avg: 296,809)
14:55:03 INFO loader :: Add: 7,000,000 pc_compound2component.ttl.gz (Batch: 230,574 / Avg: 285,109)
14:55:07 INFO loader :: Add: 8,000,000 pc_compound2component.ttl.gz (Batch: 258,464 / Avg: 281,482)
14:55:11 INFO loader :: Add: 9,000,000 pc_compound2component.ttl.gz (Batch: 247,524 / Avg: 277,255)
14:55:15 INFO loader :: Add: 10,000,000 pc_compound2component.ttl.gz (Batch: 242,541 / Avg: 273,343)
14:55:15 INFO loader :: Elapsed: 36.58 seconds [2026/01/18 14:55:15 GMT]
14:55:19 INFO loader :: Add: 11,000,000 pc_compound2component.ttl.gz (Batch: 247,402 / Avg: 270,762)
14:55:23 INFO loader :: Add: 12,000,000 pc_compound2component.ttl.gz (Batch: 247,831 / Avg: 268,690)
14:55:27 INFO loader :: Add: 13,000,000 pc_compound2component.ttl.gz (Batch: 255,232 / Avg: 267,605)
14:55:31 INFO loader :: Add: 14,000,000 pc_compound2component.ttl.gz (Batch: 276,472 / Avg: 268,219)
14:55:34 INFO loader :: Add: 15,000,000 pc_compound2component.ttl.gz (Batch: 277,623 / Avg: 268,826)
14:55:38 INFO loader :: Add: 16,000,000 pc_compound2component.ttl.gz (Batch: 284,333 / Avg: 269,746)
14:55:41 INFO loader :: Add: 17,000,000 pc_compound2component.ttl.gz (Batch: 282,485 / Avg: 270,463)
14:55:45 INFO loader :: Add: 18,000,000 pc_compound2component.ttl.gz (Batch: 268,817 / Avg: 270,371)
14:55:48 INFO loader :: Add: 19,000,000 pc_compound2component.ttl.gz (Batch: 298,240 / Avg: 271,708)
14:55:52 INFO loader :: Add: 20,000,000 pc_compound2component.ttl.gz (Batch: 302,388 / Avg: 273,093)
14:55:52 INFO loader :: Elapsed: 73.24 seconds [2026/01/18 14:55:52 GMT]
14:55:55 INFO loader :: Add: 21,000,000 pc_compound2component.ttl.gz (Batch: 291,715 / Avg: 273,926)
14:55:58 INFO loader :: Add: 22,000,000 pc_compound2component.ttl.gz (Batch: 300,300 / Avg: 275,024)
14:56:02 INFO loader :: Add: 23,000,000 pc_compound2component.ttl.gz (Batch: 304,136 / Avg: 276,173)
14:56:05 INFO loader :: Add: 24,000,000 pc_compound2component.ttl.gz (Batch: 316,155 / Avg: 277,636)
14:56:08 INFO loader :: Add: 25,000,000 pc_compound2component.ttl.gz (Batch: 288,184 / Avg: 278,043)
14:56:12 INFO loader :: Add: 26,000,000 pc_compound2component.ttl.gz (Batch: 298,507 / Avg: 278,778)
14:56:15 INFO loader :: Add: 27,000,000 pc_compound2component.ttl.gz (Batch: 276,778 / Avg: 278,703)
14:56:18 INFO loader :: Finished: /home/afs/Datasets/PubChem/Data-2026-01/compound/general/pc_compound2component.ttl.gz: 27,762,031 tuples in 99.10s (Avg: 280,152)
14:56:22 INFO loader :: Finish - index SPO
14:56:22 INFO loader :: Finish - index OSP
14:56:22 INFO loader :: Finish - index POS
14:56:22 INFO loader :: Time = 103.625 seconds : Triples = 27,762,031 : Rate = 267,909 /s

real 1m44.283s
user 4m12.957s
sys 0m20.461s

2026-01-18T14:56:22+00:00
```

GitHub link: https://github.com/apache/jena/discussions/3701#discussioncomment-15540511
