GitHub user afs added a comment to the discussion: How to load big dataset to new database

Hi @maxx-ukoo 

> * What is the correct and fastest way to load this dataset?
> * Why does the performance drop dramatically after a few million records?

That is a surprisingly sharp drop-off. What kind of storage are you using? 
(Local SSD? Local disk? A remote file store? ...)
And what is the loader command you are using?

> * Should I unzip the files before loading them?

No need. The parsers decompress on the fly, and at scale it does not 
noticeably affect loading times, which are dominated by writing.

> * Should I load the files one by one, or load all of them in a single 
> tdb2.tdbloader run?

All in a single run. The loaders exploit the fact that the database is empty 
and work on it at a low level on that basis; otherwise, loading is not 
optimized.
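For example, a single run over all the compressed files might look like this 
(a sketch: the database location and file names are placeholders, and the 
`--loader` choice depends on your hardware):

```shell
# Load everything in one tdb2.tdbloader invocation into an empty
# database directory. Paths and file names here are illustrative.
tdb2.tdbloader --loc /data/DB2 dump/part-*.nt.gz

# On a machine with plenty of RAM and fast local SSD, the parallel
# loader is usually the fastest option for a bulk load:
tdb2.tdbloader --loader=parallel --loc /data/DB2 dump/part-*.nt.gz
```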


GitHub link: 
https://github.com/apache/jena/discussions/3701#discussioncomment-15505539
