GitHub user afs added a comment to the discussion: How to load big dataset to new database
Hi @maxx-ukoo

> * What is the correct and fastest way to load this dataset?
> * Why does the performance drop dramatically after a few million records?

That is a surprisingly sharp drop-off. What kind of storage are you using? (Local SSD? Local disk? A remote file store? ...) And what is the loader command you are using?

> * Should I unzip the files before loading them?

No need. The parsers decompress, and at scale this does not noticeably affect loading times, which are dominated by writing.

> * Should I load the files one by one, or load all of them in a single tdb2.tdbloader run?

All in a single run. The loaders all exploit the fact that the database is empty and operate at a low level based on that; otherwise, loading is unoptimized.

GitHub link: https://github.com/apache/jena/discussions/3701#discussioncomment-15505539
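To make the "single run, compressed files" advice concrete, a minimal command-line sketch follows. The database location and file paths are illustrative assumptions, not from the discussion; the key points are passing all the (still-compressed) files to one `tdb2.tdbloader` invocation against an empty database.

```shell
# Assumed layout: gzipped N-Triples files in ./data, target database in ./DB2.
# The target directory should be empty (or not yet exist) so the bulk loader
# can use its optimized, empty-database code path.
tdb2.tdbloader --loc ./DB2 ./data/*.nt.gz
```

Loading into a non-empty database, or splitting the files across several loader runs, forfeits the bulk-load optimizations, so all files should go into the one command above.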
