GitHub user maxx-ukoo added a comment to the discussion: How to load big dataset to new database
Hi @afs, I have tested two Azure VMs with 4 (8) CPU cores and 16 (32) GB RAM, using two types of SSD disks: one with a 500 IOPS limit and another with 5000 IOPS. There is no significant performance difference between the VMs or the disk types. The database is new, but not empty. My steps are:

1. Start Fuseki; it initializes and populates the data folder with initial data.
2. Stop the Fuseki server.
3. Load the OWL file with the command:

```
/$DATA_SSD/apache-jena/bin/apache-jena-5.6.0/bin/tdb2.tdbloader \
  --loc="$DATA_SSD/server/data" \
  --graph="http://rdf.ncbi.nlm.nih.gov/pubchem/ruleset" \
  "$DATA_SSD/source/chebi.owl"
```

4. Load the data using commands like:

```
/data/apache-jena/bin/apache-jena-5.6.0/bin/tdb2.tdbloader \
  --verbose \
  --loader=parallel \
  --loc="/data/server/data/" \
  --graph="http://rdf.ncbi.nlm.nih.gov/pubchem/compound" \
  "/data/source/compound/general/pc_compound2defined_atom_stereo_count_000006.ttl.gz" \
  ...
```

I tried processing up to 20 files in a batch. The command is run with `export JAVA_TOOL_OPTIONS="-Xmx28g -Xms4g"`, and I see a message reporting about 30G available for the loader.

So technically the database is not empty: it contains 8,806,955 triples in the http://rdf.ncbi.nlm.nih.gov/pubchem/ruleset graph after loading the OWL file. I need to upload a set of 400+ files into one graph and a set of 600+ files into a second graph. I cannot upload all files with a single command because of the "Argument list too long" error. Loading the second graph also needs to work against the existing database.

Could you suggest how to upload 400+ / 600+ .ttl.gz files into two separate graphs? Loading via Java is also fine for me if that is possible.

GitHub link: https://github.com/apache/jena/discussions/3701#discussioncomment-15505921

----

This is an automatically sent email for [email protected].
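One way to sidestep the "Argument list too long" limit described above is to stream the file names through `xargs`, which splits them across as many `tdb2.tdbloader` invocations as the kernel's argument-size limit allows. This is a minimal sketch, not a definitive recipe: the `load_graph` helper name is made up here, the paths in the commented example are taken from the question, and it assumes (per the Jena documentation) that repeated loader runs append into an existing TDB2 database, while `--loader=parallel` is meant for an initially empty database, so the default (phased) loader is used.

```shell
# Sketch: load many .ttl.gz files into one named graph in batches.
# load_graph is a hypothetical helper; xargs -0 splits the NUL-separated
# file list into several loader invocations, each below ARG_MAX.
load_graph() {
    loader=$1; loc=$2; graph=$3; src=$4
    find "$src" -name '*.ttl.gz' -print0 |
        xargs -0 -r "$loader" --loc="$loc" --graph="$graph"
}

# Example with the paths from the question (run once per graph):
# load_graph /data/apache-jena/bin/apache-jena-5.6.0/bin/tdb2.tdbloader \
#     /data/server/data \
#     "http://rdf.ncbi.nlm.nih.gov/pubchem/compound" \
#     /data/source/compound/general
```

Because `xargs` may invoke the loader several times, each batch is a separate load into the same database and graph, which is also why the parallel loader is avoided here.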
To unsubscribe, please send an email to: [email protected]
