GitHub user maxx-ukoo added a comment to the discussion: How to load big 
dataset to new database

Hi @afs 
I have tested two Azure VMs with 4 (8) CPU cores and 16 (32) GB RAM, using two 
types of SSD disks—one with a 500 IOPS limit and another with 5000 IOPS. There 
is no significant performance difference between these VMs or the disk types.
The database is new, but not empty. My steps are:

1. Start Fuseki; it initializes and populates the data folder with initial data.
2. Stop the Fuseki server.
3. Upload the OWL file with the command:

```
"$DATA_SSD/apache-jena/bin/apache-jena-5.6.0/bin/tdb2.tdbloader" \
  --loc="$DATA_SSD/server/data" \
  --graph="http://rdf.ncbi.nlm.nih.gov/pubchem/ruleset" \
  "$DATA_SSD/source/chebi.owl"
```
4. Upload the data using commands like:

```
/data/apache-jena/bin/apache-jena-5.6.0/bin/tdb2.tdbloader \
  --verbose \
  --loader=parallel \
  --loc="/data/server/data/" \
  --graph="http://rdf.ncbi.nlm.nih.gov/pubchem/compound" \
  "/data/source/compound/general/pc_compound2defined_atom_stereo_count_000006.ttl.gz" \
  ...
```
I tried processing up to 20 files in a batch.
The commands were run with `export JAVA_TOOL_OPTIONS="-Xmx28g -Xms4g"`, and the loader reports about 30 GB available.

So technically, the database is not empty: it contains 8,806,955 triples in the 
http://rdf.ncbi.nlm.nih.gov/pubchem/ruleset graph after loading the OWL file.
I need to upload a set of 400+ files into one graph and a set of 600+ files 
into a second graph. I cannot upload all files using a single command because 
of the “Argument list too long” error. Uploading the second graph also needs to 
work with the existing database.
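One workaround for the argument-list limit that I am considering is streaming the file paths through `xargs`, which caps how many files each loader invocation receives. This is only a sketch (the loader path, `--loc`, and graph URI are the ones from step 4; a dry-run stub replaces `tdb2.tdbloader` here so the script can run anywhere):

```shell
#!/usr/bin/env bash
# Batch .ttl.gz files so no single tdb2.tdbloader invocation exceeds
# ARG_MAX: find emits one NUL-terminated path per file, and xargs groups
# them BATCH at a time, running one loader process per group.
set -euo pipefail

BATCH=20
SRC="$(mktemp -d)"   # stand-in for /data/source/compound/general
for i in $(seq 1 45); do : > "$SRC/pc_$i.ttl.gz"; done

# Real invocation (commented out; flags taken from step 4):
# find /data/source/compound/general -name '*.ttl.gz' -print0 |
#   xargs -0 -n 20 /data/apache-jena/bin/apache-jena-5.6.0/bin/tdb2.tdbloader \
#     --loader=parallel --loc=/data/server/data \
#     --graph=http://rdf.ncbi.nlm.nih.gov/pubchem/compound

# Dry run: a stub prints how many files each invocation would receive.
counts="$(find "$SRC" -name '*.ttl.gz' -print0 |
  xargs -0 -n "$BATCH" sh -c 'echo "$#"' loader)"
echo "$counts"   # 45 files -> batches of 20, 20, 5
```

Is this a reasonable way to drive the loader, or does spawning many loader processes against the same `--loc` lose performance?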
Could you suggest how to upload 400+ / 600+ .ttl.gz files into two separate 
graphs? Uploading using Java is also fine for me if that is possible.
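For the two graphs, the shape I have in mind is a loop pairing each graph URI with its source directory, batching via `xargs` as above. Again only a sketch: the second graph's URI and directory are placeholders (only the compound pair appears above), and `tdb2.tdbloader` is replaced by a stub that logs its invocations so the script is runnable as-is:

```shell
#!/usr/bin/env bash
# Load each file set into its own graph: loop over graph->directory
# pairs, one loader run per batch of up to 20 files. The second pair is
# a placeholder; point SOURCES at the real directories and LOADER at
# the real tdb2.tdbloader for the actual load.
set -euo pipefail

work="$(mktemp -d)"
LOADER="$work/loader"       # stub standing in for tdb2.tdbloader
printf '#!/bin/sh\necho "$@" >> "%s/calls.log"\n' "$work" > "$LOADER"
chmod +x "$LOADER"

declare -A SOURCES=(
  ["http://rdf.ncbi.nlm.nih.gov/pubchem/compound"]="$work/compound"
  ["http://example.org/second-graph"]="$work/second"   # placeholder pair
)
for dir in "${SOURCES[@]}"; do
  mkdir -p "$dir"
  : > "$dir/a.ttl.gz"; : > "$dir/b.ttl.gz"
done

for graph in "${!SOURCES[@]}"; do
  find "${SOURCES[$graph]}" -name '*.ttl.gz' -print0 |
    xargs -0 -n 20 "$LOADER" --loader=parallel \
      --loc=/data/server/data --graph="$graph"
done

cat "$work/calls.log"   # one logged loader invocation per graph
```

For the real run the only changes would be the genuine loader path and source directories, with Fuseki stopped as in step 2.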

GitHub link: 
https://github.com/apache/jena/discussions/3701#discussioncomment-15505921

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]

