I had a similar issue: not running out of memory, but starting very fast and then slowing down significantly. I was using the same Ubuntu 10.10 (32-bit) and loading DBpedia data into a TDB store (using both the command-line loader and the Java API).
Loading the instance_types_en.nt file, which contains only rdf:type triples (~800 MB), was rather fast (I suppose there was enough memory for file caches), but labels_en.nt, which contains only rdfs:label triples (~900 MB), was extremely slow: it took about 8 hours to complete, averaging ~300 triples/sec.

The slow loading was caused by heavy disk activity. For ~1.7 GB of input files and a resulting TDB store of ~3 GB, there were about 80 GB of disk writes.

Is it normal or expected to have so much disk writing?

Kind regards,
Mikhail

---------- Forwarded message ----------
From: <[email protected]>
To: <[email protected]>, <[email protected]>
Date: Thu, 6 Jan 2011 21:22:33 +0000
Subject: tdbloader2 OutOfMemoryException with large files

I've been taking the new tdbloader2 out for a spin with some fairly large datasets. In total, I have about 3 billion triples I am trying to load. I have 87 Turtle files that average around 1-2 GB each. I am running the job under Ubuntu 10.10 on a quad-core system with 6 GB of RAM.

The load process runs very fast up until about 26M triples, then performance drops sharply from about 100k down to about 400, and it eventually runs out of memory. I am using TDB 0.8.9. I tried to tweak the memory settings, but that only prolongs the problem. I am assuming that the 1-2 GB files are a likely culprit, but I wanted to be sure.

Also, does tdbloader2 have a preference for N-Triples over Turtle?

Ryan-

Ryan J. McDonough
Architect
Service Platforms
NOKIA INC.
