I've been taking the new tdbloader2 out for a spin with some fairly large datasets. In total, I am trying to load about 3 billion triples from 87 Turtle files that average around 1-2 GB each. I am running the job under Ubuntu 10.10 on a quad-core system with 6 GB of RAM. The load runs very fast up until about 26M triples, then the load rate drops sharply from about 100k to about 400, and the process eventually runs out of memory.
I am using TDB 0.8.9. I tried tweaking the memory settings, but that only postpones the failure. I suspect the 1-2 GB files are the culprit, but I wanted to be sure. Also, does tdbloader2 prefer N-Triples over Turtle?

Ryan

Ryan J. McDonough
Architect Service Platforms
NOKIA INC.
