Re: tdbloader2 OutOfMemoryException with large files

Mikhail Sogrin Fri, 28 Jan 2011 05:04:03 -0800

I had a similar issue: not running out of memory, but starting very fast
and then slowing down significantly. I was using the same Ubuntu 10.10
(32-bit) and loading DBpedia data into a TDB store (using the command-line
loader and the Java API).


Loading the instance_types_en.nt file, which contains just rdf:type triples
(~800 MB), was rather fast (I suppose there was enough memory for file
caches), but labels_en.nt, with only rdfs:label triples (~900 MB), was
extremely slow: it took about 8 hours to complete, averaging ~300
triples/sec. The slow loading was caused by heavy disk activity. For ~1.7 GB
of input files and a resulting TDB store of ~3 GB, there were about 80 GB of
disk writes. Is it normal or expected to have so much disk writing?

Kind regards,
Mikhail

---------- Forwarded message ----------
From: <[email protected]>
To: <[email protected]>, <[email protected]>
Date: Thu, 6 Jan 2011 21:22:33 +0000
Subject: tdbloader2 OutOfMemoryException with large files
I've been taking the new tdbloader2 out for a spin with some fairly large
datasets. In total, I have about 3 billion triples I am trying to load. I
have 87 Turtle files that average around 1-2 GB each. I am running the job
under Ubuntu 10.10 on a quad-core system with 6 GB of RAM. The load process
runs very fast up until about 26M triples, then performance drops sharply
from about 100k down to about 400, and then it eventually runs out of memory.
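
To put that drop in perspective, here is a rough back-of-the-envelope
sketch. It assumes the two rates are in triples per second and would stay
constant for the entire load, neither of which is stated above; the names
are hypothetical and the inputs are the approximate figures from the
message.

```python
# Approximate figures from the message; the unit (triples/sec) is an assumption.
TOTAL_TRIPLES = 3_000_000_000
FAST_RATE = 100_000  # rate before the slowdown
SLOW_RATE = 400      # rate after the slowdown


def load_seconds(total_triples, triples_per_sec):
    """Return the time in seconds to load at a constant rate."""
    return total_triples / triples_per_sec


hours_at_fast_rate = load_seconds(TOTAL_TRIPLES, FAST_RATE) / 3600
days_at_slow_rate = load_seconds(TOTAL_TRIPLES, SLOW_RATE) / 86_400

print(f"at {FAST_RATE:,}/s: about {hours_at_fast_rate:.1f} hours")
print(f"at {SLOW_RATE:,}/s: about {days_at_slow_rate:.1f} days")
```

Under those assumptions the full load would take hours at the initial rate
but months at the degraded one, even if memory did not run out.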

I am using TDB 0.8.9. I tried to tweak the memory settings, but that only
delays the problem. I am assuming that the 1-2 GB files are a likely
culprit, but I wanted to be sure. Also, does tdbloader2 handle N-Triples
better than Turtle?

Ryan-

Ryan J. McDonough
Architect
Service Platforms
NOKIA INC.
