On 28/01/11 13:03, Mikhail Sogrin wrote:
I had a similar issue: not running out of memory, but starting very fast
and then slowing significantly. I was using the same Ubuntu 10.10 32-bit and
loading DBpedia data into a TDB store (using the command-line loader and the
Java API).

Loading instance_types_en.nt file containing just rdf:type triples (~800 MB)
was rather fast (I suppose it had enough memory for file caches), but
labels_en.nt with only rdfs:labels (~900 MB) was extremely slow - it took
about 8 hours to complete with average ~300 triples/sec. That slow loading
was caused by a lot of disk activity. For input files ~1.7 GB and resulting
TDB store ~3 GB there were about 80 GB of disk writes. Is it normal or
expected to have so much disk writing?

Kind regards,
Mikhail

Mikhail,

TDB loading, with tdbloader and tdbloader2, is rather better on 64 bit than 32 bit. On a 32 bit JVM, TDB has to do its own disk caching, and it can only access about 1.5 GB of RAM (Java limitation). That caching isn't going to be as sophisticated as what the OS can manage; an advantage of 64 bit is that cache work is devolved to the OS. That said, 300 TPS is unexpectedly slow.

Additionally, if it's a laptop, laptop disks are noticeably slower than those in a desktop machine.

If you can load on a 64 bit machine somewhere, you can just copy the database onto the 32 bit machine. The file format is portable.

I speak in triple counts: labels_en.nt is about 8M triples IIRC. I don't know if the unusual data pattern of every triple using the same property has an effect.

TDB databases are relatively uncompressed; tdbloader2 creates smaller ones than the general-purpose loader.

I tried labels_en.nt and it didn't really go very fast to start with, so maybe there is something in the shape of the data. I'll try to find time to profile it (no promises when) - maybe there is a hotspot I'm not aware of.

Loading it here took 291 seconds, which is about 27K TPS.

(Ubuntu 10.10, 64 bit, desktop)

java version "1.6.0_20"
OpenJDK Runtime Environment (IcedTea6 1.9.4) (6b20-1.9.4-0ubuntu1)
OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
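
As a rough cross-check of the figures quoted in this thread, the arithmetic can be sketched in Python. This is only a back-of-the-envelope calculation: the triple count is an estimate derived from the 291 seconds and 27K TPS reported above, not a measured value, and the function names are made up for illustration. Note also that the 80 GB and 3 GB figures cover both input files together, not labels_en.nt alone.

```python
# Figures quoted in this thread; nothing here is measured by this script.
FAST_SECONDS = 291      # load time reported on the 64-bit desktop
FAST_TPS = 27_000       # load rate reported on the 64-bit desktop
SLOW_TPS = 300          # average load rate reported on the 32-bit machine
DISK_WRITES_GB = 80     # total disk writes reported for both input files
STORE_GB = 3            # size of the resulting TDB store


def estimated_triples(seconds, tps):
    """Estimate the number of triples from a load time and a load rate."""
    return seconds * tps


def hours_at_rate(triples, tps):
    """Hours needed to load `triples` at a steady rate of `tps`."""
    return triples / tps / 3600


def write_ratio(written_gb, store_gb):
    """Data written to disk per unit of data kept in the final store."""
    return written_gb / store_gb


triples = estimated_triples(FAST_SECONDS, FAST_TPS)
print(f"estimated triples: {triples:,}")
print(f"hours at {SLOW_TPS} TPS: {hours_at_rate(triples, SLOW_TPS):.1f}")
print(f"slowdown factor: {FAST_TPS / SLOW_TPS:.0f}x")
print(f"written per GB stored: {write_ratio(DISK_WRITES_GB, STORE_GB):.1f} GB")
```

Under these assumptions the estimate is a little under 8M triples, which at 300 TPS comes to roughly seven hours, consistent with the "about 8 hours" reported. The 64-bit load is about 90 times faster, and the 32-bit run wrote somewhere over 25 GB for every GB that ended up in the store.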

        Andy
