On 12-02-29 08:09 AM, Andy Seaborne wrote:
On 29/02/12 11:34, Sarven Capadisli wrote:
On 12-02-29 06:26 AM, Sarven Capadisli wrote:
On 12-02-29 05:09 AM, Damian Steer wrote:
At a guess, other stuff happening on the same host? A batch might
include a sync to disk too. I wouldn't have thought GC would be an
issue.

Not to my knowledge. I get the feeling that the disk falls asleep.
Hence, I'm investigating with what I have right now.

On that note, what I find absurd is that, if I want to get tdbloader
back into action (working faster), I do some large disk writes in
another screen window. This was an accidental find, and I don't have
a technical explanation for it. Somehow that causes the Batch numbers to
go up to 20000+, where they may have been stuck below 1000.

-Sarven

Interesting but I'm not completely shocked.

The batch speed (yes, triples per second for the last time interval)
tends to shoot up at the start (JIT presumably), hit some peak, then
very slowly decline. With exceptions. Sometimes it declines for a bit,
then starts going faster even on a machine that is doing nothing else,
which is a bit odd.

I think the occasional one-off drop in batch is a major, non-incremental
GC happening.

The "doing work elsewhere makes it go faster" effect might be because
the OS is knocked into a more efficient policy for the disk cache, but
I'm guessing here.

Add: 4,150,000 triples (Batch: 2,380 / Avg: 4,684)
Add: 4,200,000 triples (Batch: 29,620 / Avg: 4,732)

That's pretty slow.
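
For anyone wanting to watch how the Batch figure moves over a long load, here is a minimal sketch in Python that pulls the numbers out of progress lines shaped like the two quoted above. The regular expression is inferred only from those two lines, and the function names and the threshold value are made up for illustration; none of this is part of tdbloader.

```python
import re

# Pattern inferred from the two progress lines quoted in this thread, e.g.
# "Add: 4,150,000 triples (Batch: 2,380 / Avg: 4,684)".
# Assumption: other progress lines follow the same shape.
PROGRESS = re.compile(
    r"Add:\s*([\d,]+)\s+triples\s+\(Batch:\s*([\d,]+)\s*/\s*Avg:\s*([\d,]+)\)"
)


def parse_progress(line):
    """Return (total_triples, batch_rate, avg_rate), or None if the line does not match."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    # Strip the thousands separators before converting to int.
    return tuple(int(group.replace(",", "")) for group in match.groups())


def slow_batches(lines, threshold=1000):
    """Return (total_triples, batch_rate) for every batch slower than threshold.

    The threshold is a hypothetical cut-off in triples per second.
    """
    slow = []
    for line in lines:
        parsed = parse_progress(line)
        if parsed is not None and parsed[1] < threshold:
            slow.append((parsed[0], parsed[1]))
    return slow


if __name__ == "__main__":
    log = [
        "Add: 4,150,000 triples (Batch: 2,380 / Avg: 4,684)",
        "Add: 4,200,000 triples (Batch: 29,620 / Avg: 4,732)",
    ]
    for entry in slow_batches(log, threshold=5000):
        print(entry)
```

Saving the loader output to a file and passing its lines to `slow_batches` would show where the slow stretches fall, which can then be compared against whatever else the machine was doing at the time.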

Usual questions:
How much data overall?

I don't have a triple count right now, but it is about 35 GB across 5100 or so RDF/XML files of different sizes.

Many long literals? Other unusual data features?

Not that many literals even. And they are usually short.

What's the machine?

Ubuntu x86_64 GNU/Linux
Memory: 16 GB
Disk swap: 16 GB
Filesystem: ext4

An incremental version is quite possible. It could load to a dataset,
ensuring the ids are right, then do index-merging.

I was loading them incrementally, then I merged a good chunk of the files into N-Triples and tried importing those instead. It seems to go slightly better, but it is hard to tell for sure.


Thanks Andy!

-Sarven
