Re: another naive question about tdbloader2

Andy Seaborne Fri, 28 Oct 2016 07:54:33 -0700

Each index sort could happen in parallel, so that's above and beyond --parallel to sort(1), which makes each individual sort parallel.


That can also saturate the machine!

As Rob says, the memory and disk contention can kill performance - two streams of writes in parallel can be slower than one stream followed by the other.

This is a strong effect when it is to the same rotating disk but seems to be true as well for SSD to a lesser extent. I guess the bottleneck is contention for the system bus access and it is optimized for the common "no contention" case.

A high end server may have multiple independent disk paths to eliminate this.
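
The contention effect described above can be checked on a given machine by timing the same write streams run back to back and then concurrently. The following is only a rough sketch: the helper names `write_stream` and `time_writes`, the file names, and the sizes are all made up for illustration, and it writes plain zero-filled files rather than anything tdbloader2 produces. Results will vary with hardware, filesystem and caching.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def write_stream(path, total_bytes, block_size=1 << 20):
    """Write total_bytes of zeros to path and fsync so the data reaches the device."""
    block = b"\0" * block_size
    remaining = total_bytes
    with open(path, "wb") as out:
        while remaining > 0:
            n = min(block_size, remaining)
            out.write(block[:n])
            remaining -= n
        out.flush()
        os.fsync(out.fileno())


def time_writes(directory, n_streams, bytes_per_stream, parallel):
    """Return (elapsed_seconds, file_sizes) for n_streams writes into directory."""
    paths = [os.path.join(directory, f"stream-{i}.bin") for i in range(n_streams)]
    start = time.perf_counter()
    if parallel:
        # One thread per stream; file writes release the GIL, so they overlap.
        with ThreadPoolExecutor(max_workers=n_streams) as pool:
            list(pool.map(lambda p: write_stream(p, bytes_per_stream), paths))
    else:
        # One stream followed by the other.
        for p in paths:
            write_stream(p, bytes_per_stream)
    elapsed = time.perf_counter() - start
    sizes = [os.path.getsize(p) for p in paths]
    for p in paths:
        os.remove(p)
    return elapsed, sizes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for mode in (False, True):
            elapsed, _ = time_writes(d, n_streams=2, bytes_per_stream=256 << 20, parallel=mode)
            print("parallel" if mode else "sequential", f"{elapsed:.2f}s")
```

To say anything about a particular disk, point the temporary directory at that disk and use streams large enough to defeat the page cache.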

It needs quite a lot of data before tdbloader2 overtakes the plain tdbloader. That point is sensitive to the hardware.


    Andy

On 28/10/16 15:24, Rob Vesse wrote:

If memory serves, those are the phases that use POSIX sort, right?

Sort will try to do an in-memory sort as far as possible and fall back to a 
disk-based merge sort if not. Also, we usually configure sort to run in parallel.
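
As a rough illustration of that "in memory if it fits, otherwise spill and merge" behaviour, here is a minimal Python sketch. It is not how sort(1) is implemented and it is not part of tdbloader2; the function name `external_sort` and the `max_lines_in_memory` parameter are hypothetical stand-ins for sort's memory buffer. It assumes every input line ends with a newline, and it compares lines by Python string order rather than by locale.

```python
import heapq
import itertools
import os
import tempfile


def external_sort(lines, max_lines_in_memory=100_000):
    """Yield newline-terminated lines in sorted order, spilling runs to disk if needed."""
    lines = iter(lines)
    run_paths = []
    try:
        while True:
            chunk = list(itertools.islice(lines, max_lines_in_memory))
            if not chunk:
                break
            chunk.sort()
            if not run_paths and len(chunk) < max_lines_in_memory:
                # The whole input fit in the buffer: pure in-memory sort.
                yield from chunk
                return
            # Buffer is full (or earlier runs exist): write a sorted run to disk.
            fd, path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "w") as run:
                run.writelines(chunk)
            run_paths.append(path)
        # Merge phase: stream all sorted runs back in one k-way merge.
        files = [open(p) for p in run_paths]
        try:
            yield from heapq.merge(*files)
        finally:
            for f in files:
                f.close()
    finally:
        # Temporary runs take disk space until the sort finishes.
        for p in run_paths:
            os.remove(p)
```

The temporary runs are the reason disk usage grows during a large sort: until the merge completes, the spilled runs hold roughly a full copy of the input on disk.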

If you try to process different indexes in parallel you would create a lot of 
memory and disk contention, which would likely slow down overall performance.

For sufficiently large data sets there is also a risk of exhausting disk space 
during the sort phase, and building multiple indexes in parallel would only 
exacerbate this.

Rob

On 28/10/2016 14:33, "A. Soroka" <[email protected]> wrote:

    I'm still learning about tdbloader2 and have another question about the 
index phase: is there any reason why the processes for the various index 
orderings (SPO, GSPO, etc.) couldn't go on in parallel? Or am I missing some 
switch or setting that already allows that?

    ---
    A. Soroka
    The University of Virginia Library
