Each index sort could happen in parallel, so that is above and beyond the --parallel option to sort(1), which makes each individual sort parallel.

That can also saturate the machine!

As Rob says, the memory and disk contention can kill performance - two streams of writes in parallel can be slower than one stream followed by the other.

This is a strong effect when both streams go to the same rotating disk, but it seems to be true for SSDs as well, to a lesser extent. I guess the bottleneck is contention for system bus access, which is optimized for the common "no contention" case.

A high-end server may have multiple independent disk paths to eliminate this.
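
A rough way to check the "two streams versus one after the other" effect on a given machine is to time it directly. The following is only a minimal sketch in Python, not anything tdbloader2 does: the function names, the 1 MiB chunk size and the 64 MiB per file are all assumptions chosen for illustration, and the result depends heavily on the disk, the filesystem and OS caching.

```python
import os
import tempfile
import threading
import time

CHUNK = b"x" * (1 << 20)  # 1 MiB per write call (arbitrary choice)


def write_stream(path, n_chunks):
    """Write n_chunks MiB to path and force the data out to the device."""
    with open(path, "wb") as f:
        for _ in range(n_chunks):
            f.write(CHUNK)
        f.flush()
        os.fsync(f.fileno())


def timed_sequential(paths, n_chunks):
    """Write each file one after the other; return elapsed seconds."""
    start = time.perf_counter()
    for p in paths:
        write_stream(p, n_chunks)
    return time.perf_counter() - start


def timed_parallel(paths, n_chunks):
    """Write all files at the same time, one thread each; return elapsed seconds."""
    threads = [
        threading.Thread(target=write_stream, args=(p, n_chunks)) for p in paths
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def compare(directory, n_chunks=64):
    """Return (sequential_seconds, parallel_seconds) for two write streams."""
    seq_paths = [os.path.join(directory, f"seq{i}.bin") for i in range(2)]
    par_paths = [os.path.join(directory, f"par{i}.bin") for i in range(2)]
    return timed_sequential(seq_paths, n_chunks), timed_parallel(par_paths, n_chunks)


if __name__ == "__main__":
    # The temporary directory should live on the disk you want to measure.
    with tempfile.TemporaryDirectory() as d:
        seq, par = compare(d)
        print(f"sequential: {seq:.2f}s  parallel: {par:.2f}s")
```

The files need to be much larger than the OS write cache before the numbers mean much, so treat small runs as a smoke test only.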

It needs quite a lot of data before tdbloader2 overtakes the plain tdbloader. That point is sensitive to the hardware.

    Andy

On 28/10/16 15:24, Rob Vesse wrote:
If memory serves, those are the phases that use POSIX sort, right?

Sort will try to do an in-memory sort as far as possible and fall back to a
disk-based merge sort if not. Also, we usually configure sort to run in parallel.
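
To illustrate the in-memory sort with a disk-based merge fallback, here is a minimal sketch in Python. It is not how sort(1) is implemented; `external_sort`, `max_in_memory` and `tmp_dir` are hypothetical names used only for this example, and a line count stands in for a real memory limit.

```python
import heapq
import os
import tempfile


def external_sort(lines, max_in_memory=100_000, tmp_dir=None):
    """Yield lines in sorted order, spilling sorted runs to disk when needed.

    Every line is expected to end with a newline. max_in_memory is a line
    count standing in for a memory limit (an assumption for this sketch).
    """
    run_paths = []
    buffer = []

    def spill():
        # Sort the current buffer and write it out as one sorted run.
        buffer.sort()
        fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(buffer)
        run_paths.append(path)
        buffer.clear()

    for line in lines:
        buffer.append(line)
        if len(buffer) >= max_in_memory:
            spill()

    if not run_paths:
        # Everything fitted in memory: plain in-memory sort, no disk used.
        buffer.sort()
        yield from buffer
        return

    if buffer:
        spill()

    # Disk-based phase: k-way merge of the sorted runs.
    files = [open(p) for p in run_paths]
    try:
        yield from heapq.merge(*files)
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

Note that in the fallback case the runs take temporary disk space roughly equal to the size of the input, on top of the input and the final output.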

If you try to build the different indexes in parallel, you would create a lot of
memory and disk contention, which would likely slow down overall performance.

For sufficiently large data sets there is also a risk of exhausting disk space
during the sort phase, and building multiple indexes in parallel would only
exacerbate this.

Rob

On 28/10/2016 14:33, "A. Soroka" <[email protected]> wrote:

    I'm still learning about tdbloader2 and have another question about the
    index phase: is there any reason why the processes for the various index
    orderings (SPO, GSPO, etc.) couldn't go on in parallel? Or am I missing some
    switch or setting that already allows that?

    ---
    A. Soroka
    The University of Virginia Library





