Each index sort could happen in parallel, so that's above and beyond the
--parallel option to sort(1), which makes each individual sort parallel.
That can also saturate the machine!
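To make the two levels of parallelism concrete, here is a minimal sketch,
not how tdbloader2 actually invokes sort. It assumes GNU sort (for the
--parallel, -T and -o options); the function names, file paths and default
values are hypothetical. `threads_per_sort` is the parallelism inside one
sort, `concurrent_sorts` is how many index sorts run at once.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def sort_command(src, dst, threads, tmp_dir):
    """Build a GNU sort(1) command line.

    --parallel sets the number of threads used inside this one sort;
    -T sets the directory for temporary merge files; -o names the output.
    """
    return ["sort", f"--parallel={threads}", "-T", tmp_dir, "-o", dst, src]


def run_sorts(jobs, concurrent_sorts=1, threads_per_sort=2, tmp_dir="/tmp"):
    """Run one sort per (src, dst) pair, at most `concurrent_sorts` at a time.

    concurrent_sorts=1 runs the sorts one after the other; larger values
    run several sorts side by side, each with its own thread count.
    """
    def run(job):
        src, dst = job
        cmd = sort_command(src, dst, threads_per_sort, tmp_dir)
        subprocess.run(cmd, check=True)  # raises if sort exits non-zero
        return dst

    with ThreadPoolExecutor(max_workers=concurrent_sorts) as pool:
        return list(pool.map(run, jobs))
```

The total number of sort threads is roughly concurrent_sorts times
threads_per_sort, and each concurrent sort also has its own memory buffer
and temporary files, which is how this can saturate a machine.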
As Rob says, the memory and disk contention can kill performance - two
streams of writes in parallel can be slower than one stream followed by
the other.
This is a strong effect when both streams go to the same rotating disk,
but it seems to be true for SSDs as well, to a lesser extent. I guess the
bottleneck is contention for system bus access, which is optimized for
the common "no contention" case.
A high end server may have multiple independent disk paths to eliminate
this.
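Since the effect depends on the hardware, it may be worth measuring. The
following is a rough sketch for timing two write streams run one after the
other versus at the same time. The function names and sizes are
hypothetical, and it writes zeros, which some filesystems or SSDs may
compress, so treat the numbers as indicative only.

```python
import os
import threading
import time


def write_stream(path, total_bytes, chunk_bytes=1 << 20):
    """Write total_bytes of zeros to path and force them out to the device."""
    chunk = b"\0" * chunk_bytes
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # without this we mostly time the page cache


def time_writes(directory, total_bytes, concurrent):
    """Return seconds taken to write two streams into `directory`.

    concurrent=False writes one file and then the other;
    concurrent=True writes both at once from two threads.
    """
    paths = [os.path.join(directory, f"stream{i}.bin") for i in range(2)]
    start = time.perf_counter()
    if concurrent:
        threads = [
            threading.Thread(target=write_stream, args=(p, total_bytes))
            for p in paths
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    else:
        for p in paths:
            write_stream(p, total_bytes)
    elapsed = time.perf_counter() - start
    for p in paths:
        os.remove(p)  # clean up the test files
    return elapsed
```

To get a meaningful comparison, point `directory` at the disk in question
and use a `total_bytes` well above the amount of RAM available for caching.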
It takes quite a lot of data before tdbloader2 overtakes the plain
tdbloader. That crossover point is sensitive to the hardware.
Andy
On 28/10/16 15:24, Rob Vesse wrote:
If memory serves, those are the phases that use POSIX sort, right?
Sort will try to do an in-memory sort as far as possible and fall back to a
disk-based merge sort if not. Also, we usually configure sort to run in parallel.
If you try to build different indexes in parallel, you would create a lot of
memory and disk contention, which would likely slow down overall performance.
For sufficiently large data sets there is also a risk of exhausting disk space
during the sort phase, and building multiple indexes in parallel would only
exacerbate this.
Rob
On 28/10/2016 14:33, "A. Soroka" <[email protected]> wrote:
I'm still learning about tdbloader2 and have another question about the
index phase: is there any reason why the processes for the various index
orderings (SPO, GSPO, etc.) couldn't go on in parallel? Or am I missing some
switch or setting that already allows that?
---
A. Soroka
The University of Virginia Library