Hi Lorenz,
On 18/12/2021 08:09, LB wrote:
> Good morning,
>
> loading of Wikidata truthy is done; this time I didn't forget to keep
> the logs:
>
> https://gist.github.com/LorenzBuehmann/e3619d53cf4c158c4e4902fd7d6ed7c3
>
> I'm a bit surprised that this time it was 8h faster than last time:
> 31h vs 39h.
Great!
> Not sure if a) there was something else on the server last time
> (at least I couldn't see any running tasks) or b) this is a
> consequence of the more parallelized Unix sort now - I set it to
> --parallel=16.
>
> I mean, the piped input stream is single-threaded, I guess, but maybe
> the sort merge step can benefit from more threads?
Yes - the sorting itself can be more parallel on a machine the size of
yours.
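Roughly this shape of pipeline (illustrative only: "generate-triples"
stands in for the actual producer stage, and the flag values are
examples, not what the loader necessarily passes):

  # Sketch: the pipe into sort is consumed single-threaded, but GNU
  # sort can still parallelize the in-memory sorting and the merges.
  generate-triples | \
    sort --parallel=16 --buffer-size=4G -T /data/tmp > spo.sorted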
Time to add a configuration file, rather than a slew of command line
arguments. The file also then acts as a record of the setup.
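Something like this sketch (key names invented purely for
illustration; not an existing format):

  # xloader.cfg -- hypothetical; all names are made up
  data=/data/wikidata-truthy.nt.gz
  tmpdir=/data/tmp
  sort.parallel=16
  sort.batch-size=128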
I'm finding a new characteristic:
When loading on a smaller machine (32G RAM), I think the index sorting
is recombining temp files. That results in more I/O and higher peak
disk usage. While POS is always slower than SPO, it appears to be very
much slower here.
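One way to watch for the recombining (assuming sort's temp directory
is known; /data/tmp here is a made-up path):

  # Temp-file count dropping while total size stays high suggests
  # sort is doing intermediate merge passes (extra I/O).
  watch -n 10 'ls /data/tmp | wc -l; du -sh /data/tmp'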
The internet has not been very clear on the effect of "batch size",
but the GNU man page talks about "--batch-size=16". I get more than 16
temp files - you probably don't at this scale.

--batch-size=128 seems better -- unlikely to be a problem with the
number of file descriptors nowadays. 16 is probably just how it always
was.
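In sort terms, something like (file names made up):

  # Merge up to 128 temp files in one pass; with, say, 100 temp files
  # that avoids the intermediate recombining (and its extra I/O).
  sort --parallel=16 --batch-size=128 -T /data/tmp spo.raw > spo.sorted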
On my machine, per process:

  ulimit -Sn is 1024      -- current soft limit
  ulimit -Hn is 1048576   -- hard limit (max without being root)
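The soft limit can be raised up to the hard limit without root:

  # Raise the open-file soft limit for the current shell (and children)
  ulimit -Sn 1048576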
I'll investigate when the load finishes. I'm trying not to touch the
machine to avoid breaking something. It is currently doing OSP.
> I guess I have to clean up everything and run it again with the
> original setup with 2 Unix sort threads ...
Andy