Hello,

Sort's memory usage (specifically, sort_buffer_size()) has been discussed a few 
times before, but I couldn't find any mention of the following issue:

When given a regular input file, sort tries to guesstimate the optimal buffer 
size based on the file size.
But this value is calculated for a single thread (a holdover from before sort 
became multi-threaded).
The default "--parallel" value is 8 (or fewer, if fewer cores are available), 
and each additional thread requires more memory.

The result is that on a reasonably powerful machine (e.g. 128GB RAM, 32 cores - 
not uncommon in a computer cluster),
sorting a big file (e.g. 10GB) will always allocate too little memory, and will 
always resort to saving temporary files under "/tmp".
The resulting disk activity makes sorting slower than an all-memory sort 
would be.
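As a workaround under the current behavior, the buffer can be sized by hand 
with "-S" to leave room for all threads (the file name and sizes below are 
just illustrative):

```shell
# Request a buffer large enough for the whole file times the thread count,
# so sort need not spill runs to /tmp.
# -S sets the main memory buffer; --parallel sets the thread count.
sort --parallel=8 -S 16G -n big_input.txt > sorted_output.txt
```

This avoids the temporary files, but it requires the user to know both the 
file size and the thread count in advance, which is what the automatic 
estimate was meant to spare us.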

Based on this thread: 
http://lists.gnu.org/archive/html/coreutils/2010-12/msg00084.html ,
perhaps it would be beneficial to take the number of threads into account 
when calculating the memory allocation?

Regards,
 -gordon
