On 05/26/2014 07:45 PM, Peng Yu wrote:
>> Sort takes a divide and conquer approach,
>> by sorting parts of the input to temporary files,
>> and then merging the results with a bounded amount of memory.
>>
>> sort currently defaults to using a large memory buffer
>> to minimize overhead associated with writing and reading
>> temp files, so you may be seeing just this large memory
>> allocation each time.
>>
>> The memory allocation can be controlled with --buffer-size
>
> If I have enough memory, is it always faster to sort without using
> temp files. How to force sort always use memory only? Thanks.
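
For anyone following along, the approach described in the quoted text can be imitated by hand with split(1) and sort -m. This is only a rough sketch of the idea, not what sort(1) does internally; chunked_sort and its default chunk size are names and values made up for illustration.

```shell
# Hand-rolled "sort parts to temp files, then merge".
# chunked_sort is a hypothetical helper, not part of coreutils.
chunked_sort() (
    input=$1
    lines_per_chunk=${2:-100000}   # arbitrary default for this sketch
    workdir=$(mktemp -d) || exit 1

    # Divide: cut the input into pieces of at most lines_per_chunk lines.
    split -l "$lines_per_chunk" "$input" "$workdir/chunk."

    # Conquer: sort each piece on its own, in place.
    for chunk in "$workdir"/chunk.*; do
        [ -e "$chunk" ] || continue   # empty input produces no chunks
        sort -o "$chunk" "$chunk"
    done

    # Merge: -m combines already sorted files without re-sorting them.
    set -- "$workdir"/chunk.*
    status=0
    if [ -e "$1" ]; then
        sort -m "$@" || status=$?
    fi

    rm -rf "$workdir"
    exit "$status"
)
```

Something like `chunked_sort file.in 1000000 > file.out` should give the same result as a plain `sort file.in`, only with the chunk size playing the role that --buffer-size plays for sort itself.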

Traditionally there were mainly two levels in the memory hierarchy,
and so it was best to use as much RAM as possible. However, given the
relative increase in performance and size of processor caches compared
to RAM, it can often, depending on the operation, be much faster to
work with sizes that fit within a cache (line).

However, as the following demonstrates, sort(1) currently seems to
access RAM in a cache efficient manner, since the smaller working set
sizes that would fit entirely within the L3 cache on the test machine
do not outperform those using larger RAM buffers.

Let's do a quick test.

  $ shuf -i1-5000000 > file.in  # generate test data
  $ unset MALLOC_PERTURB_  # This has a large overhead for large buffers

First with a single thread as a baseline.
Note we put tmp files in an existing RAM disk to avoid disk latencies.

  $ time TMPDIR=/dev/shm sort --parallel=1 <file.in >/dev/null # uses about 200MB
  real    0m23.357s
  user    0m22.670s
  sys     0m0.586s

So let's run again with a size smaller than my 3MB L3 cache.

  $ time TMPDIR=/dev/shm sort --parallel=1 -S2M <file.in >/dev/null # uses about 2MB
  real    0m24.033s
  user    0m23.808s
  sys     0m0.128s

So much the same, with the small overhead probably due to the I/O
to the temp files.

For kicks let's run again, allowing it to use as much RAM as it needs,
but also as many threads as appropriate for the current system.
Note that at the end of the process the RAM usage spikes to 500MB,
but we can see a significant performance increase due to the extra cores.

  $ time TMPDIR=/dev/shm sort <file.in >/dev/null # uses about 500MB
  real    0m11.671s
  user    0m35.567s
  sys     0m2.793s

Now, using 500MB can have a significant impact on the system.
sort auto sizes the memory buffer based on the current amount of free RAM,
though this is not ideal given how long sort can run.
Note that if you limit the RAM used with -S, then you also effectively
limit the number of threads that will be used.

cheers,
Pádraig.
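
P.S. On the "memory only" part of the question, one rough way to at least notice when sort spills to disk is to give it a generous -S and point TMPDIR at a directory that does not exist. This relies on the assumption (worth verifying on your version) that sort only creates temp files once the data no longer fits in the buffer, so it should fail with an error instead of silently writing temp files. sort_in_ram and the bogus path are made up for illustration.

```shell
# Sketch: sort with a large buffer and no usable temp directory.
# sort_in_ram is a hypothetical helper, not part of coreutils.
sort_in_ram() {
    buffer_size=$1
    shift
    # Assumption: temp files are created lazily, so the bogus TMPDIR
    # only triggers an error when the input does not fit in the buffer.
    TMPDIR=/nonexistent/sort-tmp sort -S "$buffer_size" "$@"
}
```

For example, `sort_in_ram 1G file.in > file.out` should either complete without temp files or complain that it cannot create one, in which case a larger -S value is needed.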
