On Wed, 2010-02-17 at 14:57 -0800, Chen Guo wrote:
> > > As for buffer size, I highly doubt using 8 mb, even if we're magically
> > > guaranteed to get 100% of the cpu cache, would work better than a larger
> > > buffer.
> > >
> > > The main reason would be for larger files, you'd have to repeatedly write
> > > temporary files out to disk, then merge those temporary files. Whatever
> > > time you save talking to cache is more than lost to the extra time talking
> > > to disk.
> >
> > What if the temporary files were stored in RAM (i.e. tmpfs) rather than
> > on magnetic disk?
>
> I think I'm misunderstanding what you're trying to say... But the file stored
> in ram would be in a buffer. --buffer-size sets the size of this buffer, i.e.
> how much space in RAM you want to allocate to sort.

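To put rough numbers on the temporary-file argument quoted above, here is a back-of-envelope shell sketch. It assumes a simplified external sort: fill the buffer, write each sorted buffer as one temporary file ("run"), then merge at most 16 files at a time (16 is GNU sort's default --batch-size). The function name merge_model and the input and buffer sizes are hypothetical, and the model ignores sort's per-line overhead inside the buffer and any parallelism.

```shell
#!/bin/sh
# Simplified, hypothetical external-sort model (not GNU sort's exact behaviour):
# each full buffer becomes one temporary file ("run"); each merge pass combines
# up to FANIN files into one until a single output remains.

# Usage: merge_model INPUT_MIB BUFFER_MIB FANIN   -> prints "RUNS PASSES"
merge_model() {
  input=$1 buffer=$2 fanin=$3
  runs=$(( (input + buffer - 1) / buffer ))   # ceiling division
  passes=0
  n=$runs
  while [ "$n" -gt 1 ]; do
    n=$(( (n + fanin - 1) / fanin ))          # one pass: groups of FANIN -> 1
    passes=$((passes + 1))
  done
  echo "$runs $passes"
}

# 1 GiB input, fan-in 16, several buffer sizes (MiB).
for buf in 8 64 512 2048; do
  echo "buffer ${buf} MiB: runs passes = $(merge_model 1024 "$buf" 16)"
done
```

In this model a 1 GiB input with an 8 MiB buffer produces 128 temporary files and needs two merge passes, a 64 MiB buffer produces 16 files merged in one pass, and a 2048 MiB buffer needs no temporary files at all. Every merge pass reads the whole data set back, which is the extra traffic the quoted message attributes to disk; with the temporary directory on tmpfs that traffic becomes memory copies instead.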
I'm suggesting a combination of three things: the buffer size is set to the size of the CPU cache; the sort process is pinned to one CPU with no other processes allowed on it, so it has exclusive use of that CPU's data cache; and the temporary directory is on RAM (i.e. tmpfs) rather than magnetic disk. For example:

    sort --buffer-size=8M --temporary-directory=/dev/shm

Under these circumstances, and if the merging is done in parallel, is it possible that --buffer-size=8M could be faster than a larger value?

Cheers,
Shaun
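The setup proposed above can be tried with a short script. This is a sketch under several assumptions: Linux, GNU coreutils (sort, shuf, seq, stat), glibc's getconf for the cache-size query, and util-linux taskset for pinning; the demo file names and the 1,500,000-line input are hypothetical. Note that taskset only restricts which CPUs sort may run on; it does not keep other processes off that CPU, which would additionally need something like the isolcpus kernel parameter or a dedicated cpuset.

```shell
#!/bin/sh
# Sketch of the proposed setup. Assumptions: Linux, GNU coreutils,
# glibc getconf, util-linux taskset. Demo file names are hypothetical.
set -e

# Put temporary files on /dev/shm (usually tmpfs on Linux); fall back to /tmp.
tmpdir=/dev/shm
if [ ! -d "$tmpdir" ] || [ ! -w "$tmpdir" ]; then
  tmpdir=${TMPDIR:-/tmp}
fi
echo "temporary directory: $tmpdir ($(stat -f -c %T "$tmpdir" 2>/dev/null || echo unknown))"

# Report the cache sizes glibc knows about (may print 0, or fail elsewhere).
for level in LEVEL2_CACHE_SIZE LEVEL3_CACHE_SIZE; do
  echo "$level: $(getconf "$level" 2>/dev/null || echo unknown)"
done

# Demo input (about 11 MB) larger than the 8M buffer, so sort spills to tmpdir.
input="$tmpdir/sort-demo-input.txt"
output="$tmpdir/sort-demo-output.txt"
seq 1500000 | shuf > "$input"

# Pin sort to the first CPU this shell may run on, if taskset is available.
pin=""
if command -v taskset >/dev/null 2>&1; then
  # "pid N's current affinity list: 0-3" -> "0"
  cpu=$(taskset -pc $$ | sed 's/.*: //; s/[-,].*//')
  pin="taskset -c $cpu"
fi

# $pin is deliberately unquoted: it expands to nothing or to "taskset -c N".
$pin sort -n --buffer-size=8M --temporary-directory="$tmpdir" "$input" > "$output"
echo "sorted $(wc -l < "$output") lines into $output"
```

Timing this (for example with the shell's time keyword) against the same command with a larger --buffer-size is one way to test the question empirically. Remove the two demo files from the temporary directory afterwards.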
