Hi,

Do any patches exist to fork the merging stage of sort and run multiple merge processes in parallel? It seems like a relatively straightforward improvement, especially since a lot of the fork/wait magic has already been tackled by --compress-program. I wonder what the optimal --batch-size would be; NMERGE=2 would allow the most parallelism, but would need more merge passes and therefore more I/O.
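
To put rough numbers on that trade-off, here is a small sketch. It assumes a simplified model in which every merge pass reads and writes the whole data set once and merges up to NMERGE runs at a time; the way sort actually schedules its merges may well differ, and the function name and the run count of 64 are only illustrative.

```python
def merge_passes(num_runs: int, nmerge: int) -> int:
    """Count the passes needed to reduce num_runs sorted runs to a single
    run when each merge combines at most nmerge runs (simplified model)."""
    if num_runs < 1:
        raise ValueError("num_runs must be at least 1")
    if nmerge < 2:
        raise ValueError("nmerge must be at least 2")
    passes = 0
    while num_runs > 1:
        # Each pass groups the runs into batches of nmerge (ceiling division).
        num_runs = -(-num_runs // nmerge)
        passes += 1
    return passes


if __name__ == "__main__":
    runs = 64  # hypothetical number of temporary files
    for nmerge in (2, 4, 16):
        passes = merge_passes(runs, nmerge)
        # Under this model each pass rewrites the full data set once.
        print(f"NMERGE={nmerge}: {passes} passes over the data")
```

Under this model a smaller NMERGE gives more independent merges per pass that could run in parallel, but every extra pass costs another full read and write of the data.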
Does anyone here know the effect of the CPU cache size on the optimal --buffer-size? I was wondering whether setting it to the CPU cache size (say 8 MB) could possibly be faster than using a larger buffer.

Cheers,
Shaun
