On Thu, Mar 26, 2009 at 09:50:08PM +0000, Ralf Wildenhues wrote: > Hi Paul, all, > > Paul Eggert writes: > > > > This patch is by Glen Lenker, Matt Pham, Benjamin Nuernberger, Sky > > Lin, TaeSung Roh, and Paul Eggert. It adds support for parallelism > > within an internal sort. On our simple tests on a 2-core desktop x86, > > overall performance improved by roughly a factor of 1.6. > > This is too interesting to pass up. > > Example run, on an 8-way, and with cat'ed instances of the dictionary, > on tmpfs, timings best of three:
Hey Ralf, did you happen to specify the amount of RAM sort should use. Not specifying enough RAM for sort would force break what would be a single internal sort into multiple internal sort passes and then an external sort. As it is external sort is still sequential. > runtime [s] threads > file size [MB] 1 2 4 8 > 1 0.06 0.04 0.03 0.04 > 2 0.13 0.09 0.07 0.07 > 4 0.28 0.20 0.16 0.15 > 8 0.61 0.43 0.34 0.32 > 16 1.34 0.94 0.74 0.72 > 32 3.00 2.06 1.63 1.57 > 64 6.36 4.38 3.44 3.32 > 128 13.49 9.30 7.13 7.24 > 256 28.62 19.49 15.17 15.18 I ran these tests on a 256MB instance of the dictionary in tmpfs, on a 8-core machine specifying 2G of RAM. runtime [s] threads file size [MB] 1 2 4 8 256 2m41.219 1m27.357 52.53 36.429 > Here's the abbreviated 'time' output for the last row: > > 26.95user 1.67system 0:28.62elapsed 100%CPU > 30.78user 1.98system 0:19.49elapsed 168%CPU > 37.41user 2.04system 0:15.17elapsed 260%CPU > 60.68user 2.79system 0:15.18elapsed 417%CPU I forgot to use your time format in the test above, this is from a seperate test run. 160.22user 1.34system 2:41.61elapsed 99%CPU 159.83user 1.45system 1:27.12elapsed 185%CPU 159.84user 1.56system 0:52.26elapsed 308%CPU 160.67user 1.53system 0:36.26elapsed 447%CPU This seems to be what I would expect from a good implementation. > It suggests to me that too much time is spent busy-waiting in pthread_join, > or that sort is computing too much (I haven't looked at the patch in detail). > > Also, I'd have expected the rate going from 1 to 2 threads to get at least > a bit better with bigger file size, but it remains remarkably constant, > around 1.45 for this setup. What am I missing? _______________________________________________ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils