Glen Lenker wrote:
> On Sat, Mar 28, 2009 at 04:51:52PM +0100, Ralf Wildenhues wrote:
>> * Glen Lenker wrote on Fri, Mar 27, 2009 at 11:07:19PM CET:
>> > On Thu, Mar 26, 2009 at 09:50:08PM +0000, Ralf Wildenhues wrote:
>> > > Example run, on an 8-way, and with cat'ed instances of the dictionary,
>> > > on tmpfs, timings best of three:
>>
>> > > It suggests to me that too much time is spent busy-waiting in
>> > > pthread_join, or that sort is computing too much (I haven't looked
>> > > at the patch in detail).
>> > >
>> > > Also, I'd have expected the rate going from 1 to 2 threads to get at
>> > > least a bit better with bigger file size, but it remains remarkably
>> > > constant, around 1.45 for this setup. What am I missing?
>>
>> > What is the specifications for your computer? Does 8-way == 8 cores?
>>
>> Yes. Or 2 CPUs with 4 cores each or 4 CPUs with 2 cores each, I don't
>> remember. Is that distinction important?
Most definitely. I haven't studied this enough, but I do know that NUMA
(http://en.wikipedia.org/wiki/Non-Uniform_Memory_Access) makes a big
difference as the number of cores increases, so the CPU model is critical.
AMD Opterons have had NUMA support for quite a while, and thus might be
expected to perform better than an otherwise-comparable non-NUMA Intel CPU
(Intel's newer Nehalem and Tukwila cores do have NUMA support).

When communication speed matters enough, preferring to work with
better-connected neighbor cores can be worthwhile. For example, you might
want to run the sort timings via the numactl program, to favor local memory
accesses and/or to reduce the variance caused by haphazard thread->CPU
placement.
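To make the numactl suggestion concrete, here is a rough sketch of how such a timing run might be scripted. It is only an illustration under stated assumptions: the helper names numactl_command and best_of are made up for this example, the input file words.txt is a placeholder for the concatenated dictionary, and binding to node 0 is an arbitrary choice. The --cpunodebind and --membind options are numactl's own; they restrict the command's threads and memory allocations to the given node.

```python
import subprocess
import time


def numactl_command(node, command):
    """Prefix `command` so its threads and memory stay on one NUMA node.

    Hypothetical helper: it only builds the argument list.
    """
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", *command]


def best_of(runs, argv):
    """Return the fastest wall-clock time (seconds) over `runs` executions."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the command's output so terminal I/O does not skew the timing.
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Assumption: words.txt stands in for the real test input, and node 0
    # is just an example; `numactl --hardware` lists the available nodes.
    argv = numactl_command(0, ["sort", "words.txt"])
    print(f"best of three: {best_of(3, argv):.3f} s")
```

Comparing the bound run against an unbound one (the same sort command without the numactl prefix) should give some idea of how much of the timing variance comes from thread placement rather than from sort itself.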
