Re: [PATCH] sort: add --threads option to parallelize internal sort.

Chen Guo Mon, 08 Mar 2010 02:40:33 -0800

Hi Padraig,

> > T=1: 5.10s
> > T=2: 2.87s
> > T=3: 2.71s
> > T=4: 1.75s
> > T=5: 1.66s
> > T=6: 1.65s
> > T=7: 1.67s
> > T=8: 1.31s
> 
> Nice results!
> 
> A few quick questions:
> 
> Any thoughts on the interesting jump at T=8?
Say we're sorting 32 lines with 8 threads: each thread gets 4 lines to sort.
If we sort with 7 threads, then 6 threads get 4 lines each and the last thread
gets 8. That last thread becomes the bottleneck.

A way around this, when sorting with 7 threads, would be to have 6 threads
sort 5 lines each and the last thread sort 2. A more striking example is 1000
lines with 3 threads: we could have 250, 250, and 500, with 500 being the
bottleneck, or 333, 333, and 334.

To divide the lines up this way, we'd need to compute, at the very start,
nlines / nthreads for all the threads except one, and
nlines - (nthreads - 1) * (nlines / nthreads) for the last thread. However,
this method implies creating all the threads in a loop, which isn't as elegant
as recursion. I used this approach in a previous patch, but for some reason
never thought of it here. I'll try it out and see how much the results differ.
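
A minimal sketch of the arithmetic above, in Python for brevity rather than
the C of the patch. The function names are hypothetical and nothing here is
taken from the patch itself. split_as_described applies the formula exactly
as written, assuming truncating integer division. split_balanced is an
alternative that is not from the original message: it hands the leftover
lines out one per thread.

```python
def split_as_described(nlines, nthreads):
    """Chunk sizes per the formula in the text.

    Every thread but the last gets nlines // nthreads lines; the last
    thread gets whatever is left over.
    """
    base = nlines // nthreads  # truncating division for positive integers
    last = nlines - (nthreads - 1) * base
    return [base] * (nthreads - 1) + [last]


def split_balanced(nlines, nthreads):
    """Hypothetical variant: give one extra line to the first few threads."""
    base, extra = divmod(nlines, nthreads)
    return [base + 1 if i < extra else base for i in range(nthreads)]


if __name__ == "__main__":
    for nlines, nthreads in [(1000, 3), (32, 7), (32, 8)]:
        print(nlines, nthreads,
              split_as_described(nlines, nthreads),
              split_balanced(nlines, nthreads))
```

For 1000 lines and 3 threads both functions give 333, 333, 334. For 32 lines
and 7 threads, truncating division gives 4 lines to six threads and 8 to the
last, so the 5, 5, 5, 5, 5, 5, 2 split mentioned earlier would need the
per-thread count rounded up instead; the balanced variant gives 5, 5, 5, 5,
4, 4, 4.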

> Have you tested in conjunction with the external || patch?
I actually haven't, though I'm really interested in knowing how the speedups
will multiply. Joey and I talked about calling sortlines with NTHREADS / N
threads when sorting on N disks with a balanced workload.

> You previously mentioned a thread bug with memcoll. Is that worked around?
That happened when more than one instance of memcoll was called on the same
line at once, since memcoll replaces the eolchar with '\0'. Under our
approach, the same line should never be compared by two threads at the same
time, so we're fine. On top of that, Professor Eggert suggested NUL delimiting
all lines as they're read in, so memcoll doesn't have to. Hence the patch to
gnulib, which introduces xmemcoll_nul and memcoll_nul for input that is known
to be NUL delimited. No replacement of the eolchar is needed there, which
makes the comparison thread-safe.
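
The difference between the two comparison styles can be sketched as follows.
This is a Python illustration only: the function names are hypothetical, it
is not the gnulib API, and plain byte comparison stands in for locale-aware
collation. It also assumes the input ends with the eol byte. The first
function briefly writes into the shared buffer, which is what makes
concurrent calls on the same line unsafe; the second only reads.

```python
def compare_with_temp_nul(buf, a, b):
    """Compare two lines of buf given as (start, end) spans.

    The byte after each line is overwritten with NUL and restored
    afterwards, so the shared buffer is modified for a short time.
    """
    saved_a, saved_b = buf[a[1]], buf[b[1]]
    buf[a[1]] = buf[b[1]] = 0  # shared buffer is mutated here
    try:
        x, y = bytes(buf[a[0]:a[1]]), bytes(buf[b[0]:b[1]])
        return (x > y) - (x < y)
    finally:
        buf[a[1]], buf[b[1]] = saved_a, saved_b  # put the eol bytes back


def nul_delimit(data, eol=b"\n"):
    """Replace every eol byte with NUL once, at read time.

    Returns the modified buffer and the (start, end) span of each line.
    """
    buf = bytearray(data)
    spans, start = [], 0
    for i, byte in enumerate(buf):
        if byte == eol[0]:
            buf[i] = 0
            spans.append((start, i))
            start = i + 1
    return buf, spans


def compare_nul(buf, a, b):
    """Read-only comparison for lines that are already NUL delimited."""
    x, y = bytes(buf[a[0]:a[1]]), bytes(buf[b[0]:b[1]])
    return (x > y) - (x < y)
```

Because compare_nul never writes to the buffer, any number of threads can
call it on the same line without coordinating with each other.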
