On 08/03/10 10:39, Chen Guo wrote: > Hi Padraig, > >> You previously mentioned a thread bug with memcoll. Is that worked around? > > That happened when more than one instance of memcoll is called on the same > line at once, since memcoll replaces the eolchar with '\0'. Under our > approach, > the same line shouldn't ever be compared at the same time, so we're fine. > On top of that, Professor Eggert suggested NUL delimiting all lines as they're > read in, so memcoll doesn't have to; hence the patch to gnulib, which > introduces > xmemcoll_nul and memcoll_nul, for when input is known to be NUL delimited, > thus > no replacement of the eolchar is needed, making memcoll threadsafe.
Note the current xmemcoll0() in gnulib requires the length _including_ the terminating NUL to be passed, whereas one usually does not include the terminating char in the length passed to xmemcoll(). I accordingly updated the lengths passed to xmemcoll0() by your latest patch. However there are still writes done to the source text in the keycompare() function. So I'm thinking of dropping the whole xmemcoll0() thing altogether assuming your statement above is correct, that a particular line will not be used at the same time by multiple threads. I did try to copy the text to the stack before comparing, but that introduced a significant overhead noted below. Your patch is still performing well on a single core machine: ----------- before --------------------- $ time ./src/sort < nums.list >/dev/null real 0m8.644s user 0m8.307s sys 0m0.292s $ time ./src/sort -g < nums.list >/dev/null real 0m11.046s user 0m10.652s sys 0m0.295s $ time ./src/sort -n < nums.list >/dev/null real 0m4.909s user 0m4.567s sys 0m0.298s $ time LANG=C ./src/sort < nums.list >/dev/null real 0m1.959s user 0m1.657s sys 0m0.285s ------------ after --------------------- $ time ./src/sort < nums.list >/dev/null real 0m8.686s user 0m8.300s sys 0m0.232s $ time ./src/sort -g < nums.list >/dev/null real 0m10.196s user 0m9.850s sys 0m0.221s $ time ./src/sort -n < nums.list >/dev/null real 0m2.958s user 0m2.664s sys 0m0.221s $ time LANG=C ./src/sort < nums.list >/dev/null real 0m1.985s user 0m1.750s sys 0m0.217s After copying the text to the stack as mentioned above there is a significant performance drop: $ time ./src/sort -n < nums.list >/dev/null real 0m4.086s user 0m3.848s sys 0m0.218s cheers, Pádraig.