2010/7/9 Paul Eggert <egg...@cs.ucla.edu>:
> On 07/09/10 18:07, Pádraig Brady wrote:
>> Chen Guo wrote:
>>> That happened when more than one instance of memcoll is called on the same
>>> line at once, since memcoll replaces the eolchar with '\0'. Under our
>>> approach, the same line shouldn't ever be compared at the same time, so
>>> we're fine.
>
> Ah, sorry, I wasn't aware of that.
>
>> I'm thinking of dropping
>> the whole xmemcoll0() thing altogether assuming your
>> statement above is correct, that a particular line will
>> not be used at the same time by multiple threads.
>
> Yes, that makes sense. We can revert that change from gnulib, since it
> makes gnulib bigger unnecessarily.

Actually, the '\0' trick cuts about 5% off the runtime, last I checked. EACH TIME sort compares two lines, memcoll temporarily overwrites the byte just past each line with '\0' and then restores it. If we instead set every line terminator to NUL once, at the start, xmemcoll0 wouldn't need to do that replacement on every comparison. When we output, we'd simply put the \n back.

I could be wrong, though; this is from memory of tests I ran 4-5 months ago. But about 5% is what I remember when sorting 1M lines on 8 cores.