>>>Using strcoll is inefficient anyway >> >> Don't we know it! If we can avoid it, we'd like to. > > Well, the mbstowcs+wcscoll solution I presented > should be equivalent to strcoll on any platform, > and it's much faster in my tests.
That's good to know, though I'm puzzled as to why it's true. For a single comparison, can't strcoll typically return an answer without examining all the input, and wouldn't that be faster than mbstowc+wcscoll? But if it is true, perhaps we should rewrite memcoll to use the mbstowc+wcscoll combination as well. >>>but it probably is possible in ICU? >> >> Sorry, don't know. > > I wonder could we add this as a dependency? You mean, ship ICU code? Or depend on it already being installed? Sorry, I'm not familiar with the ICU code. Is it free software and is it well maintained? Where else is it being used, outside ICU itself? > Also I don't agree with splitting entities into > valid multibyte ranges and "C" for the rest. > That is probably not what the user wants the data interpreted as, > and I think (at least for uniq which I've thought about), > that it's just best to treat the whole entity as "C" > if there are invalid multibyte sequences in the entity. We can't adopt this approach in general, since it would mean that our comparison operation could return inconsistent answers. Suppose "Y" has an invalid byte sequence but "X" and "Z" are valid. Then we might have "X" < "Y" < "Z" (using C-locale comparison), but "Z" < "X" (using some other locale's comparison). This will lead to inconsistencies, which will be hard to document and will confuse users. Worse, it can even lead to buffer overruns: e.g., qsort has undefined behavior if you pass it a comparison function that is not a total order. In contrast, the approach that I suggested, though perhaps slower, is consistent: it always results in a total order. Perhaps it could be improved on, though -- I haven't thought about performance as much as you have. _______________________________________________ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils