On 10/02/2026 05:17, Collin Funk wrote:
> Hi Pádraig,
>
> I was looking at your Coreutils i18n page again [1]. Do you remember why
> you listed 'tsort'?
>
> Initially I thought you might have meant something like
> s/strcmp/strcoll/g, but POSIX says the following:
>
>     The LC_COLLATE variable need not affect the actions of tsort. The
>     output ordering is not lexicographic, but depends on the pairs of
>     items given as input.
>
> Then I thought you might have meant having Gnulib's readtoken handle all
> whitespace characters (including multi-byte ones) instead of just SPACE,
> TAB, and NEWLINE. POSIX says this regarding that:
>
>     The application shall ensure that the input consists of pairs of
>     items (non-empty strings) separated by one or more <blank> or
>     <newline> characters. It is unspecified whether other white-space
>     characters can also be used as separators.
>
> So either behavior is perfectly fine. I'm not sure if it is a good idea
> to change the current behavior. All other implementations that I know of
> use a unibyte isblank or isspace function to split tokens.
>
> Collin
>
> [1] https://www.pixelbeat.org/docs/coreutils_i18n/

Yes it was a not carefully considered usage of strcoll.
I've removed the reference from that page.

cheers,
Padraig
