On 10/02/2026 05:17, Collin Funk wrote:
Hi Pádraig,

I was looking at your Coreutils i18n page again [1]. Do you remember why
you listed 'tsort'? Initially I thought you might have meant something
like s/strcmp/strcoll/g, but POSIX says the following:

     The LC_COLLATE variable need not affect the actions of tsort. The
     output ordering is not lexicographic, but depends on the pairs of
     items given as input.

Then I thought you might have meant having Gnulib's readtoken handle all
whitespace characters (including multi-byte ones) instead of just SPACE,
TAB, and NEWLINE. POSIX says this regarding that:

     The application shall ensure that the input consists of pairs of
     items (non-empty strings) separated by one or more <blank> or
     <newline> characters. It is unspecified whether other white-space
     characters can also be used as separators.

So either behavior is perfectly fine. I'm not sure if it is a good idea
to change the current behavior. All other implementations that I know of
use a unibyte isblank or isspace function to split tokens.
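
To make the difference concrete, here is a rough sketch of the two unibyte
splitting rules mentioned above. It is not Gnulib's readtoken code; the names
count_tokens, is_basic_delim, and is_space_delim are hypothetical and exist
only for illustration. It assumes the default "C" locale, where isspace
accepts only the six standard white-space characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: delimiters limited to SPACE, TAB, and NEWLINE,
   the set described in the message above.  */
static int
is_basic_delim (unsigned char c)
{
  return c == ' ' || c == '\t' || c == '\n';
}

/* Hypothetical helper: any unibyte white-space character per isspace.
   Multi-byte white space is still not recognized.  */
static int
is_space_delim (unsigned char c)
{
  return isspace (c) != 0;
}

/* Count the maximal runs of non-delimiter bytes in S, using IS_DELIM
   to classify each byte.  */
static size_t
count_tokens (const char *s, int (*is_delim) (unsigned char))
{
  size_t n = 0;
  int in_token = 0;

  assert (s != NULL && is_delim != NULL);
  for (; *s; s++)
    {
      if (is_delim ((unsigned char) *s))
        in_token = 0;
      else if (!in_token)
        {
          in_token = 1;
          n++;
        }
    }
  return n;
}
```

With these helpers, "a\vb" is one token under is_basic_delim and two under
is_space_delim, while an item pair separated only by a UTF-8 encoded
U+2003 EM SPACE stays a single token under both, since neither rule looks
at multi-byte characters.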

Collin

[1] https://www.pixelbeat.org/docs/coreutils_i18n/

Yes, listing tsort there was a not carefully considered reference to strcoll.
I've removed the reference from that page.

cheers,
Padraig
