Hi Pádraig,
I was looking at your Coreutils i18n page again [1]. Do you remember why
you listed 'tsort'? Initially I thought you might have meant something
like s/strcmp/strcoll/g, but POSIX says the following:
    The LC_COLLATE variable need not affect the actions of tsort. The
    output ordering is not lexicographic, but depends on the pairs of
    items given as input.
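To illustrate the quoted point, here is a minimal sketch of a Kahn-style topological sort over string pairs. It is not the coreutils implementation. The names `topo_sort` and `intern` and the fixed `MAX_ITEMS` limit are hypothetical. In this sketch, items are only compared for equality and ties are broken by order of first appearance. That tie-breaking is a choice of the sketch, not necessarily what GNU tsort does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed limit for the sketch; a real tsort grows its tables.  */
enum { MAX_ITEMS = 64 };

/* Return the index of NAME in NAMES, appending it if absent.
   Items are only compared for equality here, never ordered.  */
static size_t
intern (const char *names[], size_t *n, const char *name)
{
  for (size_t i = 0; i < *n; i++)
    if (strcmp (names[i], name) == 0)
      return i;
  assert (*n < MAX_ITEMS);
  names[*n] = name;
  return (*n)++;
}

/* Sort NPAIRS pairs, where pairs[k][0] must precede pairs[k][1].
   Store the ordering in OUT and return the number of items written;
   this is less than the number of distinct items if there is a cycle.  */
static size_t
topo_sort (const char *pairs[][2], size_t npairs, const char *out[])
{
  const char *names[MAX_ITEMS];
  unsigned char edge[MAX_ITEMS][MAX_ITEMS] = { { 0 } };
  size_t indegree[MAX_ITEMS] = { 0 };
  size_t n = 0;

  for (size_t k = 0; k < npairs; k++)
    {
      size_t from = intern (names, &n, pairs[k][0]);
      size_t to = intern (names, &n, pairs[k][1]);
      /* A pair of identical items only records that the item exists.  */
      if (from != to && !edge[from][to])
        {
          edge[from][to] = 1;
          indegree[to]++;
        }
    }

  /* Seed the queue with items that have no predecessors,
     in order of first appearance.  */
  size_t queue[MAX_ITEMS], head = 0, tail = 0;
  for (size_t i = 0; i < n; i++)
    if (indegree[i] == 0)
      queue[tail++] = i;

  /* Emit items in queue order, releasing successors whose last
     predecessor has just been emitted.  */
  while (head < tail)
    {
      size_t i = queue[head];
      out[head++] = names[i];
      for (size_t j = 0; j < n; j++)
        if (edge[i][j] && --indegree[j] == 0)
          queue[tail++] = j;
    }
  return head;
}
```

Given the pairs "b a" and "c b", it produces c, b, a: the reverse of lexicographic order, determined entirely by the pairs.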
Then I thought you might have meant having Gnulib's readtoken handle all
whitespace characters (including multibyte ones) instead of just SPACE,
TAB, and NEWLINE. POSIX says this regarding that:
    The application shall ensure that the input consists of pairs of
    items (non-empty strings) separated by one or more <blank> or
    <newline> characters. It is unspecified whether other white-space
    characters can also be used as separators.
So either behavior conforms. I'm not sure it would be a good idea to
change the current behavior, since every other implementation I know of
uses a unibyte isblank or isspace function to split tokens.
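For comparison, here is a minimal sketch of unibyte token splitting. The name `count_tokens_unibyte` is hypothetical, not a coreutils or Gnulib function. It assumes the default "C" locale, where no byte above 0x7F is classified as a space. Each byte is classified on its own, so a multibyte space such as U+3000 IDEOGRAPHIC SPACE (UTF-8 bytes E3 80 80) never separates tokens. A multibyte-aware variant would instead decode characters with mbrtowc and test them with iswspace.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count the whitespace-separated tokens in S, classifying one byte at a
   time with isspace, as a unibyte implementation would.  */
static size_t
count_tokens_unibyte (const char *s)
{
  assert (s != NULL);
  size_t count = 0;
  int in_token = 0;
  for (; *s; s++)
    {
      /* Cast to unsigned char: passing a negative char value to isspace
         is undefined behavior.  */
      if (isspace ((unsigned char) *s))
        in_token = 0;
      else if (!in_token)
        {
          in_token = 1;
          count++;
        }
    }
  return count;
}
```

In the "C" locale, "a b\tc\nd" yields four tokens, while "a", U+3000, "b" is read as a single token.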
Collin
[1] https://www.pixelbeat.org/docs/coreutils_i18n/