Hi Pádraig,

I was looking at your Coreutils i18n page again [1]. Do you remember why
you listed 'tsort'? Initially I thought you might have meant something
like s/strcmp/strcoll/g, but POSIX says the following:

    The LC_COLLATE variable need not affect the actions of tsort. The
    output ordering is not lexicographic, but depends on the pairs of
    items given as input.

Then I thought you might have meant having Gnulib's readtoken handle all
whitespace characters (including multi-byte ones) instead of just SPACE,
TAB, and NEWLINE. POSIX says this regarding that:

    The application shall ensure that the input consists of pairs of
    items (non-empty strings) separated by one or more <blank> or
    <newline> characters. It is unspecified whether other white-space
    characters can also be used as separators.

So either behavior conforms to POSIX. Still, I'm not sure it is a good
idea to change the current behavior, since all other implementations
that I know of use a unibyte isblank or isspace function to split tokens.

Collin

[1] https://www.pixelbeat.org/docs/coreutils_i18n/
