Pádraig Brady <[email protected]> writes:

> On 04/02/2026 14:27, Vincent Lefevre wrote:
>> On 2026-02-04 15:17:07 +0100, Vincent Lefevre wrote:
>>> On 2010-07-23 00:24:42 +0100, Pádraig Brady wrote:
>>>> On 22/07/10 19:49, Mihai Moldovan wrote:
>>>>> (Is this even considerable as a bug, or just a "feature" in that only
>>>>> one byte delimiters are allowed by default?)
>> Oops, I was confused by the bug title and was thinking of -c. Yes,
>> possibly a missing feature for -d (because one gets an error),
>> though for POSIX, there is no such one-byte restriction.
>> But for -c, this is a real bug (no failures, incorrect output):
>> 
>>> The -c option is documented as "select only these characters" and even
>>> specified by POSIX. "One byte delimiters" would be the -b option.
>>>
>>> So, this is a real bug, not a missing feature. Anyway, a missing feature
>>> should just result in a failure (non-zero exit status, with an error
>>> message), while here, "cut" succeeds with incorrect output, which is
>>> really bad. So the severity should be set back to "normal" (at least).
>> Testcase:
>> $ echo ae1234 | cut -c 4-
>> 234
>> $ echo aé1234 | cut -c 4-
>> 1234
>> $ echo $?
>> 0
>> Note: this is "é" as U+00E9 LATIN SMALL LETTER E WITH ACUTE, i.e.
>> a single character, not the variant with a combining acute accent.
>
> paste(1) was just made multi-byte aware.
> I'm going to work on cut(1) for the next release.

I had a look at converting 'cut' to use read/write instead of getc. That
would allow you to call mbrtowc on the input.
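
For illustration, here is a rough sketch of what calling mbrtowc on a buffer filled by read might look like. It is not taken from cut.c: the helper name count_chars and the policy of counting each invalid byte as one character are assumptions made for the example. The caller is assumed to have called setlocale (LC_ALL, "") and to pass the same zero-initialized mbstate_t for every buffer, so a character split across two reads is still decoded as one.

```c
#include <locale.h>   /* The caller needs setlocale (LC_ALL, "").  */
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: return the number of characters completed in
   BUF[0..LEN-1].  STATE carries a partial character across calls.  */
size_t
count_chars (const char *buf, size_t len, mbstate_t *state)
{
  size_t count = 0;
  size_t i = 0;

  while (i < len)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, buf + i, len - i, state);

      if (n == (size_t) -2)
        /* Incomplete character at the end of the buffer.  mbrtowc has
           stored the bytes in STATE; the next call will finish it.  */
        break;

      if (n == (size_t) -1)
        {
          /* Invalid sequence.  Assumption: count the byte as one
             character and restart from the initial state.  */
          memset (state, 0, sizeof *state);
          n = 1;
        }
      else if (n == 0)
        /* A NUL character was decoded; it occupies one byte.  */
        n = 1;

      i += n;
      count++;
    }

  return count;
}
```

In a UTF-8 locale, the 7 bytes of "aé1234" from the testcase above should come out as 6 characters, whether they arrive in one buffer or are split in the middle of the "é".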

However, code like the following gets a bit tricky to write on top of read:

      /* With -d$'\n' don't treat the last '\n' as a delimiter.  */
      if (delim == line_delim && c == delim)
        {
          int last_c = getc (stream);
          if (last_c != EOF)
            ungetc (last_c, stream);
          else
            c = last_c;
        }

The problem is that you would have to read again to check whether you
are at the end of the file, and obviously you want to avoid issuing a
read for a single character.
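
One possible way around the extra read, sketched here under assumptions and not taken from cut.c, is to hold back a delimiter that happens to be the last byte of a buffer and decide what it was only after the next full-sized read returns. The helper name count_delims_except_last is hypothetical, the buffer is supplied by the caller, and error handling is minimal (no EINTR retry).

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: count the bytes equal to DELIM read from FD,
   except that a DELIM that is the very last byte before end of file
   is not counted.  BUF is scratch space of BUFSIZE bytes.
   Return -1 on read error.  */
long
count_delims_except_last (int fd, char delim, char *buf, size_t bufsize)
{
  bool pending = false;   /* A DELIM ended the previous buffer.  */
  long count = 0;
  ssize_t n;

  while ((n = read (fd, buf, bufsize)) > 0)
    {
      /* More data arrived, so the held-back DELIM was not the last
         byte of the input and counts as a real delimiter.  */
      if (pending)
        {
          count++;
          pending = false;
        }

      for (ssize_t i = 0; i < n; i++)
        if (buf[i] == delim)
          {
            if (i == n - 1)
              pending = true;   /* Decide after the next read.  */
            else
              count++;
          }
    }

  /* At end of file a still-pending DELIM was the trailing one, so it
     is dropped rather than counted.  */
  return n < 0 ? -1 : count;
}
```

The end-of-file check then falls out of the normal read loop: the read that returns 0 is the one that would have happened anyway, so no single-byte lookahead is needed.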

Collin


