On 03/04/2026 04:48, Paul Eggert wrote:
On 2026-04-02 14:38, Pádraig Brady wrote:
$ git clone https://github.com/pixelb/coreutils.git
Some comments. In that, I see the following code:
int read_ret = read (fileno (mbbuf->fp), mbbuf->buffer + start,
mbbuf->size - start);
The type should be ssize_t, not int. As things stand this is no problem
since the buffer size is at most 256 KiB, but it's easy to insulate the
code against future changes entailing buffers larger than INT_MAX.
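To illustrate the point, here is a minimal sketch, not code from the branch. The helper name fill_from and its parameters are made up for this example; the only thing taken from the review is that the result of read() is kept in an ssize_t.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not from the patch: read into BUF starting at
   offset START, never past SIZE bytes.  Returns what read() returns:
   the byte count, 0 at end of input, or -1 on error.  */
static ssize_t
fill_from (int fd, char *buf, size_t size, size_t start)
{
  assert (start <= size);
  /* read() returns ssize_t; storing it in an int could truncate the
     count if SIZE - START ever exceeded INT_MAX.  */
  ssize_t n_read = read (fd, buf + start, size - start);
  return n_read;
}

/* Small self-check: push 3 bytes through a pipe and read them back
   after a 2-byte prefix.  Returns the count reported by fill_from,
   or -1 if anything went wrong.  */
static ssize_t
fill_from_demo (void)
{
  int fds[2];
  char buf[8] = "ab";
  if (pipe (fds) != 0)
    return -1;
  ssize_t n_written = write (fds[1], "cde", 3);
  close (fds[1]);
  ssize_t n = n_written == 3 ? fill_from (fds[0], buf, sizeof buf, 2) : -1;
  close (fds[0]);
  return (n == 3 && buf[2] == 'c' && buf[4] == 'e') ? n : -1;
}
```

With a 256 KiB buffer an int would also hold the count, so the change only matters if the buffer is ever allowed to grow.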
+ unsigned char delim_0 = to_uchar (delim_bytes[0]);
> ...
> + if (putchar (to_uchar (buf[i])) < 0)
> ...
+ if (n_bytes != 0 && to_uchar (buf[0]) == c)
+ return buf;
+ if (1 < n_bytes && to_uchar (buf[1]) == c)
+ return buf + 1;
> ...
+ while (processed < n_avail
+ && c_isblank (to_uchar (chunk[processed])))
I absorbed the above suggestions into their originating commits.
Well I kept search_bytes() general, even though we only use
it for line_delim currently.
I did adjust that function to be const correct though.
No need to call to_uchar in any of these places.
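A small sketch of why the call is redundant in the assignment case, using only standard C. The name to_uchar_sketch is a local stand-in written for this example, on the assumption that to_uchar is a plain char to unsigned char conversion; first_byte is likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for to_uchar (assumed to be a plain conversion from
   char to unsigned char).  */
static unsigned char
to_uchar_sketch (char ch)
{
  return ch;
}

/* Return BUF[0] as an unsigned char without calling any helper.
   Initializing an unsigned char from a char converts the value
   implicitly, which is all the helper above does.  */
static unsigned char
first_byte (char const *buf)
{
  assert (buf != NULL);
  unsigned char b = buf[0];
  return b;
}
```

The putchar case is similar in standard C: putchar converts its int argument to unsigned char before writing it, and on success returns that value, so a negative result still means failure.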
+ if (optarg[0] == '\0')
+ {
+ delim = '\0';
+ delim_bytes[0] = '\0';
+ delim_length = 1;
+ }
+ else if (MB_CUR_MAX <= 1)
+ {
+ if (optarg[1] != '\0')
+ FATAL_ERROR (_("the delimiter must be a single character"));
+ delim = optarg[0];
+ delim_bytes[0] = optarg[0];
+ delim_length = 1;
+ }
+ else
+ {
+ mcel_t g = mcel_scanz (optarg);
+ if (optarg[g.len] != '\0')
+ FATAL_ERROR (_("the delimiter must be a single character"));
+ copy_bytes (delim_bytes, optarg, g.len);
+ delim_length = g.len;
+ delim_mcel = g;
+ if (g.len == 1)
+ delim = optarg[0];
+ }
No need for the three major cases; one case should suffice. That is,
replace the above code with something like this:
mcel_t g = delim_mcel = mcel_scanz (optarg);
if (optarg[0] && optarg[g.len])
FATAL_ERROR (_("the delimiter must be a single character"));
copy_bytes (delim_bytes, optarg, g.len);
and then remove delim_length and delim, and replace all uses of
delim_length with delim_mcel.len, and replace all uses of delim with
delim_bytes[0] (possibly inside a to_uchar call).
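The single-case check can be tried outside coreutils with a rough stand-in. In the sketch below, first_char_len replaces mcel_scanz with the standard mbrtowc, and is assumed (not verified against gnulib here) to match it in counting a leading NUL or an undecodable byte as a 1-byte character. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Stand-in for mcel_scanz (an assumption for this sketch): return the
   byte length of the first character of S, counting a leading NUL or
   an undecodable byte as a 1-byte character.  */
static size_t
first_char_len (char const *s)
{
  mbstate_t state;
  memset (&state, 0, sizeof state);
  size_t n = mbrtowc (NULL, s, strlen (s) + 1, &state);
  /* 0 means NUL; (size_t) -1 and (size_t) -2 mean an invalid or
     incomplete sequence.  */
  return (n == 0 || n == (size_t) -1 || n == (size_t) -2) ? 1 : n;
}

/* The one test from the suggestion: an empty ARG selects NUL as the
   delimiter; otherwise nothing may follow the first character.  */
static bool
delimiter_ok (char const *arg)
{
  assert (arg != NULL);
  size_t len = first_char_len (arg);
  return ! (arg[0] && arg[len]);
}
```

Because the length is always at least 1, the empty argument, the single-byte case, and the multi-byte case all go through the same two lines, which is what lets the three branches collapse into one.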
Oh good call. That was all very messy.
I committed the above in your name.
https://github.com/coreutils/coreutils/commit/9a7d40677
I've force-pushed the updated cut-mb branch to
https://github.com/pixelb/coreutils.git
thanks for the review!
Padraig