On 27/10/2025 20:20, Rob Landley wrote:
On 10/27/25 00:03, Collin Funk wrote:
Hi Pádraig,
Pádraig Brady <[email protected]> writes:
Right. But that got me thinking that we could optimize
in various cases, rather than resorting to mbsstr().
The attached implements mbsmbchr(mbs, mbc) to more efficiently
search for a multi-byte char in a multi-byte string,
especially with the usual UTF-8 charset
(which is determined with a single mbrtoc32() call per process).
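
To illustrate why the UTF-8 case can skip a full multi-byte scan, here is a minimal sketch. It is not the attached patch: the name utf8_mbsmbchr is hypothetical, and it assumes the caller has already established that the locale is UTF-8 and that both arguments are valid, NUL-terminated UTF-8, with mbc holding exactly one character.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical illustration only, not the function from the patch.
   Return a pointer to the first occurrence of the single character MBC
   in MBS, or NULL if it is absent.  Both must be valid UTF-8.  */
static const char *
utf8_mbsmbchr (const char *mbs, const char *mbc)
{
  assert (mbs != NULL && mbc != NULL);

  /* ASCII character: in UTF-8 every byte of a multi-byte sequence has
     the high bit set, so an ASCII byte can only match a whole character.  */
  if (mbc[0] != '\0' && mbc[1] == '\0')
    return strchr (mbs, mbc[0]);

  /* Multi-byte character: lead bytes and continuation bytes use disjoint
     ranges, so a byte-wise match of a complete character always begins
     on a character boundary.  */
  return strstr (mbs, mbc);
}
```

In encodings where a trail byte can collide with another character's bytes, a byte-wise match may land mid-character, so a character-aware search such as mbsstr() would still be needed as the fallback.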
I wonder if that function is worth putting in gl/ under LGPL in case we
want to use it in other programs and/or move it to Gnulib. It seems
useful to me.
Yes probably.
I was going to look at maybe using it in cut(1) too,
in which case it would definitely be appropriate to move it to gl/.
I was thinking about some i18n stuff today. A prerequisite to cut(1) is
getndelim2, which is probably the part that requires the most work.
I'm still waiting on a decision on "cut -DF", which among other things
added regex delimiter support:
https://lists.gnu.org/archive/html/coreutils/2022-01/msg00004.html
https://lists.gnu.org/archive/html/coreutils/2023-08/msg00050.html
https://lists.gnu.org/archive/html/coreutils/2024-06/msg00017.html
We've been reluctant to merge this for a few reasons:
1. IMHO the new interface options are confusing
2. The functionality overlaps with that in awk
(I see awk is a pending utility in toybox currently)
3. The functionality overlaps with macOS/FreeBSD/uutils -w (skip blanks),
and we probably will implement -w for better compat with those.
thanks,
Padraig