On 4/19/26 14:31, Paul Eggert wrote:
On 2026-04-19 11:21, [email protected] wrote:
It may not be exactly the same as this grep report, but it's related.
Yes, that sounds plausible. However, my guess is that the problem occurs
only in older macOS versions, which use now-obsolete case-conversion
tables.
Maybe we can fix Gnulib regex to work even on older macOS (as well as on
OpenBSD), i.e., to treat dž as matching Dž when ignoring case even though
macOS itself does not do so. Come to think of it, this might improve
Gnulib regex performance on GNU/Linux. I'll add that to my long list of
things to do.
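For what it's worth, the idea quoted above could be sketched roughly as follows. This is only an illustration and not Gnulib code: the names fold_upper and equal_ignoring_case are made up here, and it assumes wint_t values are Unicode code points for these characters, which ISO C does not guarantee.

```c
#include <wctype.h>

/*
 * Hypothetical helper, not taken from Gnulib: map a wide character to
 * an uppercase form, handling the four Latin digraph triples by hand
 * so the result does not depend on the platform's case tables.
 * ASSUMPTION: wint_t holds Unicode code points for these characters.
 */
wint_t
fold_upper (wint_t wc)
{
  switch (wc)
    {
    /* titlecase and lowercase DZ WITH CARON -> U+01C4 */
    case 0x01C5: case 0x01C6: return 0x01C4;
    /* titlecase and lowercase LJ -> U+01C7 */
    case 0x01C8: case 0x01C9: return 0x01C7;
    /* titlecase and lowercase NJ -> U+01CA */
    case 0x01CB: case 0x01CC: return 0x01CA;
    /* titlecase and lowercase DZ -> U+01F1 */
    case 0x01F2: case 0x01F3: return 0x01F1;
    /* everything else is left to the C library */
    default: return towupper (wc);
    }
}

/* Hypothetical: compare two wide characters while ignoring case. */
int
equal_ignoring_case (wint_t a, wint_t b)
{
  return fold_upper (a) == fold_upper (b);
}
```

With this, equal_ignoring_case (0x01C6, 0x01C5) is true regardless of what the system's towupper does with U+01C5. A real fix in a regex engine would presumably need a more general table than these four triples.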
Just for the record and with a tad more verbosity and locale options :
/*
See GNU GREP bug report :
https://lists.gnu.org/archive/html/bug-grep/2026-04/msg00027.html
This appears to be an incompatibility in OpenBSD, which mishandles
titlecase characters. In an en_US.UTF-8 locale OpenBSD's towupper
function treats the character "Dž" (U+01C5 LATIN CAPITAL LETTER D WITH
SMALL LETTER Z WITH CARON) differently from GNU/Linux. If you call
towupper (0x01C5) on OpenBSD it returns 0x01C5, that is, it acts as if
this character is uppercase. However, it's titlecase, not uppercase. It
should uppercase to "DŽ", i.e., to U+01C4 LATIN CAPITAL LETTER DZ
WITH CARON.
Small test code should output "towupper (0x01C5) = 0x01C4" and on
OpenBSD it may output "towupper (0x01C5) = 0x01C".
*/
/*
* This code should be C90 clean and therefore we may use :
*
* #define _XOPEN_SOURCE 500
*
* NOTE: for reasons yet unknown OpenBSD 7.8 has a fit if you try
* to define _XOPEN_SOURCE and be damned if I know why.
*/
#if ! defined(__OpenBSD__)
#if ! defined (_XOPEN_SOURCE)
#define _XOPEN_SOURCE 500
#endif
#endif
#include <locale.h>
#include <wctype.h>
#include <stdio.h>
#include <stdlib.h>
int
main( int argc, char **argv )
{
/* NOTE : wint_t is provided by wctype.h (and also by wchar.h) */
wint_t c, w;
char *buf;
/* assume a trivial POSIX locale */
buf = setlocale( LC_ALL, "POSIX" );
if ( buf == NULL ) {
fprintf (stderr,"FAIL : setlocale fail\n");
return EXIT_FAILURE;
}
if ( argc > 1 ) {
printf("\nINFO : You suggest a locale of %s\n", argv[1]);
buf = setlocale( LC_ALL, argv[1] );
/* The return value is NULL if the request can not be done */
if ( buf == NULL ) {
fprintf(stderr,"FAIL : * * * locale request failed * * *\n");
fprintf(stderr," : ---------------------------------\n");
fprintf(stderr," : please check your available list\n");
fprintf(stderr," : of supported locales:\n");
fprintf(stderr," : use \"locale -a\".\n");
return EXIT_FAILURE;
}
printf(" : accepted.\n");
} else {
printf("\nINFO : locale is set to default \"POSIX\".\n\n");
}
c = 0x01C5;
w = towupper(c);
printf("\n(U+01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH "
"CARON)\n\n");
printf("towupper (0x01C5) = 0x%04X\n", (unsigned int) w);
printf("\n\nShould output \"towupper (0x01C5) = 0x01C4\".\n");
printf("Buggy stuff may output \"towupper (0x01C5) = 0x01C\"\n");
printf("OpenBSD 7.8 is even more strange and reports 0x01C5\n\n");
return EXIT_SUCCESS;
}
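Since the quoted diagnosis says OpenBSD "acts as if this character is uppercase", it may also be worth looking at how each platform classifies the character, not only what towupper returns. A minimal sketch, assuming nothing beyond wctype.h and stdio.h; the names case_probe, probe_case and print_probe are made up for this example:

```c
#include <stdio.h>
#include <wctype.h>

/* Hypothetical record of what the C library says about one character. */
struct case_probe
{
  int is_upper;   /* 1 if iswupper reports true */
  int is_lower;   /* 1 if iswlower reports true */
  wint_t upper;   /* result of towupper */
  wint_t lower;   /* result of towlower */
};

/* Collect classification and conversion results in the current locale. */
struct case_probe
probe_case (wint_t wc)
{
  struct case_probe p;
  p.is_upper = iswupper (wc) != 0;
  p.is_lower = iswlower (wc) != 0;
  p.upper = towupper (wc);
  p.lower = towlower (wc);
  return p;
}

/* Print one line per character so runs can be compared across systems. */
void
print_probe (wint_t wc)
{
  struct case_probe p = probe_case (wc);
  printf ("0x%04lX : iswupper=%d iswlower=%d towupper=0x%04lX towlower=0x%04lX\n",
          (unsigned long) wc, p.is_upper, p.is_lower,
          (unsigned long) p.upper, (unsigned long) p.lower);
}
```

Calling print_probe on 0x01C4, 0x01C5 and 0x01C6 after the setlocale call would show the whole uppercase, titlecase and lowercase triple at once. I have not run this on the systems below, so no expected output is given.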
Are we sure about this test code ?
In the POSIX locale the test fails everywhere I look: OpenBSD, FreeBSD and
even Solaris. With a UTF-8 locale the results vary from system to system.
1) OpenBSD 7.8 AMD64 : ( -std=iso9899:1990 -pedantic -pedantic-errors )
eris$ ./test_wchar
INFO : locale is set to default "POSIX".
(U+01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON)
towupper (0x01C5) = 0x01C5
Should output "towupper (0x01C5) = 0x01C4".
Buggy stuff may output "towupper (0x01C5) = 0x01C"
OpenBSD 7.8 is even more strange and reports 0x01C5
eris$
eris$ ./test_wchar de_DE.UTF-8
INFO : You suggest a locale of de_DE.UTF-8
: accepted.
(U+01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON)
towupper (0x01C5) = 0x01C5
Should output "towupper (0x01C5) = 0x01C4".
Buggy stuff may output "towupper (0x01C5) = 0x01C"
OpenBSD 7.8 is even more strange and reports 0x01C5
eris$
It really does not matter what locale I try. Always the same.
2) FreeBSD 15.0 with all the latest patches also shows strange output :
hydra$ uname -a
FreeBSD hydra 15.0-RELEASE-p5 FreeBSD 15.0-RELEASE-p5 GENERIC amd64
hydra$
hydra$ ./test_wchar
INFO : locale is set to default "POSIX".
(U+01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON)
towupper (0x01C5) = 0x01C5
Should output "towupper (0x01C5) = 0x01C4".
Buggy stuff may output "towupper (0x01C5) = 0x01C"
OpenBSD 7.8 is even more strange and reports 0x01C5
hydra$
hydra$ ./test_wchar en_US.UTF-8
INFO : You suggest a locale of en_US.UTF-8
: accepted.
(U+01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON)
towupper (0x01C5) = 0x01C4
Should output "towupper (0x01C5) = 0x01C4".
Buggy stuff may output "towupper (0x01C5) = 0x01C"
OpenBSD 7.8 is even more strange and reports 0x01C5
hydra$
3) FreeBSD 16.0-CURRENT built from sources
callisto$ uname -a
FreeBSD callisto 16.0-CURRENT FreeBSD 16.0-CURRENT main-n285053-3e27114a7f96 GENERIC amd64
callisto$ ./test_wchar
INFO : locale is set to default "POSIX".
(U+01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON)
towupper (0x01C5) = 0x01C5
Should output "towupper (0x01C5) = 0x01C4".
Buggy stuff may output "towupper (0x01C5) = 0x01C"
OpenBSD 7.8 is even more strange and reports 0x01C5
callisto$
callisto$ ./test_wchar en_US.UTF-8
INFO : You suggest a locale of en_US.UTF-8
: accepted.
(U+01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON)
towupper (0x01C5) = 0x01C4
Should output "towupper (0x01C5) = 0x01C4".
Buggy stuff may output "towupper (0x01C5) = 0x01C"
OpenBSD 7.8 is even more strange and reports 0x01C5
callisto$
* * * T H I S   I S   I N T E R E S T I N G * * *
4) Solaris 11.4 latest edition on ORACLE SPARC S7-2 with the Oracle
Studio compiler tools works with locale en_US.UTF-8
neptune $ uname -a
SunOS neptune 5.11 11.4.90.212.0 sun4v sparc sun4v non-virtualized
neptune $ ./test_wchar
INFO : locale is set to default "POSIX".
(U+01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON)
towupper (0x01C5) = 0x01C5
Should output "towupper (0x01C5) = 0x01C4".
Buggy stuff may output "towupper (0x01C5) = 0x01C"
OpenBSD 7.8 is even more strange and reports 0x01C5
neptune $ ./test_wchar en_US.UTF-8
INFO : You suggest a locale of en_US.UTF-8
: accepted.
(U+01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON)
towupper (0x01C5) = 0x01C4
Should output "towupper (0x01C5) = 0x01C4".
Buggy stuff may output "towupper (0x01C5) = 0x01C"
OpenBSD 7.8 is even more strange and reports 0x01C5
neptune $
So that works on Solaris 11.4 with the locale en_US.UTF-8 and also :
neptune $
neptune $ ./test_wchar de_DE.UTF-8
INFO : You suggest a locale of de_DE.UTF-8
: accepted.
(U+01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON)
towupper (0x01C5) = 0x01C4
Should output "towupper (0x01C5) = 0x01C4".
Buggy stuff may output "towupper (0x01C5) = 0x01C"
OpenBSD 7.8 is even more strange and reports 0x01C5
neptune $
5) Solaris 10 reasonably patched and with ORACLE Studio 12.6
$
$ uname -a
SunOS hubble 5.10 Generic_150400-67 sun4v sparc sun4v
$ $CC -V
cc: Studio 12.6 Sun C 5.15 SunOS_sparc 2017/05/30
$
$ ./test_wchar
INFO : locale is set to default "POSIX".
(U+01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON)
towupper (0x01C5) = 0x01C5
Should output "towupper (0x01C5) = 0x01C4".
Buggy stuff may output "towupper (0x01C5) = 0x01C"
OpenBSD 7.8 is even more strange and reports 0x01C5
$ ./test_wchar en_US.UTF-8
INFO : You suggest a locale of en_US.UTF-8
: accepted.
(U+01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON)
towupper (0x01C5) = 0x01C5
Should output "towupper (0x01C5) = 0x01C4".
Buggy stuff may output "towupper (0x01C5) = 0x01C"
OpenBSD 7.8 is even more strange and reports 0x01C5
$ ./test_wchar ja_JP.UTF-8
INFO : You suggest a locale of ja_JP.UTF-8
: accepted.
(U+01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON)
towupper (0x01C5) = 0x01C5
Should output "towupper (0x01C5) = 0x01C4".
Buggy stuff may output "towupper (0x01C5) = 0x01C"
OpenBSD 7.8 is even more strange and reports 0x01C5
$
6) Red Hat Enterprise Linux 10 on SiFive RISC-V P550
rhel10_rv5$ uname -a
Linux sedna.bw.genunix.com 6.12.0-89.rv.0.el10.riscv64 #1 SMP PREEMPT_DYNAMIC Wed May 28 20:21:05 UTC 2025 riscv64 GNU/Linux
rhel10_rv5$
rhel10_rv5$ cat /etc/redhat-release
Red Hat Enterprise Linux release 10.0 (Coughlan)
rhel10_rv5$
rhel10_rv5$ $CC --version
gcc (GCC) 14.2.1 20250110 (Red Hat 14.2.1-7)
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
rhel10_rv5$ echo $CFLAGS
-std=iso9899:1990 -pedantic -pedantic-errors -g -O0 -fno-builtin -fno-unsafe-math-optimizations -march=rv64imafdc -mabi=lp64d
rhel10_rv5$ echo $CPPFLAGS
-D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_XOPEN_SOURCE=500
rhel10_rv5$
rhel10_rv5$ $CC $CFLAGS $CPPFLAGS -o test_wchar test_wchar.c
rhel10_rv5$
rhel10_rv5$ ./test_wchar
INFO : locale is set to default "POSIX".
(U+01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON)
towupper (0x01C5) = 0x01C5
Should output "towupper (0x01C5) = 0x01C4".
Buggy stuff may output "towupper (0x01C5) = 0x01C"
OpenBSD 7.8 is even more strange and reports 0x01C5
rhel10_rv5$
rhel10_rv5$ ./test_wchar en_US.UTF-8
INFO : You suggest a locale of en_US.UTF-8
: accepted.
(U+01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON)
towupper (0x01C5) = 0x01C4
Should output "towupper (0x01C5) = 0x01C4".
Buggy stuff may output "towupper (0x01C5) = 0x01C"
OpenBSD 7.8 is even more strange and reports 0x01C5
rhel10_rv5$
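One thing that might help narrow down the runs above is to confirm which codeset the C library thinks is active after setlocale succeeds, since a locale name can be accepted without the expected character tables behind it. A minimal sketch using the POSIX nl_langinfo call; the helper names current_codeset and locale_is_utf8 are made up here, and the assumption that the codeset is spelled exactly "UTF-8" may not hold on every system:

```c
#include <langinfo.h>
#include <string.h>

/* Return the codeset name of the current LC_CTYPE locale. */
const char *
current_codeset (void)
{
  return nl_langinfo (CODESET);
}

/*
 * Hypothetical check: 1 if the current locale reports a UTF-8 codeset.
 * ASSUMPTION: the platform spells it "UTF-8"; other spellings would
 * need to be added if a system reports something different.
 */
int
locale_is_utf8 (void)
{
  return strcmp (current_codeset (), "UTF-8") == 0;
}
```

Printing current_codeset () right after the " : accepted." line in the test program would show whether a given run was really using UTF-8 tables. In the default "C" or "POSIX" locale the codeset is not expected to be UTF-8, and towupper leaving U+01C5 alone there is not surprising.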
So I am just looking at various systems with their native libc or GNU libc,
but not musl yet. I may ponder that. I am getting results all over the place.
Not sure if any of this helps.
--
Dennis Clarke
RISC-V/SPARC/PPC/ARM/CISC
UNIX and Linux spoken