Remember the discussion we had in July 2023 about error handling while
parsing/scanning multibyte strings? Paul coined the terms "MEE"
(multi-byte-per-encoding-error) and "SEE" (single-byte-per-encoding-error).
<https://lists.gnu.org/archive/html/bug-gnulib/2023-07/msg00145.html>
Now I'm interested in
- whether the mb*iter* modules actually implement MEE,
- what the behavioural difference between MEE and SEE is, function by
  function.
As a first step towards understanding this, I'm enhancing the unit tests
to cover incomplete characters, both at the end of a string and
inside a string.
2026-05-25 Bruno Haible <[email protected]>
trim tests: Enhance tests.
* tests/test-trim.c (main): Add test cases with incomplete characters.
2026-05-25 Bruno Haible <[email protected]>
mbmemcasecmp tests: Enhance tests.
* tests/test-mbmemcasecmp.h (test_utf_8): Add test cases with incomplete
characters.
2026-05-25 Bruno Haible <[email protected]>
mbspcasecmp tests: Enhance tests.
* tests/test-mbspcasecmp.c (test_ascii): New function, extracted from
main.
(test_utf_8): Likewise. Add test cases with incomplete characters.
(main): Invoke them. Accept a numeric argument.
* tests/test-mbspcasecmp-4.sh: Renamed from tests/test-mbspcasecmp.sh.
* tests/test-mbspcasecmp-3.sh: New file, based on
tests/test-mbmemcasecmp-3.sh.
* modules/mbspcasecmp-tests (Files): Update after rename. Add
locale-en.m4, locale-fr.m4.
(configure.ac): Invoke gt_LOCALE_EN_UTF8, gt_LOCALE_FR_UTF8.
(Makefile.am): Arrange to run test-mbspcasecmp-3.sh,
test-mbspcasecmp-4.sh, instead of test-mbspcasecmp.sh.
2026-05-25 Bruno Haible <[email protected]>
mbsncasecmp tests: Enhance tests.
* tests/test-mbsncasecmp.c (test_ascii): New function, extracted from
main.
(test_utf_8): Likewise. Add test cases with incomplete characters.
(main): Invoke them. Accept a numeric argument.
* tests/test-mbsncasecmp-4.sh: Renamed from tests/test-mbsncasecmp.sh.
* tests/test-mbsncasecmp-3.sh: New file, based on
tests/test-mbmemcasecmp-3.sh.
* modules/mbsncasecmp-tests (Files): Update after rename. Add
locale-en.m4, locale-fr.m4.
(configure.ac): Invoke gt_LOCALE_EN_UTF8, gt_LOCALE_FR_UTF8.
(Makefile.am): Arrange to run test-mbsncasecmp-3.sh,
test-mbsncasecmp-4.sh, instead of test-mbsncasecmp.sh.
2026-05-25 Bruno Haible <[email protected]>
mbscasecmp tests: Enhance tests.
* tests/test-mbscasecmp.c (test_ascii): New function, extracted from
main.
(test_utf_8): Likewise. Add test cases with incomplete characters.
(main): Invoke them. Accept a numeric argument.
* tests/test-mbscasecmp-4.sh: Renamed from tests/test-mbscasecmp.sh.
* tests/test-mbscasecmp-3.sh: New file, based on
tests/test-mbmemcasecmp-3.sh.
* modules/mbscasecmp-tests (Files): Update after rename. Add
locale-en.m4, locale-fr.m4.
(configure.ac): Invoke gt_LOCALE_EN_UTF8, gt_LOCALE_FR_UTF8.
(Makefile.am): Arrange to run test-mbscasecmp-3.sh,
test-mbscasecmp-4.sh, instead of test-mbscasecmp.sh.
2026-05-25 Bruno Haible <[email protected]>
mbs_endswith tests: Enhance tests.
* tests/test-mbs_endswith2.c (main): Add more test cases. Add more
comments.
* tests/test-mbs_endswith1.c: Update comments.
* tests/test-mbs_endswith3.c: Likewise.
2026-05-25 Bruno Haible <[email protected]>
mbs_startswith tests: Enhance tests.
* tests/test-mbs_startswith2.c (OR): New macro, copied from
tests/test-mbsnlen.c.
(main): Add more test cases. Add more comments.
* tests/test-mbs_startswith1.c: Update comments.
* tests/test-mbs_startswith3.c: Likewise.
From 900e90c433d17a467a26522b051e3f527102b289 Mon Sep 17 00:00:00 2001
From: Bruno Haible <[email protected]>
Date: Mon, 25 May 2026 17:55:56 +0200
Subject: [PATCH 1/7] mbs_startswith tests: Enhance tests.
* tests/test-mbs_startswith2.c (OR): New macro, copied from
tests/test-mbsnlen.c.
(main): Add more test cases. Add more comments.
* tests/test-mbs_startswith1.c: Update comments.
* tests/test-mbs_startswith3.c: Likewise.
---
ChangeLog | 9 ++++
tests/test-mbs_startswith1.c | 4 +-
tests/test-mbs_startswith2.c | 79 +++++++++++++++++++++++++++++++++++-
tests/test-mbs_startswith3.c | 2 +-
4 files changed, 90 insertions(+), 4 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 556c9bf5b5..3c72ed6dc8 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2026-05-25 Bruno Haible <[email protected]>
+
+ mbs_startswith tests: Enhance tests.
+ * tests/test-mbs_startswith2.c (OR): New macro, copied from
+ tests/test-mbsnlen.c.
+ (main): Add more test cases. Add more comments.
+ * tests/test-mbs_startswith1.c: Update comments.
+ * tests/test-mbs_startswith3.c: Likewise.
+
2026-05-24 Paul Eggert <[email protected]>
regex: pacify 16.1.1 -Wanalyzer-out-of-bounds
diff --git a/tests/test-mbs_startswith1.c b/tests/test-mbs_startswith1.c
index a1b89fa4a5..4c2e8b409d 100644
--- a/tests/test-mbs_startswith1.c
+++ b/tests/test-mbs_startswith1.c
@@ -1,4 +1,4 @@
-/* Test of mbs_startswith() function.
+/* Test of mbs_startswith() function in the "C" locale.
Copyright (C) 2025-2026 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
@@ -27,7 +27,7 @@
int
main ()
{
- /* This test is executed in the C locale. */
+ /* This test is executed in the "C" locale. */
ASSERT (mbs_startswith ("", ""));
ASSERT (mbs_startswith ("abc", ""));
diff --git a/tests/test-mbs_startswith2.c b/tests/test-mbs_startswith2.c
index 38c53dd12b..0ab6a9eeaf 100644
--- a/tests/test-mbs_startswith2.c
+++ b/tests/test-mbs_startswith2.c
@@ -1,4 +1,4 @@
-/* Test of mbs_startswith() function.
+/* Test of mbs_startswith() function in a UTF-8 locale.
Copyright (C) 2025-2026 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
@@ -25,6 +25,20 @@
#include "macros.h"
+/* The mcel-based implementation of mbsnlen behaves differently than the
+ original one. Namely, for invalid/incomplete byte sequences:
+ Where we ideally should have multi-byte-per-encoding-error (MEE) behaviour
+ everywhere, mcel implements single-byte-per-encoding-error (SEE) behaviour.
+ See <https://lists.gnu.org/archive/html/bug-gnulib/2023-07/msg00131.html>,
+ <https://lists.gnu.org/archive/html/bug-gnulib/2023-07/msg00145.html>.
+ Therefore, here we have different expected results, depending on the
+ implementation. */
+#if GNULIB_MCEL_PREFER
+# define OR(a,b) b
+#else
+# define OR(a,b) a
+#endif
+
int
main ()
{
@@ -70,26 +84,89 @@ main ()
/* Test cases with invalid or incomplete characters. */
/* A valid character should not match an invalid character. */
+ /* "\301\247" = 0xC1 0xA7 is invalid.
+ In fact, "\301" = 0xC1 is already invalid, see
+ https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf page 125 table 3-7.
+ */
ASSERT (!mbs_startswith ("\303\247", "\301\247"));
ASSERT (!mbs_startswith ("\301\247", "\303\247"));
/* A valid character should not match an incomplete character. */
+ /* "\343\247" = 0xE3 0xA7 is incomplete, "\343\247\214" = U+39CC is valid. */
ASSERT (!mbs_startswith ("\303\247", "\343\247"));
ASSERT (!mbs_startswith ("\343\247", "\303\247"));
+ ASSERT (!mbs_startswith ("\343\247\214", "\343\247"));
+ ASSERT (!mbs_startswith ("\343\247\214", "\343"));
/* An invalid character should not match an incomplete character. */
+ /* "\301\247" = 0xC1 0xA7 is invalid.
+ In fact, "\301" = 0xC1 is already invalid, see
+ https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf page 125 table 3-7.
+ */
+ /* "\343\247" = 0xE3 0xA7 is incomplete, "\343\247\214" = U+39CC is valid. */
ASSERT (!mbs_startswith ("\301\247", "\343\247"));
ASSERT (!mbs_startswith ("\343\247", "\301\247"));
+ /* Incomplete characters. See
+ https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf
+ page 128 table 3-11. */
+ /* "\341\200\240" = 0xE1 0x80 0xA0 = U+1020. */
+ ASSERT (!mbs_startswith ("\341\200\240", "\341\200"));
+ ASSERT (!mbs_startswith ("\341\200\240", "\341"));
+ ASSERT (mbs_startswith ("\341\200", "\341") == OR(false,true));
+ /* "\360\221\222\240" = 0xF0 0x91 0x92 0xA0 = U+114A0. */
+ ASSERT (!mbs_startswith ("\360\221\222\240", "\360\221\222"));
+ ASSERT (!mbs_startswith ("\360\221\222\240", "\360\221"));
+ ASSERT (!mbs_startswith ("\360\221\222\240", "\360"));
+ ASSERT (mbs_startswith ("\360\221\222", "\360\221") == OR(false,true));
+ ASSERT (mbs_startswith ("\360\221\222", "\360") == OR(false,true));
+ ASSERT (mbs_startswith ("\360\221", "\360") == OR(false,true));
+
+ /* "\355\240\200" = 0xED 0xA0 0x80 = U+D800 is invalid.
+ In fact, "\355\240" = 0xED 0xA0 is already invalid, see
+ https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf page 125 table 3-7
+ and page 128 table 3-9. */
+#if 0
+ /* mbs_startswith ("\355\240\200", "\355\240") returns
+ - true on musl libc, macOS, Solaris 11.4, Cygwin, mingw, MSVC
+ and with GNULIB_MCEL_PREFER on newer glibc, FreeBSD, NetBSD, OpenBSD,
+ - false on older glibc (CentOS 5), Solaris 11 OpenIndiana/OmniOS,
+ and with !GNULIB_MCEL_PREFER on newer glibc, FreeBSD, NetBSD, OpenBSD. */
+ ASSERT (!mbs_startswith ("\355\240\200", "\355\240"));
+#endif
+#if 0
+ /* mbs_startswith ("\355\240\200", "\355") returns
+ - true on newer glibc, musl libc, macOS, FreeBSD, NetBSD, OpenBSD,
+ Solaris 11.4, Cygwin, mingw, MSVC,
+ - false on older glibc (CentOS 5), Solaris 11 OpenIndiana/OmniOS. */
+ ASSERT (!mbs_startswith ("\355\240\200", "\355"));
+#endif
+#if GNULIB_MCEL_PREFER
+ /* Single-byte encoding error (SEE) */
+ ASSERT (mbs_startswith ("\355\240", "\355"));
+#elif 0
+ /* Multi-byte encoding error (MEE) */
+ /* mbs_startswith ("\355\240", "\355") returns
+ - true on musl libc, macOS, Solaris 11.4, Cygwin, mingw, MSVC,
+ - false on glibc, FreeBSD, NetBSD, OpenBSD, Solaris 11 OpenIndiana/OmniOS.
+ */
+ ASSERT (!mbs_startswith ("\355\240", "\355"));
+#endif
+
/* Two invalid characters should match only if they are identical. */
+ /* "\301\246" = 0xC1 0xA6 is invalid. */
+ /* "\301\247" = 0xC1 0xA7 is invalid. */
ASSERT (!mbs_startswith ("\301\246", "\301\247"));
ASSERT (!mbs_startswith ("\301\247", "\301\246"));
ASSERT (mbs_startswith ("\301\247", "\301\247"));
/* Two incomplete characters should match only if they are identical. */
+ /* "\343\246" = 0xE3 0xA6 is incomplete, "\343\246\214" = U+398C is valid. */
+ /* "\343\247" = 0xE3 0xA7 is incomplete, "\343\247\214" = U+39CC is valid. */
ASSERT (!mbs_startswith ("\343\246", "\343\247"));
ASSERT (!mbs_startswith ("\343\247", "\343\246"));
ASSERT (mbs_startswith ("\343\247", "\343\247"));
+ ASSERT (mbs_startswith ("\343\247", "\343") == OR(false,true));
return test_exit_status;
}
diff --git a/tests/test-mbs_startswith3.c b/tests/test-mbs_startswith3.c
index 1965070401..11d87562c7 100644
--- a/tests/test-mbs_startswith3.c
+++ b/tests/test-mbs_startswith3.c
@@ -1,4 +1,4 @@
-/* Test of mbs_startswith() function.
+/* Test of mbs_startswith() function in a GB18030 locale.
Copyright (C) 2025-2026 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
--
2.54.0
From 26621d07249663c0cfba331ee5295efd59bef0f7 Mon Sep 17 00:00:00 2001
From: Bruno Haible <[email protected]>
Date: Mon, 25 May 2026 17:56:44 +0200
Subject: [PATCH 2/7] mbs_endswith tests: Enhance tests.
* tests/test-mbs_endswith2.c (main): Add more test cases. Add more
comments.
* tests/test-mbs_endswith1.c: Update comments.
* tests/test-mbs_endswith3.c: Likewise.
---
ChangeLog | 8 ++++++++
tests/test-mbs_endswith1.c | 2 +-
tests/test-mbs_endswith2.c | 30 +++++++++++++++++++++++++++++-
tests/test-mbs_endswith3.c | 2 +-
4 files changed, 39 insertions(+), 3 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 3c72ed6dc8..02ad380e55 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+2026-05-25 Bruno Haible <[email protected]>
+
+ mbs_endswith tests: Enhance tests.
+ * tests/test-mbs_endswith2.c (main): Add more test cases. Add more
+ comments.
+ * tests/test-mbs_endswith1.c: Update comments.
+ * tests/test-mbs_endswith3.c: Likewise.
+
2026-05-25 Bruno Haible <[email protected]>
mbs_startswith tests: Enhance tests.
diff --git a/tests/test-mbs_endswith1.c b/tests/test-mbs_endswith1.c
index 63722b0137..7742efbc42 100644
--- a/tests/test-mbs_endswith1.c
+++ b/tests/test-mbs_endswith1.c
@@ -1,4 +1,4 @@
-/* Test of mbs_endswith() function.
+/* Test of mbs_endswith() function in the "C" locale.
Copyright (C) 2025-2026 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
diff --git a/tests/test-mbs_endswith2.c b/tests/test-mbs_endswith2.c
index 01c12f47ab..17ccc1f6e0 100644
--- a/tests/test-mbs_endswith2.c
+++ b/tests/test-mbs_endswith2.c
@@ -1,4 +1,4 @@
-/* Test of mbs_endswith() function.
+/* Test of mbs_endswith() function in a UTF-8 locale.
Copyright (C) 2025-2026 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
@@ -65,23 +65,51 @@ main ()
/* Test cases with invalid or incomplete characters. */
/* A valid character should not match an invalid character. */
+ /* "\301\247" = 0xC1 0xA7 is invalid.
+ In fact, "\301" = 0xC1 is already invalid, see
+ https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf page 125 table 3-7.
+ */
ASSERT (!mbs_endswith ("\303\247", "\301\247"));
ASSERT (!mbs_endswith ("\301\247", "\303\247"));
/* A valid character should not match an incomplete character. */
+ /* "\343\247" = 0xE3 0xA7 is incomplete, "\343\247\214" = U+39CC is valid. */
ASSERT (!mbs_endswith ("\303\247", "\343\247"));
ASSERT (!mbs_endswith ("\343\247", "\303\247"));
/* An invalid character should not match an incomplete character. */
+ /* "\301\247" = 0xC1 0xA7 is invalid.
+ In fact, "\301" = 0xC1 is already invalid, see
+ https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf page 125 table 3-7.
+ */
+ /* "\343\247" = 0xE3 0xA7 is incomplete, "\343\247\214" = U+39CC is valid. */
ASSERT (!mbs_endswith ("\301\247", "\343\247"));
ASSERT (!mbs_endswith ("\343\247", "\301\247"));
+ /* Incomplete characters. See
+ https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf
+ page 128 table 3-11. */
+ /* "\341\200\240" = 0xE1 0x80 0xA0 = U+1020. */
+ ASSERT (!mbs_endswith ("\341\200\240", "\200\240"));
+ ASSERT (!mbs_endswith ("\341\200\240", "\240"));
+ /* "\360\221\222\240" = 0xF0 0x91 0x92 0xA0 = U+114A0. */
+ ASSERT (!mbs_endswith ("\360\221\222\240", "\221\222\240"));
+ ASSERT (!mbs_endswith ("\360\221\222\240", "\222\240"));
+ ASSERT (!mbs_endswith ("\360\221\222\240", "\240"));
+
/* Two invalid characters should match only if they are identical. */
+ /* "\301\246" = 0xC1 0xA6 is invalid.
+ "\301\247" = 0xC1 0xA7 is invalid.
+ In fact, "\301" = 0xC1 is already invalid, see
+ https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf page 125 table 3-7.
+ */
ASSERT (!mbs_endswith ("\301\246", "\301\247"));
ASSERT (!mbs_endswith ("\301\247", "\301\246"));
ASSERT (mbs_endswith ("\301\247", "\301\247"));
/* Two incomplete characters should match only if they are identical. */
+ /* "\343\246" = 0xE3 0xA6 is incomplete, "\343\246\214" = U+398C is valid. */
+ /* "\343\247" = 0xE3 0xA7 is incomplete, "\343\247\214" = U+39CC is valid. */
ASSERT (!mbs_endswith ("\343\246", "\343\247"));
ASSERT (!mbs_endswith ("\343\247", "\343\246"));
ASSERT (mbs_endswith ("\343\247", "\343\247"));
diff --git a/tests/test-mbs_endswith3.c b/tests/test-mbs_endswith3.c
index ad1e24f5e8..e1abd1195e 100644
--- a/tests/test-mbs_endswith3.c
+++ b/tests/test-mbs_endswith3.c
@@ -1,4 +1,4 @@
-/* Test of mbs_endswith() function.
+/* Test of mbs_endswith() function in a GB18030 locale.
Copyright (C) 2025-2026 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
--
2.54.0
From 2e73a29a97c7fc0e7b3d5737cd84172cb82b4069 Mon Sep 17 00:00:00 2001
From: Bruno Haible <[email protected]>
Date: Mon, 25 May 2026 18:32:18 +0200
Subject: [PATCH 5/7] mbspcasecmp tests: Enhance tests.
* tests/test-mbspcasecmp.c (test_ascii): New function, extracted from
main.
(test_utf_8): Likewise. Add test cases with incomplete characters.
(main): Invoke them. Accept a numeric argument.
* tests/test-mbspcasecmp-4.sh: Renamed from tests/test-mbspcasecmp.sh.
* tests/test-mbspcasecmp-3.sh: New file, based on
tests/test-mbmemcasecmp-3.sh.
* modules/mbspcasecmp-tests (Files): Update after rename. Add
locale-en.m4, locale-fr.m4.
(configure.ac): Invoke gt_LOCALE_EN_UTF8, gt_LOCALE_FR_UTF8.
(Makefile.am): Arrange to run test-mbspcasecmp-3.sh,
test-mbspcasecmp-4.sh, instead of test-mbspcasecmp.sh.
---
ChangeLog | 16 +++
modules/mbspcasecmp-tests | 14 ++-
tests/test-mbspcasecmp-3.sh | 23 ++++
...t-mbspcasecmp.sh => test-mbspcasecmp-4.sh} | 2 +-
tests/test-mbspcasecmp.c | 114 +++++++++++++++---
5 files changed, 145 insertions(+), 24 deletions(-)
create mode 100755 tests/test-mbspcasecmp-3.sh
rename tests/{test-mbspcasecmp.sh => test-mbspcasecmp-4.sh} (89%)
diff --git a/ChangeLog b/ChangeLog
index 14ce0d68fd..c5b5e39291 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,19 @@
+2026-05-25 Bruno Haible <[email protected]>
+
+ mbspcasecmp tests: Enhance tests.
+ * tests/test-mbspcasecmp.c (test_ascii): New function, extracted from
+ main.
+ (test_utf_8): Likewise. Add test cases with incomplete characters.
+ (main): Invoke them. Accept a numeric argument.
+ * tests/test-mbspcasecmp-4.sh: Renamed from tests/test-mbspcasecmp.sh.
+ * tests/test-mbspcasecmp-3.sh: New file, based on
+ tests/test-mbmemcasecmp-3.sh.
+ * modules/mbspcasecmp-tests (Files): Update after rename. Add
+ locale-en.m4, locale-fr.m4.
+ (configure.ac): Invoke gt_LOCALE_EN_UTF8, gt_LOCALE_FR_UTF8.
+ (Makefile.am): Arrange to run test-mbspcasecmp-3.sh,
+ test-mbspcasecmp-4.sh, instead of test-mbspcasecmp.sh.
+
2026-05-25 Bruno Haible <[email protected]>
mbsncasecmp tests: Enhance tests.
diff --git a/modules/mbspcasecmp-tests b/modules/mbspcasecmp-tests
index e82a37eb23..4ca8c4d95e 100644
--- a/modules/mbspcasecmp-tests
+++ b/modules/mbspcasecmp-tests
@@ -1,7 +1,10 @@
Files:
-tests/test-mbspcasecmp.sh
+tests/test-mbspcasecmp-3.sh
+tests/test-mbspcasecmp-4.sh
tests/test-mbspcasecmp.c
tests/macros.h
+m4/locale-en.m4
+m4/locale-fr.m4
m4/locale-tr.m4
m4/codeset.m4
@@ -9,10 +12,15 @@ Depends-on:
setlocale
configure.ac:
+gt_LOCALE_EN_UTF8
+gt_LOCALE_FR_UTF8
gt_LOCALE_TR_UTF8
Makefile.am:
-TESTS += test-mbspcasecmp.sh
-TESTS_ENVIRONMENT += LOCALE_TR_UTF8='@LOCALE_TR_UTF8@'
+TESTS += test-mbspcasecmp-3.sh test-mbspcasecmp-4.sh
+TESTS_ENVIRONMENT += \
+ LOCALE_EN_UTF8='@LOCALE_EN_UTF8@' \
+ LOCALE_FR_UTF8='@LOCALE_FR_UTF8@' \
+ LOCALE_TR_UTF8='@LOCALE_TR_UTF8@'
check_PROGRAMS += test-mbspcasecmp
test_mbspcasecmp_LDADD = $(LDADD) $(LIBUNISTRING) $(SETLOCALE_LIB) $(MBRTOWC_LIB) $(LIBC32CONV)
diff --git a/tests/test-mbspcasecmp-3.sh b/tests/test-mbspcasecmp-3.sh
new file mode 100755
index 0000000000..dc4619a0c3
--- /dev/null
+++ b/tests/test-mbspcasecmp-3.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+# Test whether a specific UTF-8 locale is installed.
+: "${LOCALE_EN_UTF8=en_US.UTF-8}"
+: "${LOCALE_FR_UTF8=fr_FR.UTF-8}"
+if test "$LOCALE_EN_UTF8" = none && test $LOCALE_FR_UTF8 = none; then
+ if test -f /usr/bin/localedef; then
+ echo "Skipping test: no english or french Unicode locale is installed"
+ else
+ echo "Skipping test: no english or french Unicode locale is supported"
+ fi
+ exit 77
+fi
+
+# It's sufficient to test in one of the two locales.
+if test $LOCALE_FR_UTF8 != none; then
+ testlocale=$LOCALE_FR_UTF8
+else
+ testlocale="$LOCALE_EN_UTF8"
+fi
+
+LC_ALL="$testlocale" \
+${CHECKER} ./test-mbspcasecmp${EXEEXT} 3
diff --git a/tests/test-mbspcasecmp.sh b/tests/test-mbspcasecmp-4.sh
similarity index 89%
rename from tests/test-mbspcasecmp.sh
rename to tests/test-mbspcasecmp-4.sh
index 1e390755f1..daef45b62c 100755
--- a/tests/test-mbspcasecmp.sh
+++ b/tests/test-mbspcasecmp-4.sh
@@ -12,4 +12,4 @@ if test $LOCALE_TR_UTF8 = none; then
fi
LC_ALL=$LOCALE_TR_UTF8 \
-${CHECKER} ./test-mbspcasecmp${EXEEXT}
+${CHECKER} ./test-mbspcasecmp${EXEEXT} 4
diff --git a/tests/test-mbspcasecmp.c b/tests/test-mbspcasecmp.c
index 8839a7444d..a1407164cd 100644
--- a/tests/test-mbspcasecmp.c
+++ b/tests/test-mbspcasecmp.c
@@ -24,13 +24,9 @@
#include "macros.h"
-int
-main ()
+static void
+test_ascii (void)
{
- /* configure should already have checked that the locale is supported. */
- if (setlocale (LC_ALL, "") == NULL)
- return 1;
-
{
const char string[] = "paragraph";
ASSERT (mbspcasecmp (string, "Paragraph") == string + 9);
@@ -60,31 +56,109 @@ main ()
const char string[] = "paragraph";
ASSERT (mbspcasecmp (string, "para") == string + 4);
}
+}
+static void
+test_utf_8 (bool turkish)
+{
/* The following tests shows how mbspcasecmp() is different from
strncasecmp(). */
+ if (turkish)
+ {
+ {
+ const char string[] = "\303\266zg\303\274rt\303\274k"; /* özgürtük */
+ ASSERT (mbspcasecmp (string, "\303\226ZG\303\234R") == string + 7); /* özgür */
+ }
+
+ {
+ const char string[] = "\303\226ZG\303\234Rt\303\274k"; /* özgürtük */
+ ASSERT (mbspcasecmp (string, "\303\266zg\303\274r") == string + 7); /* özgür */
+ }
+
+ /* This test shows how strings of different size can compare equal. */
+
+ {
+ const char string[] = "turkishtime";
+ ASSERT (mbspcasecmp (string, "TURK\304\260SH") == string + 7);
+ }
+
+ {
+ const char string[] = "TURK\304\260SHK\303\234LT\303\234R";
+ ASSERT (mbspcasecmp (string, "turkish") == string + 8);
+ }
+ }
+
+ /* Incomplete characters. See
+ https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf
+ page 128 table 3-11. */
+
+ /* "\341\200\240" = 0xE1 0x80 0xA0 = U+1020. */
{
- const char string[] = "\303\266zg\303\274rt\303\274k"; /* özgürtük */
- ASSERT (mbspcasecmp (string, "\303\226ZG\303\234R") == string + 7); /* özgür */
+ const char string[] = "\341\200";
+ ASSERT (mbspcasecmp (string, "\341\200") == string + 2);
}
-
{
- const char string[] = "\303\226ZG\303\234Rt\303\274k"; /* özgürtük */
- ASSERT (mbspcasecmp (string, "\303\266zg\303\274r") == string + 7); /* özgür */
+ const char string[] = "\341\200X";
+ ASSERT (mbspcasecmp (string, "\341\200x") == string + 3);
}
-
- /* This test shows how strings of different size can compare equal. */
-
{
- const char string[] = "turkishtime";
- ASSERT (mbspcasecmp (string, "TURK\304\260SH") == string + 7);
+ const char string[] = "\341";
+ ASSERT (mbspcasecmp (string, "\341") == string + 1);
}
-
{
- const char string[] = "TURK\304\260SHK\303\234LT\303\234R";
- ASSERT (mbspcasecmp (string, "turkish") == string + 8);
+ const char string[] = "\341X";
+ ASSERT (mbspcasecmp (string, "\341x") == string + 2);
}
+ /* "\360\221\222\240" = 0xF0 0x91 0x92 0xA0 = U+114A0. */
+ {
+ const char string[] = "\360\221\222";
+ ASSERT (mbspcasecmp (string, "\360\221\222") == string + 3);
+ }
+ {
+ const char string[] = "\360\221\222X";
+ ASSERT (mbspcasecmp (string, "\360\221\222x") == string + 4);
+ }
+ {
+ const char string[] = "\360\221";
+ ASSERT (mbspcasecmp (string, "\360\221") == string + 2);
+ }
+ {
+ const char string[] = "\360\221X";
+ ASSERT (mbspcasecmp (string, "\360\221x") == string + 3);
+ }
+ {
+ const char string[] = "\360";
+ ASSERT (mbspcasecmp (string, "\360") == string + 1);
+ }
+ {
+ const char string[] = "\360X";
+ ASSERT (mbspcasecmp (string, "\360x") == string + 2);
+ }
+}
+
+int
+main (int argc, char *argv[])
+{
+ /* configure should already have checked that the locale is supported. */
+ if (setlocale (LC_ALL, "") == NULL)
+ return 1;
+
+ test_ascii ();
+
+ if (argc > 1)
+ switch (argv[1][0])
+ {
+ case '3':
+ /* Locale encoding is UTF-8, locale is not Turkish. */
+ test_utf_8 (false);
+ return test_exit_status;
+
+ case '4':
+ /* Locale encoding is UTF-8, locale is Turkish. */
+ test_utf_8 (true);
+ return test_exit_status;
+ }
- return test_exit_status;
+ return 1;
}
--
2.54.0
From 1d411d24777b1defd3a065300da16b31586ef85f Mon Sep 17 00:00:00 2001
From: Bruno Haible <[email protected]>
Date: Mon, 25 May 2026 18:27:18 +0200
Subject: [PATCH 4/7] mbsncasecmp tests: Enhance tests.
* tests/test-mbsncasecmp.c (test_ascii): New function, extracted from
main.
(test_utf_8): Likewise. Add test cases with incomplete characters.
(main): Invoke them. Accept a numeric argument.
* tests/test-mbsncasecmp-4.sh: Renamed from tests/test-mbsncasecmp.sh.
* tests/test-mbsncasecmp-3.sh: New file, based on
tests/test-mbmemcasecmp-3.sh.
* modules/mbsncasecmp-tests (Files): Update after rename. Add
locale-en.m4, locale-fr.m4.
(configure.ac): Invoke gt_LOCALE_EN_UTF8, gt_LOCALE_FR_UTF8.
(Makefile.am): Arrange to run test-mbsncasecmp-3.sh,
test-mbsncasecmp-4.sh, instead of test-mbsncasecmp.sh.
---
ChangeLog | 16 +++++
modules/mbsncasecmp-tests | 14 +++-
tests/test-mbsncasecmp-3.sh | 23 +++++++
...t-mbsncasecmp.sh => test-mbsncasecmp-4.sh} | 2 +-
tests/test-mbsncasecmp.c | 68 +++++++++++++++----
5 files changed, 107 insertions(+), 16 deletions(-)
create mode 100755 tests/test-mbsncasecmp-3.sh
rename tests/{test-mbsncasecmp.sh => test-mbsncasecmp-4.sh} (89%)
diff --git a/ChangeLog b/ChangeLog
index 5940e0d95f..14ce0d68fd 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,19 @@
+2026-05-25 Bruno Haible <[email protected]>
+
+ mbsncasecmp tests: Enhance tests.
+ * tests/test-mbsncasecmp.c (test_ascii): New function, extracted from
+ main.
+ (test_utf_8): Likewise. Add test cases with incomplete characters.
+ (main): Invoke them. Accept a numeric argument.
+ * tests/test-mbsncasecmp-4.sh: Renamed from tests/test-mbsncasecmp.sh.
+ * tests/test-mbsncasecmp-3.sh: New file, based on
+ tests/test-mbmemcasecmp-3.sh.
+ * modules/mbsncasecmp-tests (Files): Update after rename. Add
+ locale-en.m4, locale-fr.m4.
+ (configure.ac): Invoke gt_LOCALE_EN_UTF8, gt_LOCALE_FR_UTF8.
+ (Makefile.am): Arrange to run test-mbsncasecmp-3.sh,
+ test-mbsncasecmp-4.sh, instead of test-mbsncasecmp.sh.
+
2026-05-25 Bruno Haible <[email protected]>
mbscasecmp tests: Enhance tests.
diff --git a/modules/mbsncasecmp-tests b/modules/mbsncasecmp-tests
index 5ed84188ea..c41804a2d5 100644
--- a/modules/mbsncasecmp-tests
+++ b/modules/mbsncasecmp-tests
@@ -1,7 +1,10 @@
Files:
-tests/test-mbsncasecmp.sh
+tests/test-mbsncasecmp-3.sh
+tests/test-mbsncasecmp-4.sh
tests/test-mbsncasecmp.c
tests/macros.h
+m4/locale-en.m4
+m4/locale-fr.m4
m4/locale-tr.m4
m4/codeset.m4
@@ -9,10 +12,15 @@ Depends-on:
setlocale
configure.ac:
+gt_LOCALE_EN_UTF8
+gt_LOCALE_FR_UTF8
gt_LOCALE_TR_UTF8
Makefile.am:
-TESTS += test-mbsncasecmp.sh
-TESTS_ENVIRONMENT += LOCALE_TR_UTF8='@LOCALE_TR_UTF8@'
+TESTS += test-mbsncasecmp-3.sh test-mbsncasecmp-4.sh
+TESTS_ENVIRONMENT += \
+ LOCALE_EN_UTF8='@LOCALE_EN_UTF8@' \
+ LOCALE_FR_UTF8='@LOCALE_FR_UTF8@' \
+ LOCALE_TR_UTF8='@LOCALE_TR_UTF8@'
check_PROGRAMS += test-mbsncasecmp
test_mbsncasecmp_LDADD = $(LDADD) $(LIBUNISTRING) $(SETLOCALE_LIB) $(MBRTOWC_LIB) $(LIBC32CONV)
diff --git a/tests/test-mbsncasecmp-3.sh b/tests/test-mbsncasecmp-3.sh
new file mode 100755
index 0000000000..f5bee7f298
--- /dev/null
+++ b/tests/test-mbsncasecmp-3.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+# Test whether a specific UTF-8 locale is installed.
+: "${LOCALE_EN_UTF8=en_US.UTF-8}"
+: "${LOCALE_FR_UTF8=fr_FR.UTF-8}"
+if test "$LOCALE_EN_UTF8" = none && test $LOCALE_FR_UTF8 = none; then
+ if test -f /usr/bin/localedef; then
+ echo "Skipping test: no english or french Unicode locale is installed"
+ else
+ echo "Skipping test: no english or french Unicode locale is supported"
+ fi
+ exit 77
+fi
+
+# It's sufficient to test in one of the two locales.
+if test $LOCALE_FR_UTF8 != none; then
+ testlocale=$LOCALE_FR_UTF8
+else
+ testlocale="$LOCALE_EN_UTF8"
+fi
+
+LC_ALL="$testlocale" \
+${CHECKER} ./test-mbsncasecmp${EXEEXT} 3
diff --git a/tests/test-mbsncasecmp.sh b/tests/test-mbsncasecmp-4.sh
similarity index 89%
rename from tests/test-mbsncasecmp.sh
rename to tests/test-mbsncasecmp-4.sh
index baf1e542bd..c7cf85c969 100755
--- a/tests/test-mbsncasecmp.sh
+++ b/tests/test-mbsncasecmp-4.sh
@@ -12,4 +12,4 @@ if test $LOCALE_TR_UTF8 = none; then
fi
LC_ALL=$LOCALE_TR_UTF8 \
-${CHECKER} ./test-mbsncasecmp${EXEEXT}
+${CHECKER} ./test-mbsncasecmp${EXEEXT} 4
diff --git a/tests/test-mbsncasecmp.c b/tests/test-mbsncasecmp.c
index 1858483f81..fb98f01354 100644
--- a/tests/test-mbsncasecmp.c
+++ b/tests/test-mbsncasecmp.c
@@ -24,13 +24,9 @@
#include "macros.h"
-int
-main ()
+static void
+test_ascii (void)
{
- /* configure should already have checked that the locale is supported. */
- if (setlocale (LC_ALL, "") == NULL)
- return 1;
-
ASSERT (mbsncasecmp ("paragraph", "Paragraph", 1000000) == 0);
ASSERT (mbsncasecmp ("paragraph", "Paragraph", 9) == 0);
@@ -54,16 +50,64 @@ main ()
ASSERT (mbsncasecmp ("paragraph", "para", 9) > 0);
ASSERT (mbsncasecmp ("paragraph", "para", 5) > 0);
ASSERT (mbsncasecmp ("paragraph", "para", 4) == 0);
+}
+static void
+test_utf_8 (bool turkish)
+{
/* The following tests shows how mbsncasecmp() is different from
strncasecmp(). */
- ASSERT (mbsncasecmp ("\303\266zg\303\274r", "\303\226ZG\303\234R", 99) == 0); /* özgür */
- ASSERT (mbsncasecmp ("\303\226ZG\303\234R", "\303\266zg\303\274r", 99) == 0); /* özgür */
+ if (turkish)
+ {
+ ASSERT (mbsncasecmp ("\303\266zg\303\274r", "\303\226ZG\303\234R", 99) == 0); /* özgür */
+ ASSERT (mbsncasecmp ("\303\226ZG\303\234R", "\303\266zg\303\274r", 99) == 0); /* özgür */
+
+ /* This test shows how strings of different size can compare equal. */
+ ASSERT (mbsncasecmp ("turkish", "TURK\304\260SH", 7) == 0);
+ ASSERT (mbsncasecmp ("TURK\304\260SH", "turkish", 7) == 0);
+ }
+
+ /* Incomplete characters. See
+ https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf
+ page 128 table 3-11. */
+
+ /* "\341\200\240" = 0xE1 0x80 0xA0 = U+1020. */
+ ASSERT (mbsncasecmp ("\341\200", "\341\200", 99) == 0);
+ ASSERT (mbsncasecmp ("\341\200X", "\341\200x", 99) == 0);
+ ASSERT (mbsncasecmp ("\341", "\341", 99) == 0);
+ ASSERT (mbsncasecmp ("\341X", "\341x", 99) == 0);
+ /* "\360\221\222\240" = 0xF0 0x91 0x92 0xA0 = U+114A0. */
+ ASSERT (mbsncasecmp ("\360\221\222", "\360\221\222", 99) == 0);
+ ASSERT (mbsncasecmp ("\360\221\222X", "\360\221\222x", 99) == 0);
+ ASSERT (mbsncasecmp ("\360\221", "\360\221", 99) == 0);
+ ASSERT (mbsncasecmp ("\360\221X", "\360\221x", 99) == 0);
+ ASSERT (mbsncasecmp ("\360", "\360", 99) == 0);
+ ASSERT (mbsncasecmp ("\360X", "\360x", 99) == 0);
+}
+
+int
+main (int argc, char *argv[])
+{
+ /* configure should already have checked that the locale is supported. */
+ if (setlocale (LC_ALL, "") == NULL)
+ return 1;
+
+ test_ascii ();
+
+ if (argc > 1)
+ switch (argv[1][0])
+ {
+ case '3':
+ /* Locale encoding is UTF-8, locale is not Turkish. */
+ test_utf_8 (false);
+ return test_exit_status;
- /* This test shows how strings of different size can compare equal. */
- ASSERT (mbsncasecmp ("turkish", "TURK\304\260SH", 7) == 0);
- ASSERT (mbsncasecmp ("TURK\304\260SH", "turkish", 7) == 0);
+ case '4':
+ /* Locale encoding is UTF-8, locale is Turkish. */
+ test_utf_8 (true);
+ return test_exit_status;
+ }
- return test_exit_status;
+ return 1;
}
--
2.54.0
From 25b66bfe9de7c305f641bd815f13be03159bfaec Mon Sep 17 00:00:00 2001
From: Bruno Haible <[email protected]>
Date: Mon, 25 May 2026 18:20:40 +0200
Subject: [PATCH 3/7] mbscasecmp tests: Enhance tests.
* tests/test-mbscasecmp.c (test_ascii): New function, extracted from
main.
(test_utf_8): Likewise. Add test cases with incomplete characters.
(main): Invoke them. Accept a numeric argument.
* tests/test-mbscasecmp-4.sh: Renamed from tests/test-mbscasecmp.sh.
* tests/test-mbscasecmp-3.sh: New file, based on
tests/test-mbmemcasecmp-3.sh.
* modules/mbscasecmp-tests (Files): Update after rename. Add
locale-en.m4, locale-fr.m4.
(configure.ac): Invoke gt_LOCALE_EN_UTF8, gt_LOCALE_FR_UTF8.
(Makefile.am): Arrange to run test-mbscasecmp-3.sh,
test-mbscasecmp-4.sh, instead of test-mbscasecmp.sh.
---
ChangeLog | 16 +++++
modules/mbscasecmp-tests | 14 +++-
tests/test-mbscasecmp-3.sh | 23 +++++++
...est-mbscasecmp.sh => test-mbscasecmp-4.sh} | 2 +-
tests/test-mbscasecmp.c | 68 +++++++++++++++----
5 files changed, 107 insertions(+), 16 deletions(-)
create mode 100755 tests/test-mbscasecmp-3.sh
rename tests/{test-mbscasecmp.sh => test-mbscasecmp-4.sh} (89%)
diff --git a/ChangeLog b/ChangeLog
index 02ad380e55..5940e0d95f 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,19 @@
+2026-05-25 Bruno Haible <[email protected]>
+
+ mbscasecmp tests: Enhance tests.
+ * tests/test-mbscasecmp.c (test_ascii): New function, extracted from
+ main.
+ (test_utf_8): Likewise. Add test cases with incomplete characters.
+ (main): Invoke them. Accept a numeric argument.
+ * tests/test-mbscasecmp-4.sh: Renamed from tests/test-mbscasecmp.sh.
+ * tests/test-mbscasecmp-3.sh: New file, based on
+ tests/test-mbmemcasecmp-3.sh.
+ * modules/mbscasecmp-tests (Files): Update after rename. Add
+ locale-en.m4, locale-fr.m4.
+ (configure.ac): Invoke gt_LOCALE_EN_UTF8, gt_LOCALE_FR_UTF8.
+ (Makefile.am): Arrange to run test-mbscasecmp-3.sh,
+ test-mbscasecmp-4.sh, instead of test-mbscasecmp.sh.
+
2026-05-25 Bruno Haible <[email protected]>
mbs_endswith tests: Enhance tests.
diff --git a/modules/mbscasecmp-tests b/modules/mbscasecmp-tests
index 61282af862..bdbb0cf17b 100644
--- a/modules/mbscasecmp-tests
+++ b/modules/mbscasecmp-tests
@@ -1,7 +1,10 @@
Files:
-tests/test-mbscasecmp.sh
+tests/test-mbscasecmp-3.sh
+tests/test-mbscasecmp-4.sh
tests/test-mbscasecmp.c
tests/macros.h
+m4/locale-en.m4
+m4/locale-fr.m4
m4/locale-tr.m4
m4/codeset.m4
@@ -9,10 +12,15 @@ Depends-on:
setlocale
configure.ac:
+gt_LOCALE_EN_UTF8
+gt_LOCALE_FR_UTF8
gt_LOCALE_TR_UTF8
Makefile.am:
-TESTS += test-mbscasecmp.sh
-TESTS_ENVIRONMENT += LOCALE_TR_UTF8='@LOCALE_TR_UTF8@'
+TESTS += test-mbscasecmp-3.sh test-mbscasecmp-4.sh
+TESTS_ENVIRONMENT += \
+ LOCALE_EN_UTF8='@LOCALE_EN_UTF8@' \
+ LOCALE_FR_UTF8='@LOCALE_FR_UTF8@' \
+ LOCALE_TR_UTF8='@LOCALE_TR_UTF8@'
check_PROGRAMS += test-mbscasecmp
test_mbscasecmp_LDADD = $(LDADD) $(LIBUNISTRING) $(SETLOCALE_LIB) $(MBRTOWC_LIB) $(LIBC32CONV)
diff --git a/tests/test-mbscasecmp-3.sh b/tests/test-mbscasecmp-3.sh
new file mode 100755
index 0000000000..72ee7d4738
--- /dev/null
+++ b/tests/test-mbscasecmp-3.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+# Test whether a specific UTF-8 locale is installed.
+: "${LOCALE_EN_UTF8=en_US.UTF-8}"
+: "${LOCALE_FR_UTF8=fr_FR.UTF-8}"
+if test "$LOCALE_EN_UTF8" = none && test "$LOCALE_FR_UTF8" = none; then
+ if test -f /usr/bin/localedef; then
+ echo "Skipping test: no english or french Unicode locale is installed"
+ else
+ echo "Skipping test: no english or french Unicode locale is supported"
+ fi
+ exit 77
+fi
+
+# It's sufficient to test in one of the two locales.
+if test "$LOCALE_FR_UTF8" != none; then
+ testlocale="$LOCALE_FR_UTF8"
+else
+ testlocale="$LOCALE_EN_UTF8"
+fi
+
+LC_ALL="$testlocale" \
+${CHECKER} ./test-mbscasecmp${EXEEXT} 3
diff --git a/tests/test-mbscasecmp.sh b/tests/test-mbscasecmp-4.sh
similarity index 89%
rename from tests/test-mbscasecmp.sh
rename to tests/test-mbscasecmp-4.sh
index 73e62b5f50..e5c5a90b17 100755
--- a/tests/test-mbscasecmp.sh
+++ b/tests/test-mbscasecmp-4.sh
@@ -12,4 +12,4 @@ if test $LOCALE_TR_UTF8 = none; then
fi
LC_ALL=$LOCALE_TR_UTF8 \
-${CHECKER} ./test-mbscasecmp${EXEEXT}
+${CHECKER} ./test-mbscasecmp${EXEEXT} 4
diff --git a/tests/test-mbscasecmp.c b/tests/test-mbscasecmp.c
index 1c12691dea..f309d3e517 100644
--- a/tests/test-mbscasecmp.c
+++ b/tests/test-mbscasecmp.c
@@ -24,13 +24,9 @@
#include "macros.h"
-int
-main ()
+static void
+test_ascii (void)
{
- /* configure should already have checked that the locale is supported. */
- if (setlocale (LC_ALL, "") == NULL)
- return 1;
-
ASSERT (mbscasecmp ("paragraph", "Paragraph") == 0);
ASSERT (mbscasecmp ("paragrapH", "parAgRaph") == 0);
@@ -40,16 +36,64 @@ main ()
ASSERT (mbscasecmp ("para", "paragraph") < 0);
ASSERT (mbscasecmp ("paragraph", "para") > 0);
+}
+static void
+test_utf_8 (bool turkish)
+{
/* The following tests shows how mbscasecmp() is different from
strcasecmp(). */
- ASSERT (mbscasecmp ("\303\266zg\303\274r", "\303\226ZG\303\234R") == 0); /* özgür */
- ASSERT (mbscasecmp ("\303\226ZG\303\234R", "\303\266zg\303\274r") == 0); /* özgür */
+ if (turkish)
+ {
+ ASSERT (mbscasecmp ("\303\266zg\303\274r", "\303\226ZG\303\234R") == 0); /* özgür */
+ ASSERT (mbscasecmp ("\303\226ZG\303\234R", "\303\266zg\303\274r") == 0); /* özgür */
+
+ /* This test shows how strings of different size can compare equal. */
+ ASSERT (mbscasecmp ("turkish", "TURK\304\260SH") == 0);
+ ASSERT (mbscasecmp ("TURK\304\260SH", "turkish") == 0);
+ }
+
+ /* Incomplete characters. See
+ https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf
+ page 128 table 3-11. */
+
+ /* "\341\200\240" = 0xE1 0x80 0xA0 = U+1020. */
+ ASSERT (mbscasecmp ("\341\200", "\341\200") == 0);
+ ASSERT (mbscasecmp ("\341\200X", "\341\200x") == 0);
+ ASSERT (mbscasecmp ("\341", "\341") == 0);
+ ASSERT (mbscasecmp ("\341X", "\341x") == 0);
+ /* "\360\221\222\240" = 0xF0 0x91 0x92 0xA0 = U+114A0. */
+ ASSERT (mbscasecmp ("\360\221\222", "\360\221\222") == 0);
+ ASSERT (mbscasecmp ("\360\221\222X", "\360\221\222x") == 0);
+ ASSERT (mbscasecmp ("\360\221", "\360\221") == 0);
+ ASSERT (mbscasecmp ("\360\221X", "\360\221x") == 0);
+ ASSERT (mbscasecmp ("\360", "\360") == 0);
+ ASSERT (mbscasecmp ("\360X", "\360x") == 0);
+}
+
+int
+main (int argc, char *argv[])
+{
+ /* configure should already have checked that the locale is supported. */
+ if (setlocale (LC_ALL, "") == NULL)
+ return 1;
+
+ test_ascii ();
+
+ if (argc > 1)
+ switch (argv[1][0])
+ {
+ case '3':
+ /* Locale encoding is UTF-8, locale is not Turkish. */
+ test_utf_8 (false);
+ return test_exit_status;
- /* This test shows how strings of different size can compare equal. */
- ASSERT (mbscasecmp ("turkish", "TURK\304\260SH") == 0);
- ASSERT (mbscasecmp ("TURK\304\260SH", "turkish") == 0);
+ case '4':
+ /* Locale encoding is UTF-8, locale is Turkish. */
+ test_utf_8 (true);
+ return test_exit_status;
+ }
- return test_exit_status;
+ return 1;
}
--
2.54.0
From 8d19402b5bd78976c08312da1e387d16c8fb8ff9 Mon Sep 17 00:00:00 2001
From: Bruno Haible <[email protected]>
Date: Mon, 25 May 2026 18:36:12 +0200
Subject: [PATCH 6/7] mbmemcasecmp tests: Enhance tests.
* tests/test-mbmemcasecmp.h (test_utf_8): Add test cases with incomplete
characters.
---
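Note on the expected results: in each new test case the two inputs contain
the same invalid bytes at the same positions and differ only in the case of
one ASCII letter. A comparison that folds only ASCII letters and compares all
other bytes verbatim therefore already yields 0 for them, however the
incomplete bytes are grouped into units. The sketch below illustrates this.
ascii_memcasecmp is a hypothetical stand-in written for this note; it is not
gnulib's mbmemcasecmp and does no multibyte case folding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: maps 'A'..'Z' to 'a'..'z' and leaves every other
   byte value unchanged.  Independent of the current locale.  */
static unsigned char
ascii_fold (unsigned char c)
{
  return (c >= 'A' && c <= 'Z' ? (unsigned char) (c - 'A' + 'a') : c);
}

/* Hypothetical stand-in, NOT gnulib's mbmemcasecmp: compares the N1 bytes
   at S1 with the N2 bytes at S2, ignoring case of ASCII letters only.
   Returns a negative, zero, or positive value.  */
static int
ascii_memcasecmp (const char *s1, size_t n1, const char *s2, size_t n2)
{
  assert (s1 != NULL && s2 != NULL);
  size_t n = (n1 < n2 ? n1 : n2);
  for (size_t i = 0; i < n; i++)
    {
      unsigned char c1 = ascii_fold ((unsigned char) s1[i]);
      unsigned char c2 = ascii_fold ((unsigned char) s2[i]);
      if (c1 != c2)
        return (c1 < c2 ? -1 : 1);
    }
  /* Common prefix is equal: the shorter range sorts first.  */
  return (n1 > n2) - (n1 < n2);
}
```

For instance, ascii_memcasecmp ("\341\200x\341\200", 5, "\341\200X\341\200", 5)
returns 0, which matches the result that the new assertions expect from
my_casecmp.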
ChangeLog | 6 ++++++
tests/test-mbmemcasecmp.h | 36 ++++++++++++++++++++++++++++++++++++
2 files changed, 42 insertions(+)
diff --git a/ChangeLog b/ChangeLog
index c5b5e39291..766a5860a5 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2026-05-25 Bruno Haible <[email protected]>
+
+ mbmemcasecmp tests: Enhance tests.
+ * tests/test-mbmemcasecmp.h (test_utf_8): Add test cases with incomplete
+ characters.
+
2026-05-25 Bruno Haible <[email protected]>
mbspcasecmp tests: Enhance tests.
diff --git a/tests/test-mbmemcasecmp.h b/tests/test-mbmemcasecmp.h
index c2175815b2..ff19c70b5c 100644
--- a/tests/test-mbmemcasecmp.h
+++ b/tests/test-mbmemcasecmp.h
@@ -395,4 +395,40 @@ test_utf_8 (int (*my_casecmp) (const char *, size_t, const char *, size_t), bool
ASSERT (my_casecmp (input, countof (input), casefolded_decomposed, countof (casefolded_decomposed)) == 0);
}
#endif
+
+ /* Incomplete characters. See
+ https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf
+ page 128 table 3-11. */
+ /* 0xE1 0x80 0xA0 = U+1020. */
+ {
+ static const char input1[] = { 0xE1, 0x80, 'x', 0xE1, 0x80 };
+ static const char input2[] = { 0xE1, 0x80, 'X', 0xE1, 0x80 };
+
+ ASSERT (my_casecmp (input1, countof (input1), input2, countof (input2)) == 0);
+ }
+ {
+ static const char input1[] = { 0xE1, 'x', 0xE1 };
+ static const char input2[] = { 0xE1, 'X', 0xE1 };
+
+ ASSERT (my_casecmp (input1, countof (input1), input2, countof (input2)) == 0);
+ }
+ /* 0xF0 0x91 0x92 0xA0 = U+114A0. */
+ {
+ static const char input1[] = { 0xF0, 0x91, 0x92, 'x', 0xF0, 0x91, 0x92 };
+ static const char input2[] = { 0xF0, 0x91, 0x92, 'X', 0xF0, 0x91, 0x92 };
+
+ ASSERT (my_casecmp (input1, countof (input1), input2, countof (input2)) == 0);
+ }
+ {
+ static const char input1[] = { 0xF0, 0x91, 'x', 0xF0, 0x91 };
+ static const char input2[] = { 0xF0, 0x91, 'X', 0xF0, 0x91 };
+
+ ASSERT (my_casecmp (input1, countof (input1), input2, countof (input2)) == 0);
+ }
+ {
+ static const char input1[] = { 0xF0, 'x', 0xF0 };
+ static const char input2[] = { 0xF0, 'X', 0xF0 };
+
+ ASSERT (my_casecmp (input1, countof (input1), input2, countof (input2)) == 0);
+ }
}
--
2.54.0
From 070f9259d67373ef7530e8a2523b04258920a449 Mon Sep 17 00:00:00 2001
From: Bruno Haible <[email protected]>
Date: Mon, 25 May 2026 18:37:25 +0200
Subject: [PATCH 7/7] trim tests: Enhance tests.
* tests/test-trim.c (main): Add test cases with incomplete characters.
---
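Note on the octal escapes in the new test cases: "\342\200\202" is the UTF-8
encoding of U+2002 (EN SPACE), the whitespace that surrounds the incomplete
bytes in each input. The sketch below shows how the escapes map to code
points, and that the truncated sequences do not decode. utf8_decode_one is a
hypothetical helper written for this note, not a gnulib function, and it is
simplified: it does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decodes the UTF-8 sequence at the start of the N
   bytes at S.  Returns the code point, or -1 if the bytes do not start
   with a complete sequence.  Simplified: no overlong/surrogate checks.  */
static long
utf8_decode_one (const char *s, size_t n)
{
  assert (s != NULL);
  if (n == 0)
    return -1;

  unsigned char lead = (unsigned char) s[0];
  size_t len;
  long cp;
  if (lead < 0x80)
    { len = 1; cp = lead; }
  else if ((lead & 0xE0) == 0xC0)
    { len = 2; cp = lead & 0x1F; }
  else if ((lead & 0xF0) == 0xE0)
    { len = 3; cp = lead & 0x0F; }
  else if ((lead & 0xF8) == 0xF0)
    { len = 4; cp = lead & 0x07; }
  else
    return -1;

  /* Truncated input: the sequence is incomplete.  */
  if (n < len)
    return -1;

  for (size_t i = 1; i < len; i++)
    {
      unsigned char c = (unsigned char) s[i];
      /* Every further byte must have the form 10xxxxxx.  */
      if ((c & 0xC0) != 0x80)
        return -1;
      cp = (cp << 6) | (c & 0x3F);
    }
  return cp;
}
```

Under these assumptions, utf8_decode_one ("\342\200\202", 3) yields 0x2002 and
utf8_decode_one ("\341\200\240", 3) yields 0x1020, while
utf8_decode_one ("\341\342\200\202", 4) yields -1 because 0xE1 is followed by
a new lead byte instead of a continuation byte.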
ChangeLog | 5 +++++
tests/test-trim.c | 30 ++++++++++++++++++++++++++++++
2 files changed, 35 insertions(+)
diff --git a/ChangeLog b/ChangeLog
index 766a5860a5..0611c6c7a1 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2026-05-25 Bruno Haible <[email protected]>
+
+ trim tests: Enhance tests.
+ * tests/test-trim.c (main): Add test cases with incomplete characters.
+
2026-05-25 Bruno Haible <[email protected]>
mbmemcasecmp tests: Enhance tests.
diff --git a/tests/test-trim.c b/tests/test-trim.c
index 745c7492dd..27a7a193c4 100644
--- a/tests/test-trim.c
+++ b/tests/test-trim.c
@@ -133,6 +133,36 @@ main (int argc, char *argv[])
ASSERT (streq (result, "\302\267foo"));
free (result);
}
+ /* Incomplete characters. See
+ https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf
+ page 128 table 3-11. */
+ /* "\341\200\240" = 0xE1 0x80 0xA0 = U+1020. */
+ {
+ char *result = trim ("\342\200\202\341\200\342\200\202");
+ ASSERT (streq (result, "\341\200"));
+ free (result);
+ }
+ {
+ char *result = trim ("\342\200\202\341\342\200\202");
+ ASSERT (streq (result, "\341"));
+ free (result);
+ }
+ /* "\360\221\222\240" = 0xF0 0x91 0x92 0xA0 = U+114A0. */
+ {
+ char *result = trim ("\342\200\202\360\221\222\342\200\202");
+ ASSERT (streq (result, "\360\221\222"));
+ free (result);
+ }
+ {
+ char *result = trim ("\342\200\202\360\221\342\200\202");
+ ASSERT (streq (result, "\360\221"));
+ free (result);
+ }
+ {
+ char *result = trim ("\342\200\202\360\342\200\202");
+ ASSERT (streq (result, "\360"));
+ free (result);
+ }
return test_exit_status;
case '3':
--
2.54.0