Package: libc6
Version: 2.36-5
Severity: normal
Tags: upstream
X-Debbugs-Cc: r...@debian.org

Hi,

Thanks for taking care of glibc in Debian!

While trying to write a test case for a text processing utility that
is sort of aware of locales and character encodings, I stumbled upon
the fact that, in a UTF-8-capable locale, fnmatch() seems to think
that the `ñ` ("enye", "LATIN SMALL LETTER N WITH TILDE", U+00F1)
character should match both the "?" and "??" patterns.

See the attached C program and the `run-test.sh` demonstration tool;
`make test` in a directory where all four files are installed should
do it. If anything goes wrong with the attached files, they are also
available in a GitLab repository at https://gitlab.com/ppentchev/fnmess

A bullseye chroot and Docker container do not show the problem
(the test passes). FTR, I was able to reproduce the problem on
an AlmaLinux 9 system with glibc 2.34, so it might not be limited to
2.36.

Thanks in advance for your time, and keep up the great work!

G'luck,
Peter

-- System Information:
Debian Release: bookworm/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'stable-updates'), (500, 'stable-security'), (500, 'oldstable-updates'), (500, 'oldoldstable'), (500, 'stable'), (500, 'oldstable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.0.0-4-amd64 (SMP w/8 CPU threads; PREEMPT)
Locale: LANG=bg_BG.UTF-8, LC_CTYPE=bg_BG.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libc6 depends on:
ii  libgcc-s1  12.2.0-9

Versions of packages libc6 recommends:
ii  libidn2-0  2.3.3-1+b1

Versions of packages libc6 suggests:
ii  debconf [debconf-2.0]  1.5.79
pn  glibc-doc              <none>
ii  libc-l10n              2.36-5
pn  libnss-nis             <none>
pn  libnss-nisplus         <none>
ii  locales                2.36-5

-- debconf information:
* libraries/restart-without-asking: true
  glibc/disable-screensaver:
  glibc/kernel-not-supported:
  glibc/kernel-too-old:
  glibc/restart-failed:
  glibc/restart-services:
  glibc/upgrade: true

#!/usr/bin/make -f
#
# Copyright (c) 2022 Peter Pentchev <r...@ringlet.net>
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# 1. Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.

CPPFLAGS?=	-D_POSIX_C_SOURCE=200809L -D_XOPEN_SOURCE=700

CFLAGS_WARN?=	-Wall -W -Wextra -Wno-trigraphs
CFLAGS_OPT?=	-g -O -pipe
CFLAGS?=	${CFLAGS_WARN} ${CFLAGS_OPT}

LDFLAGS?=
LIBS?=

all:		fnmess

fnmess:		fnmess.o
	cc ${LDFLAGS} -o fnmess fnmess.o ${LIBS}

fnmess.o:	fnmess.c
	cc -c ${CPPFLAGS} ${CFLAGS} -o fnmess.o fnmess.c

clean:
	rm -f fnmess fnmess.o

test:		all
	sh run-test.sh python3 fnmess.py
	sh run-test.sh ./fnmess

.PHONY:		clean all test

/**
 * Copyright (c) 2022 Peter Pentchev <r...@ringlet.net>
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

#include <fnmatch.h>
#include <locale.h>
#include <stdio.h>

int main(void)
{
	char enye[3] = {0xC3, 0xB1, 0};

	puts("Hell world!");
	setlocale(LC_ALL, "");
	printf("Using the '%s' locale for LC_CTYPE\n", setlocale(LC_CTYPE, NULL));
	printf("Does it match '?': %s\n", fnmatch("?", enye, 0) == 0 ? "yes" : "no");
	printf("Does it match '??': %s\n", fnmatch("??", enye, 0) == 0 ? "yes" : "no");
	printf("Does it match '???': %s\n", fnmatch("???", enye, 0) == 0 ? "yes" : "no");
	return 0;
}

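For reference, the results that the C program is expected to print can be illustrated without depending on any installed locale. The following is only a minimal sketch: it decodes the same two bytes explicitly and uses Python's own regex-based `fnmatch` module, which is a separate implementation and not glibc's fnmatch(). The `expected` helper and the constant names are hypothetical and not part of the attached files.

```python
import fnmatch

ENYE_BYTES = b"\xC3\xB1"  # the same two bytes as the `enye` array in fnmess.c
PATTERNS = ("?", "??", "???")


def expected(encoding):
    """Decode the bytes explicitly and return "yes"/"no" for each pattern."""
    text = ENYE_BYTES.decode(encoding)
    # fnmatchcase() skips the OS-specific case normalization of fnmatch().
    return ["yes" if fnmatch.fnmatchcase(text, pat) else "no" for pat in PATTERNS]


if __name__ == "__main__":
    for encoding in ("iso-8859-1", "utf-8"):
        count = len(ENYE_BYTES.decode(encoding))
        print(f"{encoding}: {count} character(s): {', '.join(expected(encoding))}")
```

Under ISO-8859-1 the bytes are two characters, so only "??" should match; under UTF-8 they are a single character, so only "?" should match.
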
#!/bin/sh
#
# Copyright (c) 2022 Peter Pentchev <r...@ringlet.net>
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# 1. Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.

set -e

check() {
	local tempf="$1" loc="$2" c1="$3" c2="$4" c3="$5"
	shift 5

	printf -- '\n==== Checking the result for the %s locale\n\n' "$loc"
	env LC_CTYPE="$loc" "$@" > "$tempf"

	# Yes, there are dozens of ways to make this more generic. I know.
	if ! grep -Fxe "Does it match '?': $c1" -- "$tempf"; then
		echo 'Failed the "?" check' 1>&2
		exit 1
	fi
	if ! grep -Fxe "Does it match '??': $c2" -- "$tempf"; then
		echo 'Failed the "??" check' 1>&2
		exit 1
	fi
	if ! grep -Fxe "Does it match '???': $c3" -- "$tempf"; then
		echo 'Failed the "???" check' 1>&2
		exit 1
	fi
}

if [ "$#" -eq 0 ]; then
	echo 'Usage: run-test.sh command [args...]' 1>&2
	echo '' 1>&2
	echo 'Examples: run-test.sh ./fnmess' 1>&2
	echo '          run-test.sh python3 fnmess.py' 1>&2
	echo '' 1>&2
	exit 1
fi

if [ -z "$FNMESS_TEST_U8LOC" ]; then
	echo 'Looking for an UTF-8-capable locale'
	u8loc="$(locale -a | grep -Eie '\.utf-?8([^a-zA-Z0-9_-]|$)' | head -n1)"
	if [ -z "$u8loc" ]; then
		echo "No UTF-8-capable locale found" 1>&2
		exit 1
	fi
else
	u8loc="$FNMESS_TEST_U8LOC"
fi
echo "Using '$u8loc' as a multibyte locale"

if [ -z "$FNMESS_TEST_SINGLOC" ]; then
	echo 'Looking for an ISO-8859-1 or ISO-8859-15 locale'
	singloc="$(locale -a | grep -Eie '\.iso-?8859-?(1|15)([^a-zA-Z0-9_-]|$)' | head -n1)"
	if [ -z "$singloc" ]; then
		echo "No ISO-8859-1 or ISO-8859-15 locale found" 1>&2
		exit 1
	fi
else
	singloc="$FNMESS_TEST_SINGLOC"
fi
echo "Using '$singloc' as a single-byte locale"

tempf="$(mktemp)"
trap "rm -f -- '$tempf'" EXIT INT HUP QUIT TERM
echo "Using '$tempf' as a temporary file"

printf -- '\n==== Running in the %s locale, expected: no, yes, no\n\n' "$singloc"
env LC_CTYPE="$singloc" "$@"
check "$tempf" "$singloc" 'no' 'yes' 'no' "$@"

printf -- '\n==== Running in the %s locale, expected: yes, no, no\n' "$u8loc"
env LC_CTYPE="$u8loc" "$@"
check "$tempf" "$u8loc" 'yes' 'no' 'no' "$@"

printf -- '\n==== Seems fine!\n\n'

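To show which locale names the two `grep -Eie` expressions in run-test.sh are meant to pick up, here is a rough Python sketch that applies the same regular expressions to a list of names. The sample list is made up for illustration and is not real `locale -a` output from any of the systems mentioned; the `first_match` helper is hypothetical and only mimics the `head -n1` step.

```python
import re

# The same expressions that run-test.sh passes to `grep -Eie`
# (the -i option corresponds to re.IGNORECASE here).
UTF8_RE = re.compile(r"\.utf-?8([^a-zA-Z0-9_-]|$)", re.IGNORECASE)
SINGLE_RE = re.compile(r"\.iso-?8859-?(1|15)([^a-zA-Z0-9_-]|$)", re.IGNORECASE)


def first_match(pattern, names):
    """Return the first name that the pattern matches, or None (like `head -n1`)."""
    return next((name for name in names if pattern.search(name)), None)


if __name__ == "__main__":
    # Hypothetical locale names, for illustration only.
    sample = ["C", "POSIX", "bg_BG.utf8", "en_US.iso885913", "en_US.iso885915"]
    print("multibyte:", first_match(UTF8_RE, sample))
    print("single-byte:", first_match(SINGLE_RE, sample))
```

The trailing `([^a-zA-Z0-9_-]|$)` group is what keeps a name such as `en_US.iso885913` from being accepted as ISO-8859-1, while still allowing a modifier such as `@euro` after the charset name.
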
#!/usr/bin/python3
#
# Copyright (c) 2022 Peter Pentchev <r...@ringlet.net>
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# 1. Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
"""Check whether Python's fnmatch is bug-for-bug compatible with libc."""

import fnmatch
import locale


def check(value: str, pattern: str) -> None:
    """Check whether the value matches the pattern."""
    res = "yes" if fnmatch.fnmatch(value, pattern) else "no"
    print(f"Does it match '{pattern}': {res}")


def main() -> None:
    """Does the Python fnmatch() function also have that bug?"""
    encoding = locale.nl_langinfo(locale.CODESET)
    print(f"Using {encoding} as the LC_CTYPE character encoding")

    bstr = b"\xC3\xB1"
    cstr = bstr.decode(encoding)
    print(f"The character string now has a length of {len(cstr)}")

    check(cstr, "?")
    check(cstr, "??")
    check(cstr, "???")


if __name__ == "__main__":
    main()