Package: libc6
Version: 2.36-5
Severity: normal
Tags: upstream
X-Debbugs-Cc: r...@debian.org

Hi,

Thanks for taking care of glibc in Debian!

While trying to write a test case for a text processing utility that is
sort of aware of locales and character encodings, I stumbled upon
the fact that, in a UTF-8-capable locale, fnmatch() seems to think
that the `ñ` ("enye", "LATIN SMALL LETTER N WITH TILDE", U+00F1)
character matches both the "?" and "??" patterns, although a single
character should only match "?". See the attached C program and the
`run-test.sh` demonstration tool; `make test` in a directory where all
four files are installed should do it.
If anything goes wrong with the attached files, they are also available
in a GitLab repository at https://gitlab.com/ppentchev/fnmess
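
To spell out what I would expect, here is a rough sketch (not one of the
attached files; the `expected_matches` helper name is made up for this
illustration). It uses Python's own fnmatch, which works on decoded
characters rather than bytes, to show which patterns the two bytes should
match once they are interpreted in each encoding:

```python
import fnmatch

ENYE_BYTES = b"\xC3\xB1"  # the UTF-8 encoding of U+00F1


def expected_matches(encoding: str) -> dict[str, bool]:
    """Decode the two bytes and report which '?' patterns match the result."""
    text = ENYE_BYTES.decode(encoding)
    # fnmatchcase() avoids any platform-specific case normalization.
    return {pat: fnmatch.fnmatchcase(text, pat) for pat in ("?", "??", "???")}


if __name__ == "__main__":
    for enc in ("UTF-8", "ISO-8859-1"):
        print(enc, expected_matches(enc))
```

In UTF-8 the two bytes are one character, so only "?" should match; in
ISO-8859-1 they are two characters, so only "??" should match. These are
the same expectations that `run-test.sh` checks for.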

A bullseye chroot and Docker container do not show the problem
(the test passes).

FTR, I was able to reproduce the problem on an AlmaLinux 9 system with
glibc 2.34, so it might not be limited to 2.36.
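
For a quick check on other systems without a compiler, something along
these lines might work. It is only a sketch: it assumes a platform where
`ctypes.CDLL(None)` exposes the C library's symbols (Linux does), and
`libc_fnmatch` is a hypothetical helper name, not part of the attached
files.

```python
import ctypes
import locale

ENYE_BYTES = b"\xC3\xB1"  # the UTF-8 encoding of U+00F1


def libc_fnmatch(pattern: bytes, string: bytes) -> bool:
    """Call the C library's fnmatch() with the flags argument set to 0."""
    # Assumption: the running process exposes the libc symbols.
    libc = ctypes.CDLL(None)
    libc.fnmatch.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
    libc.fnmatch.restype = ctypes.c_int
    return libc.fnmatch(pattern, string, 0) == 0


if __name__ == "__main__":
    try:
        # Let libc pick up LC_CTYPE from the environment.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        print("Could not set the locale from the environment")
    for pat in (b"?", b"??", b"???"):
        res = "yes" if libc_fnmatch(pat, ENYE_BYTES) else "no"
        print(f"Does it match '{pat.decode()}': {res}")
```

The output lines use the same "Does it match" format as the attached
programs, so the results can be compared side by side.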

Thanks in advance for your time, and keep up the great work!

G'luck,
Peter


-- System Information:
Debian Release: bookworm/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'stable-updates'), (500, 'stable-security'), (500, 'oldstable-updates'), (500, 'oldoldstable'), (500, 'stable'), (500, 'oldstable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.0.0-4-amd64 (SMP w/8 CPU threads; PREEMPT)
Locale: LANG=bg_BG.UTF-8, LC_CTYPE=bg_BG.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libc6 depends on:
ii  libgcc-s1  12.2.0-9

Versions of packages libc6 recommends:
ii  libidn2-0  2.3.3-1+b1

Versions of packages libc6 suggests:
ii  debconf [debconf-2.0]  1.5.79
pn  glibc-doc              <none>
ii  libc-l10n              2.36-5
pn  libnss-nis             <none>
pn  libnss-nisplus         <none>
ii  locales                2.36-5

-- debconf information:
* libraries/restart-without-asking: true
  glibc/disable-screensaver:
  glibc/kernel-not-supported:
  glibc/kernel-too-old:
  glibc/restart-failed:
  glibc/restart-services:
  glibc/upgrade: true
#!/usr/bin/make -f
#
# Copyright (c) 2022  Peter Pentchev <r...@ringlet.net>
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# 1. Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.

CPPFLAGS?=      -D_POSIX_C_SOURCE=200809L -D_XOPEN_SOURCE=700

CFLAGS_WARN?=   -Wall -W -Wextra -Wno-trigraphs
CFLAGS_OPT?=    -g -O -pipe

CFLAGS?=        ${CFLAGS_WARN} ${CFLAGS_OPT}

LDFLAGS?=

LIBS?=

all:            fnmess

fnmess:         fnmess.o
		cc ${LDFLAGS} -o fnmess fnmess.o ${LIBS}

fnmess.o:       fnmess.c
		cc -c ${CPPFLAGS} ${CFLAGS} -o fnmess.o fnmess.c

clean:
		rm -f fnmess fnmess.o

test:           all
		sh run-test.sh python3 fnmess.py
		sh run-test.sh ./fnmess

.PHONY:         clean all test
/**
 * Copyright (c) 2022  Peter Pentchev <r...@ringlet.net>
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

#include <fnmatch.h>
#include <locale.h>
#include <stdio.h>

int main(void)
{
        char enye[3] = {0xC3, 0xB1, 0};
        puts("Hell world!");

        setlocale(LC_ALL, "");
        printf("Using the '%s' locale for LC_CTYPE\n", setlocale(LC_CTYPE, NULL));

        printf("Does it match '?': %s\n", fnmatch("?", enye, 0) == 0 ? "yes" : "no");
        printf("Does it match '??': %s\n", fnmatch("??", enye, 0) == 0 ? "yes" : "no");
        printf("Does it match '???': %s\n", fnmatch("???", enye, 0) == 0 ? "yes" : "no");
        return 0;
}
#!/bin/sh
#
# Copyright (c) 2022  Peter Pentchev <r...@ringlet.net>
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# 1. Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.

set -e

check()
{
        local tempf="$1" loc="$2" c1="$3" c2="$4" c3="$5"
        shift 5

        printf -- '\n==== Checking the result for the %s locale\n\n' "$loc"
        env LC_CTYPE="$loc" "$@" > "$tempf"

        # Yes, there are dozens of ways to make this more generic. I know.
        if ! grep -Fxe "Does it match '?': $c1" -- "$tempf"; then
                echo 'Failed the "?" check' 1>&2
                exit 1
        fi

        if ! grep -Fxe "Does it match '??': $c2" -- "$tempf"; then
                echo 'Failed the "??" check' 1>&2
                exit 1
        fi

        if ! grep -Fxe "Does it match '???': $c3" -- "$tempf"; then
                echo 'Failed the "???" check' 1>&2
                exit 1
        fi
}

if [ "$#" -eq 0 ]; then
        echo 'Usage: run-test.sh command [args...]' 1>&2
        echo '' 1>&2
        echo 'Examples: run-test.sh ./fnmess' 1>&2
        echo '          run-test.sh python3 fnmess.py' 1>&2
        echo '' 1>&2
        exit 1
fi

if [ -z "$FNMESS_TEST_U8LOC" ]; then
        echo 'Looking for a UTF-8-capable locale'
        u8loc="$(locale -a | grep -Eie '\.utf-?8([^a-zA-Z0-9_-]|$)' | head -n1)"
        if [ -z "$u8loc" ]; then
                echo "No UTF-8-capable locale found" 1>&2
                exit 1
        fi
else
        u8loc="$FNMESS_TEST_U8LOC"
fi
echo "Using '$u8loc' as a multibyte locale"

if [ -z "$FNMESS_TEST_SINGLOC" ]; then
        echo 'Looking for an ISO-8859-1 or ISO-8859-15 locale'
        singloc="$(locale -a | grep -Eie '\.iso-?8859-?(1|15)([^a-zA-Z0-9_-]|$)' | head -n1)"
        if [ -z "$singloc" ]; then
                echo "No ISO-8859-1 or ISO-8859-15 locale found" 1>&2
                exit 1
        fi
else
        singloc="$FNMESS_TEST_SINGLOC"
fi
echo "Using '$singloc' as a single-byte locale"

tempf="$(mktemp)"
trap "rm -f -- '$tempf'" EXIT INT HUP QUIT TERM
echo "Using '$tempf' as a temporary file"

printf -- '\n==== Running in the %s locale, expected: no, yes, no\n\n' "$singloc"
env LC_CTYPE="$singloc" "$@"
check "$tempf" "$singloc" 'no' 'yes' 'no' "$@"

printf -- '\n==== Running in the %s locale, expected: yes, no, no\n' "$u8loc"
env LC_CTYPE="$u8loc" "$@"
check "$tempf" "$u8loc" 'yes' 'no' 'no' "$@"

printf -- '\n==== Seems fine!\n\n'
#!/usr/bin/python3
#
# Copyright (c) 2022  Peter Pentchev <r...@ringlet.net>
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# 1. Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
"""Check whether Python's fnmatch is bug-for-bug compatible with libc."""

import fnmatch
import locale


def check(value: str, pattern: str) -> None:
    """Check whether the value matches the pattern."""
    res = "yes" if fnmatch.fnmatch(value, pattern) else "no"
    print(f"Does it match '{pattern}': {res}")


def main() -> None:
    """Does the Python fnmatch() function also have that bug?"""
    encoding = locale.nl_langinfo(locale.CODESET)
    print(f"Using {encoding} as the LC_CTYPE character encoding")

    bstr = b"\xC3\xB1"
    cstr = bstr.decode(encoding)
    print(f"The character string now has a length of {len(cstr)}")

    check(cstr, "?")
    check(cstr, "??")
    check(cstr, "???")


if __name__ == "__main__":
    main()
