Re: request for help: LLM-based quality assurance

Bruno Haible via Gnulib discussion list Mon, 08 Jun 2026 07:36:55 -0700

Hi Pavel,

Thanks for the details.


> Another run with a similar prompt yielded this:
> 
> --->8---
> Review of commit 17dc60e624cd6fc3491f9cb002f760d60e66ce8b
> "mbrtoc32, mbrtowc: do not optimze for uClibc-ng"
> 
> VERDICT: REGRESSION INTRODUCED
> 
> The intended change — adding `&& !__UCLIBC__` to glibc-specific optimizations
> in mbrtoc32.c and mbrtowc.c — is correct in principle. However, the `n == 0`
> early-return guard in lib/mbrtowc.c (line 122) was replaced wholesale with the
> condition from mbrtoc32.c instead of being surgically modified, introducing 
> two
> bugs.
> 
> Bug 1: Wrong macro name (HIGH severity, HIGH confidence)
> ---------------------------------------------------------
> The condition in mbrtowc.c changed from:
> 
>   # if MBRTOWC_EMPTY_INPUT_BUG || (GNULIB_WCHAR_SINGLE_LOCALE && __GLIBC__ >= 
> 2)
> 
> to:
> 
>   # if (MBRTOC32_EMPTY_INPUT_BUG || _GL_SMALL_WCHAR_T \
>         || (GNULIB_WCHAR_SINGLE_LOCALE && __GLIBC__ >= 2 && !__UCLIBC__))
> 
> MBRTOWC_EMPTY_INPUT_BUG was replaced with MBRTOC32_EMPTY_INPUT_BUG. These are
> distinct autoconf macros defined in separate m4 files (m4/mbrtowc.m4 vs
> m4/mbrtoc32.m4), testing different functions. On platforms where mbrtowc has
> the empty-input bug but mbrtoc32 does not (e.g., CentOS 7, MSVC), the n == 0
> guard is silently omitted, causing incorrect behavior in rpl_mbrtowc.
> 
> Bug 2: Spurious _GL_SMALL_WCHAR_T added (MEDIUM severity, HIGH confidence)
> ---------------------------------------------------------------------------
> _GL_SMALL_WCHAR_T was never used in mbrtowc.c before this commit. It is an
> mbrtoc32.c-specific concern (defined in uchar.in.h, relevant to char32_t
> conversions on platforms with 16-bit wchar_t). Adding it to mbrtowc.c causes
> the n == 0 early return to fire on Windows/Cygwin platforms where it 
> previously
> did not — an unintended behavioral change.
> 
> Root cause
> ----------
> Copy-paste of the entire preprocessor condition from mbrtoc32.c line 137 into
> mbrtowc.c line 122, instead of only appending `&& !__UCLIBC__` to the existing
> condition. The correct fix for mbrtowc.c should have been:
> 
>   # if (MBRTOWC_EMPTY_INPUT_BUG \
>         || (GNULIB_WCHAR_SINGLE_LOCALE && __GLIBC__ >= 2 && !__UCLIBC__))

Good to see that it found the regression this time as well.

> Note: This regression was subsequently fixed in commit 2ca51a77e6
> ("mbrtowc: Fix handling of n==0 on CentOS 7, MSVC (regr. 2026-05-25)").

Hmm. Did it find the regression, ot did it look in newer commits...?

> The lack of reproducibility (obtaining quite different outputs for
> prompts with no material difference, or no difference at all) is IMO a
> major obstacle for using LLMs.

Well, we are used to this lack of reproducibility from human work (e.g.
emails) anyway. So, IMO, for code reviews it does not matter much. The
main area where lack of reproducibility is a problem is for people who do
vibe coding: it forces the generated code to be considered as "source code".
(Whereas when you use a compiler to transform C code to x86_64 machine code,
it is reproducible and therefore the .o files don't need to be considered
as "source code".)

> My first attempt produced a different result:
> --->8---
> (...)
>   5. MUSL_LIBC detection fragility (medium concern): This is the most notable
>   issue. MUSL_LIBC is only defined via m4/musl.m4 through the mbszero module's
>   configure machinery — musl provides no compiler-predefined macro.

Indeed, that is a problem that I fixed in a subsequenct commit
dfc349cede4dcf4bb033420f61f70dd5e73b7631. But again, I don't know whether
it found this issue by looking at the newer commits...

> I have not provided any Gnulib specifics at all. It has figured out
> everything by itself.

That is impressing. I would guess that with an explanation of the
Gnulib conventions (the modules and the meaning of each part of the
module descriptions [available in the documentation]) we would get
even better results.

> I found that even this simple prompt:
> Review commit $commit in this repository.
> Focus on possible functionality regressions, not on minor/style issues.
> 
> produced good results.

That's remarkable!

Bruno

Re: request for help: LLM-based quality assurance

Reply via email to