Re: request for help: LLM-based quality assurance

Jeffrey Walton Mon, 08 Jun 2026 08:41:07 -0700

On Tue, Jun 2, 2026 at 1:56 PM Bruno Haible via Gnulib discussion list
<[email protected]> wrote:
>
> Anyone out here who is familiar with LLMs (or wants to get familiar with
> LLMs): How about using it not for coding, but for checking commits that
> went into gnulib master?
>
> Since 2026-01-01, at least 17 gnulib commits contained regressions, that
> had to be fixed subsequently. We often detect regressions by code review
> or by a CI run. The problems:
>   - Not all commits gets reviewed from a different developer than the
>     committer. (Like many free software projects, Gnulib lacks good
>     reviewers.)
>   - The CI runs possibly a week later. (We can't increase the frequency,
>     because some CI runs fail due to network problems or other noise,
>     and this noise needs to be filtered out.)
>
> As a complement to these QA techniques, Paul Eggert suggests to use an
> LLM to analyze the commits that have been pushed into gnulib master.
>
> This should be promising, because I read recently that LLMs outperform
> all classical static analysis tools, when it comes to analyzing source code.
>
> I can't do this myself, because I'm already quite loaded with the existing
> QA techniques and with my work on other GNU packages.
>
> Therefore, if you volunteer, please step up!

I think it is a good idea to utilize LLMs to detect problems,
especially latent problems that have slipped through the cracks over
the years. However, there are downsides to LLMs, and I would look
into fixing the current processes while using the LLMs as a complement
to existing practices.

First, code should not be merged into master until the CI tests have
successfully run. Commits that don't pass the CI tests don't pass
through the security gate. The new code stays in a development branch
or testing fork until it passes the CI tests. This is how many (all?)
organizations with a mature SDLC operate.

Second, LLMs have at least three costs. One is the deskilling that
happens when relying on them.[1,2] Relying too much on LLMs will
have a negative effect on the talent contributing to the project.
Another is the power the algorithms consume, and its effect
on renewable energy.[3,4] The environmental and societal cost seems
to be high. A third is the credit system being used to buy LLM time and
the associated costs.[5] There is no free ride. And if the
algorithms are going to be invoked at each commit, that could be a
disaster at scale. Are there enough tokens available to free software
to make LLM scanning a feasible procedural requirement?

Because of the downsides of using LLMs, it might be wise to have a
small team use LLMs on important projects (like Gnulib) monthly or
quarterly. That could help balance the benefits against the cost of
using LLMs.

And for the record, the work of folks like Pádraig and Pavel looks
very nice. It is a pleasure to watch folks operate the prompts
well.

[1] The AI Deskilling Paradox,
<https://cacm.acm.org/news/the-ai-deskilling-paradox/>
[2] The Great AI Deskilling Has Begun,
<https://www.businessinsider.com/ai-deskilling-impact-on-worker-skills-productivity-2026-3>
[3] AI's Power Requirements Under Exponential Growth,
<https://www.rand.org/pubs/research_reports/RRA3572-1.html>
[4] AI's Power Needs Will Destroy the Renewable Energy Revolution,
<https://www.scientificamerican.com/article/ais-power-needs-will-destroy-the-renewable-energy-revolution/>
[5] Understanding LLM Cost Per Token: A 2026 Practical Guide,
<https://www.silicondata.com/blog/llm-cost-per-token>

Jeff
