On 4/8/26 12:20, Ludovic Courtès wrote:
> Hi,
>
> Hugo Buddelmeijer via <[email protected]> skribis:
>
>> On 4/7/26 01:29, Dr. Arne Babenhauserheide wrote:
>> [...]
>>> Are they more reliable than
>>> ./pre-inst-env guix refresh -u PACKAGE
>>
>> LLMs are much more reliable than guix refresh,
>
> ‘guix refresh’ checks upstream for the newer versions and updates
> versions and hashes; LLMs can do many things (including proposing
> changes based on what they learned from the Guix repo) but that’s
> something they cannot do, by definition.
>
> Then again, an agent, rather than an LLM, might be able to emulate what
> ‘guix refresh’ is doing through trial and error, but since we already
> have the tool to do it right, I don’t understand what this would buy us.

I was not clear.
Firstly, I used the terms 'LLM' and 'agent' interchangeably, even though I
prefer 'agent', because the fact that agents use LLMs is not the point.
(As in, the term 'agent' would still apply if the LLM part were
swapped for something else.)
Secondly, the idea was to use agents in addition to guix refresh, not as
a substitute. E.g., in my experiment, I told my agent to try to refresh
broken packages before trying to patch them. (It was quite interesting
to see it figure out how to use the various guix commands; it even RTFM'ed.)
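
A minimal sketch of that refresh-before-patch order, for illustration only.
It is not what the agent in the experiment actually ran: the function name
`refresh_then_build` and the three status strings are hypothetical, and it
assumes that `guix build` exits with a non-zero status when a package fails
to build. The only commands used are `guix build PACKAGE` and the
`guix refresh -u PACKAGE` invocation quoted above.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes a command line and returns its exit status.
Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def refresh_then_build(package: str, runner: Runner = run) -> str:
    """Try 'guix refresh -u' on a broken package before any patching.

    Returns one of three hypothetical status strings:
    "builds", "fixed-by-refresh" or "needs-patch".
    """
    guix = ["./pre-inst-env", "guix"]

    # Nothing to do if the package already builds.
    if runner(guix + ["build", package]) == 0:
        return "builds"

    # First attempt: let guix refresh update the version and hash.
    runner(guix + ["refresh", "-u", package])

    # Rebuild to see whether the update alone was enough.
    if runner(guix + ["build", package]) == 0:
        return "fixed-by-refresh"

    # Only now is it worth writing substitutions, flags and so on,
    # whether by an agent or by hand.
    return "needs-patch"
```

Passing the runner as a parameter is a design choice made for this sketch, so
the control flow can be exercised without a Guix checkout.
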
The time-consuming part of refreshing packages is the ones that break
(or whose dependents break). Agents work extremely well for broken
packages. guix refresh can already do simple things like adding new
inputs, but it can't write substitutions and such, and I can't see how
it ever could.
It often takes only a minute for an agent to fix a broken package, while
it easily takes 20 minutes for a human, sometimes much more.
E.g. with 6000 packages depending on Python, there will be 12000
potential breakages when going from Python 3.12 to 3.13 to 3.14; and the
experience of 3.11 to 3.12 shows that it can indeed be hundreds of
packages that break.
That can lead to hundreds of hours of work. I'm willing to do that
work. But I find it very hard to justify spending hours doing something
by hand that can trivially be automated by a machine. And there will be
more than enough manual work left.
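
The arithmetic behind these figures can be sketched as follows. The 6000
packages, the two upgrades, and the 20-minute and 1-minute fix times come
from the text above; the figure of 300 broken packages per upgrade is an
assumption standing in for "hundreds", and review time is not counted.

```python
# Figures taken from the message.
packages_depending_on_python = 6000
upgrades = 2  # 3.12 -> 3.13 and 3.13 -> 3.14
human_minutes_per_fix = 20
agent_minutes_per_fix = 1

# Assumption for illustration: "hundreds" of breakages per upgrade.
broken_per_upgrade = 300

potential_breakages = packages_depending_on_python * upgrades


def fix_hours(broken: int, minutes_per_fix: int, upgrades: int) -> float:
    """Total hours spent fixing `broken` packages in each of `upgrades` upgrades."""
    return broken * minutes_per_fix * upgrades / 60


human_hours = fix_hours(broken_per_upgrade, human_minutes_per_fix, upgrades)
agent_hours = fix_hours(broken_per_upgrade, agent_minutes_per_fix, upgrades)

print(f"potential breakages: {potential_breakages}")
print(f"by hand:    {human_hours:.0f} hours")
print(f"with agent: {agent_hours:.0f} hours (before human review)")
```
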
My Guix-wishlist is huge, yet I spent my time writing substitute* calls
and adding CFLAGS. Clanker work.
I engaged here to find a justification for doing that manual labor
and have not found it yet. (Other than the threat of not having code
merged otherwise, a very bad motivator for me.)
There are some problems with using agents, yes, and I'm more than willing
to have a conversation about those problems, because agentic coding is a
big subject that requires more than one person to think through. At the
moment, I think there is plenty of opportunity to use agentic coding
while preserving, or even enhancing, our ideals.
To clarify: in my mind it is essential to keep the human in the loop,
especially for a project like Guix. Humans should maintain 'ownership'
of the code they contribute, humans should review and understand all of
their code, humans are responsible for their code being correct, humans
should do the communication with other humans, etc. For me this is
implied, but I learned it is good to write this out.
Hugo