Hi Arne,
Thanks, these are the more useful kind of objections: they allow us to
figure out how to move forward (either with LLMs or without).
On 4/7/26 01:29, Dr. Arne Babenhauserheide wrote:
Hugo Buddelmeijer via <[email protected]> writes:
Let's limit the scope a bit, to my favorite pastime: refreshing Guix
packages. Guix has over 30000 packages. Involving agents in
maintaining them would make that massively more manageable.
…
There was indeed not an explicit proposal here. So for clarity, and
specificity: I propose to use LLMs and Agents in updating Guix
packages.
How do they improve over the existing update-checks?
Are they more reliable than
./pre-inst-env guix refresh -u PACKAGE
LLMs are much more reliable than guix refresh, but you do need a human
in the loop. On guix-devel I shared an experiment where I asked Codex
four times to pick a broken package at random and it fixed all four:
three by writing a substitute* to add missing headers and one by
refreshing and adding an extra dependency. (I did not submit P.R.s for
them, see below for why.) See
https://lists.gnu.org/archive/html/guix-devel/2026-02/msg00231.html
But this is only the code side. What a human would do is:
- Perhaps pick a better solution (e.g. compiler flag).
- Create an upstream issue and perhaps a P.R.
- Perhaps mark the package for deprecation instead.
- Check whether this is a general problem and act accordingly.
I suppose LLMs can do those too, but I would personally not want 'my'
LLM to autonomously interact with (unsuspecting) people on my behalf.
And if they are in some situations: why not improve guix refresh
instead, which improves the situation for everyone, including people
without subscription to a proprietary service or access to massive
computing power?
You're saying two important things in one sentence.
Firstly, I don't think it is feasible to make guix refresh as good as
LLMs. In the experiment, the LLM inspected the build log to figure out
what was wrong and used that to infer what needed to change. I can't
fathom writing that into guix refresh.
Secondly, it would be a terrible long-term outcome if Guix can only be
maintained through LLMs. That would IMO indeed defy the point of the
entire project, as we still write for humans. E.g., reproducible
software is only good if humans can understand the reproducibility.
But the opposite is true too: LLMs can also make it possible for more
people to contribute to Guix.
My conclusion so far is that if we allow the use of LLMs, we should
ensure that we do it in a way that enhances the human connections and
the human experience. Would that approach make sense to you, 'allowing'
LLM use if it fosters the human experience?
That is one reason why I did not submit P.R.s for those four packages:
it would not help any human in any way.
Yes, that scraping is pretty bad and we should not contribute to that. I
don't actually know whether the LLMs I use do this aggressive
crawling; I should check, thanks.
It’s almost guaranteed that they do.
I see every major LLM company hit my server with tons of requests.
Yesterday 47% of requests in the logs of my personal website were detected
by goaccess as crawlers (this is likely underestimated, because my logs
zero out parts of the IPs to respect the privacy of my visitors).
Summing up March 2026, detected crawlers downloaded 277.75 GiB of data
in 1,161,790 requests.
Differentiating them shows bots by every LLM company I know (and then
some).
Thank you for sharing this data.
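To put those figures in perspective, here is a rough back-of-the-envelope
sketch in Python. It only uses the two numbers quoted above (277.75 GiB in
1,161,790 requests). The assumption that the traffic is spread evenly over
the 31 days of March is mine, and the variable names are purely
illustrative.

```python
# Back-of-the-envelope arithmetic on the quoted crawler statistics.
# Assumption (not from the logs): requests are spread evenly over March.

GIB = 1024 ** 3              # bytes per GiB
total_gib = 277.75           # data downloaded by detected crawlers
total_requests = 1_161_790   # requests made by detected crawlers
days = 31                    # days in March

# Average size of a single crawler response, in KiB.
avg_kib_per_request = total_gib * GIB / total_requests / 1024

# Average load per day and the mean gap between two crawler requests.
requests_per_day = total_requests / days
gib_per_day = total_gib / days
seconds_per_request = days * 24 * 60 * 60 / total_requests

print(f"average response size : {avg_kib_per_request:.1f} KiB")
print(f"requests per day      : {requests_per_day:.0f}")
print(f"data per day          : {gib_per_day:.2f} GiB")
print(f"one request every     : {seconds_per_request:.2f} s")
```

Under that even-spread assumption this works out to roughly one crawler
request every two to three seconds, around the clock.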
I've mainly got experience with Codex, so I tried to look up what they
do. They claim to respect robots.txt. They don't say how often they
crawl sites, but make it sound like it is a handful of times a day.
See https://developers.openai.com/api/docs/bots .
If OpenAI's claims are correct, then it sounds reasonable to me.
Especially since OpenAI has such a large number of users.
Does that correspond with your numbers or do you think they are lying?
As in, there probably are unscrupulous crawlers out there. But refusing
to use a decent crawler does nothing to change that.
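For reference, what a given robots.txt permits for a particular crawler
can be checked with Python's standard urllib.robotparser. This is only a
minimal sketch: the robots.txt content and the example.org URL are made up
for illustration, "SomeOtherBot" is a hypothetical name, and "GPTBot" is
used on the assumption that it matches the user-agent token OpenAI
documents for its crawler. It says nothing about whether a crawler
actually honours these rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a personal website (an assumption for
# illustration, not taken from any real site).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Crawl-delay: 10
Allow: /
"""

parser = RobotFileParser()
# parse() takes the file as a list of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.org/some/page.html"

# Is each crawler allowed to fetch the URL according to these rules?
gptbot_allowed = parser.can_fetch("GPTBot", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)

# Requested delay (in seconds) between requests for the fallback group.
other_delay = parser.crawl_delay("SomeOtherBot")

print("GPTBot allowed      :", gptbot_allowed)
print("SomeOtherBot allowed:", other_allowed)
print("SomeOtherBot delay  :", other_delay)
```

A well-behaved crawler would perform this kind of check before every
fetch; the server logs are the only way to see whether it really does.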
For context: this is a personal website, not some video sharing service.
99% of its content does not change from month to month. So I consider
LLM companies to be staging a DoS on every public website.
What slop? Again, trying to assume good faith: code is something LLMs
do very well, often better than people.
My experience is very different: LLMs quickly output a lot of code, but
they make review harder. Our main bottleneck is review (which is the
main bottleneck for most Free Software Projects), so LLMs make our
bottleneck worse.
Yes, that's the other reason I did not submit P.R.s for those four
packages. Apparently no one cares for these packages, so I'm not
going to waste reviewers' time.
I do fix random packages by hand and submit those, because they are
either trivial (and cost less time than a deprecation request would
cost), or give me an opportunity to learn (so I'm willing to ask for
time, which I try to pay forward when I can). Using an LLM could help
with the learning process, but I would not request time for reviewing that.
But LLMs are also faster for packages that people do care about. And
the result is often better, certainly in collaboration with a human. My
first P.R.s were full of problems that would not exist if I had employed
an LLM.
I'm mostly involved in the Python ecosystem. We are almost at the point
of having Python 3.12 merged (hahaha, I wish), while I would want to be
working on Python 3.14 now. LLMs would speed up this process, and I'm
looking for how to do so without compromising our values. Enhancing our
values even.
Yes, slow and steady wins the race, but we're now doing the opposite.
We should not fall into the trap that code is magically better if it was
painfully written by hand.
Consider this scenario where two people collaborate voluntarily. They
use LLMs (and their own human intellect) to fix packages and review each
other's code. Would that be okay with you?
I'm totally fine with disclosing any LLM use, since I still 'own' the
code. I'm also fine with people refusing to review my code for any
(non-discriminatory) reasons.
But right now I almost feel like people will be angry with me if I
search an error message because I might glance at the LLM summary and
learn the cause of the problem in an indecent manner that would cause my
P.R. to be rejected.
Yes, it would be better if we had an army of people who each knew the
manual and Changelog of dozens of libraries by heart and could just fix a
package from memory. I might even be one, for my small subset of
expertise. But we don't have that luxury for all 30000 packages.
Or positively: we don't have that luxury yet. I do try to collaborate
with upstream, and more often than not people are happily surprised that
we package their software in Guix. It is paramount to keep enhancing
that human connection.
Before you go there (since this is the reply I got too
often): LLMs commenting on code is no review. It just increases the
amount of generated content people have to read, so it makes review
require more human effort.
I would not go there. The goal should always be to write code for
humans in a project like Guix.
Hmm, maybe that is a good starting point for a hypothetical GCD.
Because it does not apply to every project; I've 'written' code I did
not understand, for one-off tasks where only the result matters. Guix
is in a totally opposite corner, where the entire point (for me at least)
is that humans can understand it.
Thanks,
Hugo