Hugo Buddelmeijer <[email protected]> writes:

> Hi Arne,
>
> Thanks, these are the more useful objections that do allow us to
> figure out how to move forwards (either with LLMs or without).
>
> On 4/7/26 01:29, Dr. Arne Babenhauserheide wrote:
>> Hugo Buddelmeijer via <[email protected]> writes:
>> 
>>> Let's limit the scope a bit, to my favorite pastime: refreshing Guix
>>> packages.  Guix has over 30000 packages.  Involving agents in
>>> maintaining them would make that massively more manageable.
>> …
>>> There was indeed not an explicit proposal here.  So for clarity, and
>>> specificity: I propose to use LLMs and Agents in updating Guix
>>> packages.
>> How do they improve over the existing update-checks?
>> Are they more reliable than
>>      ./pre-inst-env guix refresh -u PACKAGE
>
> LLMs are much more reliable than guix refresh, but you do need a human
> in the loop.

What do you mean by reliable here? Do you mean that they fix a problem
that may crop up, that they find the most recent version, or that they
fix a known broken package?

> four times to pick a broken package at random and it fixed all four.

^ this sounds like you may mean broken packages, not just outdated ones.

> I suppose LLMs can do those too, but I would personally not want 'my'
> LLM to autonomously interact with (unsuspecting) people on my behalf.

Thank you! Doing that would put the load of interpreting generated text
on other people (which is often mentally taxing).

>> And if they are in some situations: why not improve guix refresh
>> instead, which improves the situation for everyone, including people
>> without subscription to a proprietary service or access to massive
>> computing power?
>
> You're saying two important things in one sentence.
>
> Firstly, I don't think it is feasible to make guix refresh as good as
> LLMs.  In the experiment, the LLM inspected the build log to figure
> out what was wrong and used that to infer what needed to change.  I
> can't fathom writing that into guix refresh.

This is bugfixing, not refreshing, so it sounds less like maintaining
packages and more like fixing problems (that maybe should be avoided in
the first place).

I’m annoyed by broken packages, but I think that the approach for that
would rather be to have a more distributed test-suite that ensures that
after an update all possibly affected packages¹ still build and
successfully run their tests.

¹ rule of thumb: the ones that have the updated package somewhere in
their dependency graph.
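
As a rough sketch of that rule of thumb: the set to rebuild is everything reachable over inverted dependency edges from the updated package. The graph and package relations below are a made-up toy example, and `affected_packages` is a hypothetical helper, not part of Guix (for real packages, `guix refresh --list-dependent PACKAGE` reports the dependents).

```python
from collections import defaultdict, deque


def affected_packages(dependencies, updated):
    """Return every package that has `updated` somewhere in its dependency graph."""
    # Invert the edges: for each package, record who depends on it directly.
    dependents = defaultdict(set)
    for package, deps in dependencies.items():
        for dep in deps:
            dependents[dep].add(package)

    # Breadth-first walk over the inverted graph, starting at the updated package.
    affected = set()
    queue = deque([updated])
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


# Hypothetical toy graph (assumption for illustration): package -> direct inputs.
toy_graph = {
    "app": ["libfoo", "python"],
    "libfoo": ["zlib"],
    "python": ["zlib", "openssl"],
    "tool": ["openssl"],
    "zlib": [],
    "openssl": [],
}

print(sorted(affected_packages(toy_graph, "zlib")))  # ['app', 'libfoo', 'python']
```

Each package in the returned set would then be built and have its tests run before the update is merged.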

> Secondly, it would be a terrible long-term outcome if Guix can only be
> maintained through LLMs.  That would IMO indeed defy the point of the
> entire project, as we still write for humans.  E.g., reproducible
> software is only good if humans can understand the reproducibility.

I fully agree.

> But the opposite is true too, LLMs can also make it possible for more
> people to contribute to Guix.

In a project I maintain, we now have two clear rules:


* ensure that you understand the change you suggest. You must not
  suggest code that you know you don’t understand.

* ensure that your pull request is easy to work with. File additional
  pull-requests for infrastructure if those are required to release your
  change.


The second was already in place for a long time. We added the first due
to problems with LLM vibed PRs and forks.

> My conclusion so far is that if we allow the use of LLMs, we should
> ensure that we do it in a way that enhances the human connections and
> the human experience.  Would that approach make sense to you,
> 'allowing' LLM use if it fosters the human experience?

In my experience, people tend to write software that can (best/only) be
used with the tools they use themselves.

I see two risks in accepting LLMs for the practical work¹:


1. Adding structures that are hard to edit without LLMs

2. Changing code to be easier to understand for LLMs -- making it harder
   to understand for humans in the process.


I’ve seen this with IDEs where in some projects you’re effectively
forced to use a proprietary IDE because people let code grow in a way
that made it unusable with other tools -- while with other tools they
would have (re-)structured it differently so I’d still be able to work
with it efficiently.

¹ leaving out their other problems, because those are not the topic in
  this section.

>>> Yes, that scraping is pretty bad and we should not contribute that.  I
>>> don't actually know whether the LLMs I use do this aggressive
>>> crawling; I should check, thanks.
>> It’s almost guaranteed that they do.
>> I see every major LLM company hit my server with tons of requests.
>> Yesterday 47% of requests in the logs of my personal website were
>> detected by goaccess as crawlers (this is likely underestimated,
>> because my logs zero out parts of the IPs to respect the privacy of
>> my visitors).
>> Summing up March 2026, detected crawlers downloaded 277.75 GiB of
>> data in 1,161,790 requests.
>> Differentiating them shows bots by every LLM company I know (and then
>> some).
>
> Thank you for sharing this data.
>
> I've mainly got experience with Codex, so I tried to look up what they
> do.  They claim to respect robots.txt.  They don't say how often they
> crawl sites, but make it sound like it is a handful of times a day.
>
> See https://developers.openai.com/api/docs/bots .
>
> If OpenAI's claims are correct, then it sounds reasonable to me.
> Especially since OpenAI has such a large number of users.
>
> Does that correspond with your numbers or do you think they are lying?

GPTBot alone made 109,552 accesses to my website in March, so I think
they are telling the truth in a very misleading way.

The websites that go into these stats have together about 2000 HTML
documents (https://www.1w6.org has 811, https://www.draketo.de/node has
827 and https://www.draketo.de/ has 296).

99% of these change less than once per year.

If GPTBot crawls them every day, that’s 2000 × 30 = 60,000 accesses per
month -- which is the same order of magnitude as the 109,552 accesses I
see.
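
The estimate can be checked with a few lines. This is only a back-of-the-envelope sketch: it uses the document counts and the GPTBot access count given in this mail, and it assumes exactly one full crawl per day over a 30-day month.

```python
# Document counts from this mail: 1w6.org, draketo.de/node, draketo.de.
html_documents = 811 + 827 + 296

# GPTBot accesses seen in the logs for the month.
observed_gptbot_accesses = 109_552

# Assumption: one full crawl of roughly 2000 documents per day, 30 days.
expected_daily_crawl = 2000 * 30

# How many "full daily crawls" the observed traffic corresponds to.
ratio = observed_gptbot_accesses / expected_daily_crawl

print(html_documents)        # 1934
print(expected_daily_crawl)  # 60000
print(f"{ratio:.2f}")        # 1.83
```

So the observed traffic is a bit under two full crawls of every document per day.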

But I built these websites over 20 years. The oldest articles are from
2007.

A human goes there, reads 1-20 articles and leaves again. Maybe to
return later when there’s a new article (I have RSS feeds).

An LLM goes there and crawls everything. Every day.

There even was a week where GPT tried every possible combination of
search inputs on 1w6.org -- including repeated arguments, likely until
it hit the URL length limit of the server. My log analysis tool needed
days to complete the analysis after that week. And I give thanks to my
hoster that they didn’t boot me then (and that I don’t have to pay for
excess bandwidth).

So yes, I think they do what they say, and that does make them an
unscrupulous crawler. Like most of the other LLM crawlers (GPTBot is 10%
of the crawler traffic).

>>> What slop?  Again, trying to assume good faith: code is something LLMs
>>> do very well, often better than people.
>> My experience is very different: LLMs quickly output a lot of code,
>> but they make review harder. Our main bottleneck is review (which is
>> the main bottleneck for most Free Software Projects), so LLMs make
>> our bottleneck worse.
>
> Yes that's the other reason I did not submit P.R.s for those four
> packages.  Since apparently no one cares for these packages, I'm
> not going to waste reviewers' time.
>
> I do fix random packages by hand, and submit those.  Because those are
> either trivial (and cost less time than a deprecation request would
> cost), or give me an opportunity to learn (so I'm willing to ask for
> time, that I try to pay forward when I can).  Using an LLM could help
> with the learning process, but I would not request time for reviewing
> that.
>
> But LLMs are also faster for packages that people do care about.

Sidenote: I think that for almost all packages there are people who care
about them. I’ve been blocked from working more than once because a
package I used got removed.

It’s just that people don’t talk about stuff that works, because they
are busy building on top of it. And if they have to look into each of
the thousand dependencies, they can’t build something with them.

⇒ context: https://www.draketo.de/software/volatile-infrastructure

Guix graph actually is a great tool to deal with that.

Have a look at
     guix graph -t package -b cyclonedx-json mercurial
and
     guix graph -t bag-emerged -b cyclonedx-json mercurial

We actually have a list of implicit "I care about" for each package.
What we don’t have AFAIK is a list of reference manifests to use for
regression testing.
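
A minimal sketch of how that implicit list could be made explicit: count in how many dependency graphs each package shows up. It assumes the exported files follow the standard CycloneDX JSON layout (a top-level "components" array whose entries carry a "name"); the two sample documents are hand-written stand-ins rather than real `guix graph` output, and `care_counts` is a hypothetical helper.

```python
import json
from collections import Counter


def care_counts(bom_texts):
    """Count in how many CycloneDX documents each component name appears."""
    counts = Counter()
    for text in bom_texts:
        bom = json.loads(text)
        # Use a set so a component listed twice in one graph counts once.
        names = {component["name"] for component in bom.get("components", [])}
        counts.update(names)
    return counts


# Hand-written sample documents (assumption: not real guix graph output).
graph_a = json.dumps({
    "bomFormat": "CycloneDX",
    "components": [
        {"type": "library", "name": "python"},
        {"type": "library", "name": "zlib"},
    ],
})
graph_b = json.dumps({
    "bomFormat": "CycloneDX",
    "components": [
        {"type": "library", "name": "zlib"},
        {"type": "library", "name": "openssl"},
    ],
})

counts = care_counts([graph_a, graph_b])
print(counts["zlib"])    # 2
print(counts["python"])  # 1
```

Packages with a high count are the ones many other packages silently rely on, which makes them natural candidates for reference manifests.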

> And
> the result is often better, certainly in collaboration with a human.
> My first P.R.s were full of problems that would not exist if I had
> employed an LLM.

I went through the same. But there’s a difference: with you the
reviewers’ time is well spent: they help a new contributor grow.

With an LLM their time goes into the void. The LLM doesn’t learn to work
with this project. And at the next update it may have completely
different idiosyncrasies that reviewers would have to learn.

> I'm mostly involved in the Python ecosystem.  We are almost at the
> point of having Python 3.12 merged (hahaha, I wish), while I would
> want to be working on Python 3.14 now.

I feel your pain. The volatility Python developed after version 3 is a
reason why I no longer use it for my hobby projects. And the packages I
still maintain are a drain on my time, because they break about once a
year without me doing any changes.

> Consider this scenario where two people collaborate, voluntarily.  They
> use LLMs (and their own human intellect) to fix packages and review
> each other's code.  Would that be okay with you?

If they use the LLMs for review, then no. Because then there would
still be code in Guix that wasn’t reviewed by a human.

If there was a comma missing before "and review" (⇒ humans review the
code) and they make sure that the code does not get harder for people
without LLMs to use (unlikely in the long run -- I’ve seen the effects
of local convenience multiple times now), then I wouldn’t mind.

> But right now I almost feel like people will be angry with me if I
> search an error message because I might glance at the LLM summary and
> learn the cause of the problem in an indecent manner that would cause
> my P.R. to be rejected.

I search for stackoverflow threads to find a solution, and I don’t see
an LLM parsing of an error message much differently.

As long as you still analyze it yourself afterwards, and resist the urge
(that LLMs foster) to just click "fix it".

> Or positively: we don't have that luxury yet.  I do try to collaborate
> with upstream, and more often than not people are happily surprised
> that we package their software in Guix.  It is paramount to keep
> enhancing that human connection.

Definitely.

> Hmm, maybe that is a good starting point for a hypothetical GCD.
> Because it does not apply to every project; I've 'written' code I did
> not understand, for one-off tasks where only the result matters.  Guix
> is in a totally opposite corner, where the entire point (for me at
> least) is that humans can understand it.

I think that that applies to all code that is not guaranteed to get
thrown away after use. Because prototypes tend to stick around and
become the core of other software, and if no one understood that
prototype to begin with, that’s a trainwreck in the making.

Best wishes,
Arne
-- 
Being apolitical
means being political
without noticing it.
https://www.draketo.de
