Hugo Buddelmeijer <[email protected]> writes:

> Hi Arne,
>
> Thanks, these are the more useful objections that do allow us to
> figure out how to move forwards (either with LLMs or without).
>
> On 4/7/26 01:29, Dr. Arne Babenhauserheide wrote:
>> Hugo Buddelmeijer via <[email protected]> writes:
>>
>>> Let's limit the scope a bit, to my favorite pastime: refreshing Guix
>>> packages. Guix has over 30000 packages. Involving agents in
>>> maintaining them would make that massively more manageable.
>> …
>>> There was indeed not an explicit proposal here. So for clarity, and
>>> specificity: I propose to use LLMs and Agents in updating Guix
>>> packages.
>> How do they improve over the existing update-checks?
>> Are they more reliable than
>> ./pre-inst-env guix refresh -u PACKAGE
>
> LLMs are much more reliable than guix refresh, but you do need a human
> in the loop.

What do you mean by reliable here? Do you mean that they fix a problem
that may crop up, that they find the most recent version, or that they
fix a known broken package?

> four times to pick a broken package at random and it fixed all four.

^ this sounds like you may mean broken packages, not just outdated ones.

> I suppose LLMs can do those too, but I would personally not want 'my'
> LLM to autonomously interact with (unsuspecting) people on my behalf.

Thank you! Doing that would put the load of interpreting generated text
on other people (which is often mentally taxing).

>> And if they are in some situations: why not improve guix refresh
>> instead, which improves the situation for everyone, including people
>> without a subscription to a proprietary service or access to massive
>> computing power?
>
> You're saying two important things in one sentence.
>
> Firstly, I don't think it is feasible to make guix refresh as good as
> LLMs. In the experiment, the LLM inspected the build log to figure
> out what was wrong and used that to infer what needed to change. I
> can't fathom writing that into guix refresh.

This is bugfixing, not refreshing, so it sounds less like maintaining
packages and more like fixing problems (that maybe should be avoided in
the first place).

I’m annoyed by broken packages, but I think that the approach for that
would rather be to have a more distributed test-suite that ensures that
after an update all possibly affected packages¹ still build and
successfully run their tests.

¹ rule of thumb: the ones that have the updated package somewhere in
  their dependency graph.

> Secondly, it would be a terrible long-term outcome if Guix can only be
> maintained through LLMs. That would IMO indeed defy the point of the
> entire project, as we still write for humans. E.g., reproducible
> software is only good if humans can understand the reproducibility.

I fully agree.

> But the opposite is true too, LLMs can also make it possible for more
> people to contribute to Guix.

In a project I maintain, we now have two clear rules:

* ensure that you understand the change you suggest. You must not
  suggest code that you know you don’t understand.
* ensure that your pull request is easy to work with. File additional
  pull requests for infrastructure if those are required to release
  your change.

The second was already in place for a long time. We added the first due
to problems with LLM-vibed PRs and forks.

> My conclusion so far is that if we allow the use of LLMs, we should
> ensure that we do it in a way that enhances the human connections and
> the human experience. Would that approach make sense to you,
> 'allowing' LLM use if it fosters the human experience?

My experience is that people tend to write software that can (best/only)
be used with the tools they use themselves.

The two risks I see in accepting LLMs for the practical work¹:

1. Adding structures that are hard to edit without LLMs
2. Changing code to be easier to understand for LLMs -- making it harder
   to understand for humans in the process.

I’ve seen this with IDEs where in some projects you’re effectively
forced to use a proprietary IDE because people let code grow in a way
that made it unusable with other tools -- while with other tools they
would have (re-)structured it differently so I’d still be able to work
with it efficiently.

¹ leaving out their other problems, because those are not the topic in
  this section.

>>> Yes, that scraping is pretty bad and we should not contribute to
>>> that. I don't actually know whether the LLMs I use do this
>>> aggressive crawling; I should check, thanks.
>> It’s almost guaranteed that they do.
>> I see every major LLM company hit my server with tons of requests.
>> Yesterday 47% of requests in the logs of my personal website were
>> detected by goaccess as crawlers (this is likely underestimated,
>> because my logs zero out parts of the IPs to respect the privacy of
>> my visitors).
>> Summing up March 2026, detected crawlers downloaded 277.75 GiB of
>> data in 1,161,790 requests.
>> Differentiating them shows bots by every LLM company I know (and then
>> some).
>
> Thank you for sharing this data.
>
> I've mainly got experience with Codex, so I tried to look up what they
> do. They claim to respect robots.txt. They don't say how often they
> crawl sites, but make it sound like it is a handful of times a day.
>
> See https://developers.openai.com/api/docs/bots .
>
> If OpenAI's claims are correct, then it sounds reasonable to me.
> Especially since OpenAI has such a large number of users.
>
> Does that correspond with your numbers or do you think they are lying?

GPTBot alone did 109,552 accesses to my website in March, so I think
they are telling the truth in a very misleading way.

The websites that go into these stats have together about 2000 HTML
documents (https://www.1w6.org has 811, https://www.draketo.de/node has
827 and https://www.draketo.de/ has 296). 99% of these change less than
once per year.

If GPTBot crawls them every day, that’s 2000x30 = 60,000 accesses per
month -- which is the same order of magnitude as the 109,552 accesses I
see.

But I built these websites over 20 years. The oldest articles are from
2007. A human goes there, reads 1-20 articles and leaves again. Maybe to
return later when there’s a new article (I have RSS feeds).

An LLM goes there and crawls everything. Every day.

There even was a week where GPT tried every possible combination of
search inputs on 1w6.org -- including repeated arguments, likely until
it hit the URL length limit of the server.

My log analysis tool needed days to complete the analysis after that
week.
And I give thanks to my hoster that they didn’t boot me then (and that I
don’t have to pay for excess bandwidth).

So yes, I think they do what they say, and that does make them an
unscrupulous crawler. Like most of the other LLM crawlers (GPTBot is 10%
of the crawler traffic).

>>> What slop? Again, trying to assume good faith: code is something LLMs
>>> do very well, often better than people.
>> My experience is very different: LLMs quickly output a lot of code,
>> but they make review harder. Our main bottleneck is review (which is
>> the main bottleneck for most Free Software Projects), so LLMs make
>> our bottleneck worse.
>
> Yes that's the other reason I did not submit P.R.s for those four
> packages. Since apparently no-one cares for these packages, I'm not
> going to waste reviewers' time.
>
> I do fix random packages by hand, and submit those. Because those are
> either trivial (and cost less time than a deprecation request would
> cost), or give me an opportunity to learn (so I'm willing to ask for
> time, that I try to pay forward when I can). Using an LLM could help
> with the learning process, but I would not request time for reviewing
> that.
>
> But LLMs are also faster for packages that people do care about.

sidenote: I think that for almost all packages there are people who care
about them. I’ve been blocked from working more than once because a
package I used got removed.

It’s just that people don’t talk about stuff that works, because they
are busy building on top of it. And if they have to look into each of
the thousand dependencies, they can’t build something with them.

⇒ context: https://www.draketo.de/software/volatile-infrastructure

Guix graph actually is a great tool to deal with that. Have a look at

    guix graph -t package -b cyclonedx-json mercurial

and

    guix graph -t bag-emerged -b cyclonedx-json mercurial

We actually have a list of implicit "I care about" for each package.
What we don’t have AFAIK is a list of reference manifests to use for
regression testing.

> And
> the result is often better, certainly in collaboration with a human.
> My first P.R.s were full of problems that would not exist if I had
> employed an LLM.

I went through the same. But there’s a difference: with you the
reviewers’ time is well spent: they help a new contributor grow.

With an LLM their time goes into the void. The LLM doesn’t learn to work
with this project. And at the next update it may have completely
different idiosyncrasies that reviewers would have to learn.

> I'm mostly involved in the Python ecosystem. We are almost at the
> point of having Python 3.12 merged (hahaha, I wish), while I would
> want to be working on Python 3.14 now.

I feel your pain. The volatility Python developed after version 3 is a
reason why I no longer use it for my hobby projects. And the packages I
still maintain are a drain on my time, because they break about once a
year without me doing any changes.

> Consider this scenario where two people collaborate, voluntarily. They
> use LLMs (and their own human intellect) to fix packages and review
> each other's code. Would that be okay with you?

If they use the LLMs for review, then not. Because then there would
still be code in Guix that wasn’t reviewed by a human.

If there was a comma missing before "and review" (⇒ humans review the
code) and they make sure that the code does not get harder for people
without LLMs to use (unlikely in the long run -- I’ve seen the effects
of local convenience multiple times now), then I wouldn’t mind.

> But right now I almost feel like people will be angry with me if I
> search an error message because I might glance at the LLM summary and
> learn the cause of the problem in an indecent manner that would cause
> my P.R. to be rejected.

I search for Stack Overflow threads to find a solution, and I don’t see
an LLM parsing of an error message much differently. As long as you
still analyze it afterwards.
And resist the urge (that LLMs foster) to just click "fix it".

> Or positively: we don't have that luxury yet. I do try to collaborate
> with upstream, and more often than not people are happily surprised
> that we package their software in Guix. It is paramount to keep
> enhancing that human connection.

That definitely.

> Hmm, maybe that is a good starting point for a hypothetical GCD.
> Because it does not apply to every project; I've 'written' code I did
> not understand, for one-off tasks where only the result matters. Guix
> is in a totally opposite corner, where the entire point (for me at
> least) is that humans can understand it.

I think that that applies to all code that is not guaranteed to get
thrown away after use. Because prototypes tend to stick around and
become the core of other software, and if no one understood that
prototype to begin with, that’s a trainwreck in the making.

Best wishes,
Arne
--
Being unpolitical means being political without noticing it.
https://www.draketo.de
