Hi,

On Sun, 21 Jun 2026 19:23:39 +0200 Ludovic Courtès <[email protected]> wrote:
The structure of the GCD looks much better than last time. More specifically, I really like "many in our community believe [genAI] has an impact that undermines the social foundations of free software and Guix" and "This document proposes the adoption of a pledge to safeguard our production from a legal standpoint as well as the social fabric built over almost 15 years around the project.", as this really shows the link between the broader issues and Guix. Otherwise, with only "X is immoral, therefore we should do Y", you could probably justify anything you want regardless of X and Y with the proper propaganda, by shouting the loudest, etc., and that's what power structures do all the time.

However, the exception about 15 lines, which is taken out of context, is a complete no-go for me. How is that possible when you participated in the "LLMs and clarifications on < 15 lines and copyright" thread? Did you forget because there are way too many things going on, the discussions are heated, we needed a decision yesterday, and patches generated by LLMs are still going into Guix right now?

In that case maybe we could make the GCD way smaller for now and just put in the smallest/easiest/most minimal stopgap (no code/data generated by LLMs gets into Guix until we get more legal clarity from lawyers, obviously without the exception for code under 15 lines). And then, once this is done, we could try to converge on some agreement on the rest of this GCD.

Conservancy has some guidelines for dealing with LLMs, and reusing them somehow could make reaching consensus easier. Reference: https://h.net/Articles/1078521/#Comments

We could also start tagging upstream source that has code generated by LLMs somehow, and make it possible to build it without (--without-llm-generated-content as a package transformation) to get more feedback on the consequences of not allowing LLM-generated content in substitutes as well.

Note that in practice, allowing code generated by LLMs < 15 lines would also require Guix to go against GNU, while LLMs are already controversial and we need a stopgap right now. So I assume that having a crisis right now, in the middle of decisions that are already controversial and urgent, and in a period that is close to the holidays for most people, or under heat waves (in my case), is probably not the best idea. Note that GNU Boot also depends on Guix here. So it would be really messy.

I also have other comments below (rewrites of paragraphs, better rationale, etc.), but they might only be relevant later on if you want to send a v3 or address the broader context once a minimalist stopgap has passed.

----------------------------------------------------------------------------

The indentation of this paragraph looks strange in the .md file:

> However impressive the results may look, many in our community
> believe genAI has an impact that undermines the social
> foundations of free software and Guix:

Since things can evolve, I would try to insist that this describes the current situation (I'm thinking about LLMs not made by big companies here, so the pace of evolution and the direction it takes will probably not be the same, but I would rather err on the side of caution, especially given how many resources we need to modify a GCD). So "currently" could be added:

> However impressive the results may look, many in our community
> believe genAI has an impact that currently undermines the social
> foundations of free software and Guix:

For instance, if I understood right, in the 60s there were protests against computers because of the power imbalance they created: they cost a lot, and their owners could run a database on them (nowadays CRMs and similar technologies are frightening, for instance).
This created a huge power imbalance at the time, and in some cases it still does (surveillance capitalism), but then computers were also used historically to fight oppression, like with Operation Vula, or by freeing activists from activities like stuffing envelopes just to communicate.

About:

> - GenAI launders the reciprocity baked into copyleft licenses such
> as the GNU General Public License (GPL), effectively violating it. A
> real-world example of copyleft-laundering is [the `chardet`
> LLM-assisted “rewrite” for the stated purpose of relicensing
> from LGPL to MIT/Expat in March
> 2026](https://tuananh.net/2026/03/05/relicensing-with-ai-assisted-rewrite/)
> ([covered by LWN](https://lwn.net/Articles/1061534/)) or the
> [EmDash WordPress reimplentation in TypeScript “under the more
> permissive MIT
> license”](https://blog.cloudflare.com/emdash-wordpress/).

Here I don't agree, as this has to be settled in court to really know whether it is the case, and you make the point later on with chardet again. So this could be rewritten as "GenAI tries to launder [...] emdash-wordpress/), and at the time of writing this isn't settled by courts yet." but I think it's not good enough because there is also a power imbalance against copyleft here. And I think it is very important to note as well, as this does affect our strategies a lot.

Things can also evolve: in the past we had patent trolls, copyfraud, copyright trolls, etc. And not everything is well known (did you know that OpenMoko was threatened by a company over MP3 patents, for instance?). And if I understood well, now more than ever, free software is a threat to some big corporations that are also involved in LLMs, and as I understand it this played a part in Google trying to shut down F-Droid, and in many other attempts like the age verification laws that were tailored specifically for companies making nonfree software and/or in the surveillance capitalism business.

References:
https://f-droid.org/en/2026/02/24/open-letter-opposing-developer-verification.html
https://agelesslinux.org/lobbyists.html

> We believe it [stifles individual
> autonomy](https://ali-alkhatib.com/blog/defining-ai) at a fundamental
> level—replacing one’s ability to build up knowledge with a false
> sense of quick achievement, building up [cognitive
> debt](https://simonwillison.net/2026/Feb/15/cognitive-debt/)—while also
> [weakening communities and destroying labor
> power](https://tante.cc/2026/04/21/ai-as-a-fascist-artifact/).

I think the bigger picture also needs to be taken into account here. Two things come to mind: (1) LLMs are a tool, and (2) LLMs are at the moment inscrutable.

(1) The consequence of (1) is that the tool embeds knowledge, so if the tool is nonfree it practically deprives the humans who use it of the knowledge that is embedded in it.

(2) If we contrast (2) with programming languages or other free software automation tools, the vast majority of programming languages were made to be understood by humans, and some were made specifically to be understood by non-programmers (like FLOW-MATIC from Grace Hopper), which then influenced other languages like COBOL or Python ('or' instead of '||' is an example of that). People can also understand how graphical automation tools work if they're free software, documented, etc.

Both (1) and (2) combined make free software tools acceptable because they can empower people, and they also give people the freedom to modify them, etc. In the case of LLMs these freedoms are currently denied to individuals, whereas individuals can still manage to get these freedoms even with very big software like Linux, LibreOffice, Firefox, etc., even with low-end computers (it is possible to compile Linux on very low-end computers; Firefox and LibreOffice are more challenging but it's probably possible).

As for 'https://tante.cc/2026/04/21/ai-as-a-fascist-artifact/', the very way LLMs work, combined with their use, is at the very least discriminatory: they can only work by having bias, and the less bias they have the less well they work. Though they can probably be used to detect bias.

About:

> The huge ecological footprint of genAI is well documented, [...]

Personally I don't think it is. What is well documented is probably a known lower bound of that footprint. The footprint is probably bigger, and the details are also lacking.

I've been trying to understand the footprint of the training of the models that are shipped in Firefox (which have a very low runtime footprint), but I've not been able to find the information. I only looked for a day or two, probably around 6 months or a year ago. All the information I found was on experimental models made by Mozilla that were not the models that really shipped in Firefox, and even that was problematic (I don't recall exactly why, but it was clearly out of reach for an individual on a computer like a ThinkPad X200).

About:

> At the time of writing, only proposed interpretations of copyright
> law exist:

This looks good, but then we have a very broad statement:

> that depending on the level of human intervention, genAI output
> could be considered not copyrightable or at best “uncertain” in the
> [European
> Union](https://www.europarl.europa.eu/RegData/etudes/STUD/2025/774095/IUST_STU(2025)774095_EN.pdf)

This might also depend on the details, and I don't see how it shouldn't, as LLMs can also print code that already exists somewhere else, and "depending on the level of human intervention" doesn't seem to capture that.
But then states have an interest in using LLMs in wars or for repression, so even if "it shouldn't", the future isn't set in any direction (states can also make exceptions for themselves, do illegal things without bothering about the law, and in practice states give citizens very limited protection against various parts of the state agreeing to do things against their citizens).

And the fact that this is legally uncharted territory is also problematic per se: free software adjusted to many laws and also adjusted many laws to it, or had strategies to work around laws in other cases (patents, DMCA) as best as it could. And all is tied together: the very fact that Guix is a distribution in the same way Debian or Trisquel is has different legal implications than pip has.

Assuming that we do have legal clarity at some point, we would still need the laws and free software to be adjusted to fit our practices, to have lawyers from several trusted organizations boil it down to things we can understand and that are safe for most cases, etc.

So having a broad statement that implicitly applies to all LLM-generated output, like "could be considered not copyrightable", looks strange to me. I'd rather insist on the fact that this is uncharted and maybe give the same evidence without implying anything. For instance:

> At the time of writing, legally, this is uncharted territory as only
> proposed interpretations of copyright law exist, but that in itself
> is not enough. As an example the [European
> Union](https://www.europarl.europa.eu/RegData/etudes/STUD/2025/774095/IUST_STU(2025)774095_EN.pdf)
> doesn't have any idea if specific cases could be considered
> non-copyrightable, or "unknown".
> Once we have more legal clarity in
> the various jurisdictions around the world, we would also need to
> see how to or not to adapt laws and free software to each other, to
> have lawyers boil down the important actionable information for us,
> to understand how that works in practice, etc., like we previously
> had to do along the way with various laws before.

As for:

> This legal uncertainty is one reason for projects [such as
> Gnulib](https://lists.gnu.org/archive/html/bug-gnulib/2026-02/msg00064.html)
> to prohibit the inclusion of [“legally
> significant”](https://www.gnu.org/prep/maintain/html_node/Legally-Significant.html)
> portions of code (more than 10 lines).

So it is misleading and we shouldn't continue misleading people on that. Binutils made the same mistake but they are now aware of it, so I assume that this will be fixed if it's not already.

I really like that part, because in addition to showing the link between the GCD and what Guix wants (safeguard its social fabric and its legality), it also shows exactly what Guix wants (social fabric + legal).

> This document proposes the adoption of a pledge to safeguard our
> production from a legal standpoint as well as the social fabric built
> over almost 15 years around the project.

> Instead, this proposal aims at setting a standard for what we do
> collectively within the project.

Conservancy published guidelines on LLMs, and they might be (re)used in some form by many projects, so we could try to somehow align with them and/or point to them. This could also save you a lot of work. The benefit is that the more we stick together, the easier it would be for everybody (contributors, reviewers, etc.): it would look more like the usual situation, with one set of projects having more or less one set of rules, another group of projects having other rules, etc., with occasional mixes and matches (like you can use the Linux DCO in a GNU project/package, I think).

> Questioning the reasons that make genAI feel necessary for people
> using Guix, and finding ways to fill the gap.

This is too broad. Guix even has an AI team. Guix also has scientific software that runs on supercomputers, etc. If llama-cpp is legally okay and it can be used with free software on lower-end computers, for instance with small models that can easily be trained, I don't see why not to package it. And llama-cpp could be used precisely to prove that LLMs are currently not worth it, or to show some other properties of LLMs. Though llama-cpp would be worth that proof in that case.

However, for software like vim that openly allows vibecoded contributions, I think it's another story, as it puts redistributors of its source code and binaries at risk, which then makes the redistributors and users share a common interest in the legal situation being clarified in a way that allows these contributions to stay.

So I would try to insist on the software being redistributed by Guix and/or the gaps left there by not packaging vibecoded software (though I'm unsure that we'd have a majority for that; we might also need more research to understand the consequences). Here's an example:

> Ensuring that the binary and source substitutes as well as Guix
> source code are produced without [genAI] and if possible
> collaborating to find ways to fill the gap left by not packaging
> software or data produced by LLMs because they didn't match Guix
> requirements (free software, not taking too much space or time to
> build, etc.).

This way it could allow extremely small LLMs if the training requirements are not bigger than those for compiling Firefox, LibreOffice, etc. It would continue to allow llama-cpp if it's made by humans (assuming one can train a tiny model and run it), etc.
And the result is clear: no software packaged with vibecoded code in it, and collaboration (for instance packaging vim-classic, or finding tips on the mailing list to not use this or that LLM). It is also narrow enough to focus on Guix and its packages and to avoid broader societal issues that Guix is probably not well equipped to solve.

> Strengthening support for craftspeople the project interacts
> with—translators, artists, developers, and so on.

How would that translate concretely? Is that wishful thinking, or are there ways to show that this can have concrete results?

> Contributing to the public debate on these matters and creating ties
> with like-minded organizations and grassroots movements.

I think we badly need to do that. As a distribution, we could also contribute to making sure upstream projects don't mess up too much, in the exact same way we do when upstream ships nonfree software. For instance we could collaborate with rsync to have rsync label for us which files we should remove (like tests) in ~#begin in (source [...]), making it possible to safely redistribute software like rsync.

Right now there are no bad consequences for an upstream using vibe-coding, whereas down the road, if the legal situation turns bad in practice, everybody could have to pay the cost of these decisions. Many upstreams are probably aware of the risks as well, so this could also give them a safer way to deal with the consequences (like removing what we labeled as nonfree) if things turn bad, and forking (without the nonfree bits) is always possible if the upstream project's legal troubles prevent it from continuing to operate. And if at some point things become okay for some reason (there is more than just the legal side here, as the community is important as well), the ~#begin could be removed.

The other way around (not caring about packages with code generated by LLMs) carries the risk of making the problem too big to solve if we wait too long, which would both put us in the camp of legalizing that and, if it's not legalized, leave us to deal with the consequences, which also include heated debates, I guess, which could split the community.

> 1. The project (defined as maintainers, team members, and anyone with
> write access to a Guix repository, including Weblate, or to Guix
> resources such as the build farm) **will not use nor encourage use
> of genAI** to author code or packages, to interact with other
> participants (e.g., to explain code changes or to review code), to
> produce artwork, translations, or any other artifact.

This is not enough. We should actively help people to not use LLMs for that, according to our resources of course, like we already do for nonfree software. This would also be consistent with Conservancy's guidelines.

> 2. The project will keep working to **provide people of all levels
> of experience with the resources to use Guix and to
> contribute to Guix** without feeling the need to resort to
> genAI:

I would add a bullet point about adding strong arguments against the practices being forbidden. At the time of writing, this is also an essential point, I think. This is also why I am trying to convey why writing this GCD well is extremely important (and that's an understatement): down the road we'll have to convince people to do "the right thing" from Guix's perspective. And this would strengthen the cohesion of Guix contributors and users, inspire other distributions to do the same, etc. In contrast, a badly written justification would increase confrontation in every possible way (in discussions about this GCD, in relationships with upstream, with online newspapers, etc.) because people would just take sides and ignore the other side's concerns.
Also, not a lot of distributions are in Guix's position here and can really limit the damage of code generated by LLMs (many distribute nonfree software and don't have mechanisms for dealing with vibe-coding in upstream source, or barely have enough resources to remove nonfree software, etc.). Down the road, many years later, maybe everybody will hate LLMs, or maybe not, but right now we badly need good rationales. Not doing that would be inconsistent with Conservancy's suggestions, or put too much burden on the individual maintainers, which in the end doesn't scale (so it's unrealistic in practice to have each maintainer try to convince contributors in their own way without a very good reference).

> 2. **Contribution acceptance.** Contributions produced in whole or in
> part by genAI MAY be accepted provided the changes are not
> [“legally
> significant”](https://www.gnu.org/prep/maintain/html_node/Legally-Significant.html),
> to ensure the contributor has a valid copyright claim on the code.
> As a rule of thumb, this includes code less than 15-line-long, or
> package definitions that are evidently not creative, similar to
> those that `guix import` and similar tools might produce.
> GenAI-produced contributions that do not meet this criterion will
> be rejected.

This "legally significant" has also been misunderstood by GNU Binutils, but they now know that, so I guess that it will be fixed if it hasn't been already. Reference: the "LLMs and clarifications on < 15 lines and copyright" thread in gnu-prog-discuss.

> 4. **Exploratory analysis.** Contributors are free to use genAI as
> part of their exploratory process as long their final contribution
> respects the above rules. For instance, use of genAI to identify
> the cause of a bug or the reason for a package build failure is
> permitted.

It would be a good thing to ask about that on gnu-prog-discuss just to make sure it's not something that has been overlooked; given the huge amount of mess that LLMs create, the probability is non-zero. Once that is cleared up, I think it makes sense, but I would stress that nothing in the final code should be generated by the LLM, to make things clear. There might also be good practices to avoid duplicates here (I vaguely recall mentions of a process to do that in Linux on lwn.net, but I don't recall the details).

> What would be costly to revert is the *lack* of any form of regulation
> on genAI use in Guix.

Another thing that would be extremely costly is a policy that opens the door to LLM contributions (like the rule on 15 lines that is taken out of context), as this one does, which effectively makes reverting that part too costly. Plus, that has the potential to increase the confrontation with GNU, so that could be costly too.

Denis.
