Re: Version 2 of GCD 008 available for review!

Denis 'GNUtoo' Carikli Thu, 25 Jun 2026 21:11:17 -0700

Hi,

On Sun, 21 Jun 2026 19:23:39 +0200
Ludovic Courtès wrote:


The structure of the GCD looks much better than last time. More
specifically, I really like the "many in our community believe [genAI]
has an impact that undermines the social foundations of free software
and Guix" and "This document proposes the adoption of a pledge to
safeguard our production from a legal standpoint as well as the social
fabric built over almost 15 years around the project." parts, as they
really show the link between the broader issues and Guix.

Otherwise, with only "X is immoral therefore we should do Y", you could
probably justify anything you want regardless of X and Y with the
proper propaganda, by shouting the loudest, etc, and that's what power
structures do all the time.

However, the exception for code under 15 lines, which is taken out of
context, is a complete no-go for me.

How is that possible when you participated in the "LLMs and
clarifications on < 15 lines and copyright" thread?

Did you forget because there are way too many things going on, the
discussions are heated, we needed a decision yesterday, and patches
generated by LLMs are still going into Guix right now?

In that case maybe we could make the GCD way smaller for now and just
put in place the smallest/easiest/most-minimal stop-gap (no code/data
generated by LLMs gets into Guix until we get more legal clarity from
lawyers, obviously without the exception for code under 15 lines).

And then, once this is done, we could try to converge to some
agreement on the rest of this GCD.

Conservancy has some guidelines for dealing with LLMs, and reusing
them somehow could make reaching consensus easier.

Reference: https://h.net/Articles/1078521/#Comments

We could also start tagging upstream source that has code generated by
LLMs somehow, and make it possible to build it without
(--without-llm-generated-content as a package transformation) to get
more feedback on the consequences of not allowing LLM generated
content in substitutes as well.

Note that in practice allowing LLM-generated code under 15 lines would
also require Guix to go against GNU, while LLMs are already
controversial and we need a stop-gap right now.

So I assume that having a crisis right now, in the middle of decisions
that are already controversial, and urgent, and in a period that is
close to the holidays for most people, or under heat waves (in my
case), is probably not the best idea.

Note that GNU Boot also depends on Guix here. So it would be really
messy.

I also have other comments below (rewrites of paragraphs, better
rationale, etc), but they might only be relevant later on if you want to
send a v3 or address the broader context once a minimalist stop-gap
has passed.

----------------------------------------------------------------------------

The indentation of this paragraph looks strange in the .md file:
> However impressive the results may look, many in our community
> believe genAI has an impact that undermines the social
> foundations of free software and Guix:

Since things can evolve, I would stress that this describes the current
situation (I'm thinking about LLMs not made by big companies here, so
the pace of evolution and the direction it takes will probably not be
the same, but I would rather err on the side of caution, especially
given how many resources we need to modify a GCD). So "currently"
could be added:

> However impressive the results may look, many in our community
> believe genAI has an impact that currently undermines the social
> foundations of free software and Guix:

For instance, if I understood right, in the 60s there were protests
against computers because of the power imbalance they created: they
cost a lot, and their owners could run databases on them (nowadays
CRMs and similar technologies are frightening, for instance).

This created a huge power imbalance at the time, and in some cases it
still does (surveillance capitalism), but then computers were also used
historically to fight oppression, like with Operation Vula, or by
freeing activists from activities like stuffing envelopes just to
communicate.

About:
>   - GenAI launders the reciprocity baked into copyleft licenses such
>     as the GNU General Public License (GPL), effectively violating it.
>     A real-world example of copyleft-laundering is [the `chardet`
>     LLM-assisted “rewrite” for the stated purpose of relicensing
>     from LGPL to MIT/Expat in March
>     2026](https://tuananh.net/2026/03/05/relicensing-with-ai-assisted-rewrite/)
>     ([covered by LWN](https://lwn.net/Articles/1061534/)) or the
>     [EmDash WordPress reimplentation in TypeScript “under the more
>     permissive MIT
>     license”](https://blog.cloudflare.com/emdash-wordpress/).

Here I don't agree, as this has to be settled in court to really know
whether it is the case, and you make the point later on with Chardet
again.

So this could be rewritten as "GenAI tries to launder [...]
emdash-wordpress/), and at the time of writing this isn't settled by
courts yet." but I think it's not good enough because there is also a
power imbalance against copyleft here.

And I think it is very important to note as well, as this affects
our strategies a lot.

Things can also evolve: in the past we had patent trolls, copyfraud,
copyright trolls, etc. And not everything is well-known (did you
know that OpenMoko was threatened by a company over mp3 patents, for
instance?).

And if I understood well, now more than ever, free software is a
threat to some big corporations that are also involved in LLMs, and as
I understand it this played a part in Google trying to shut down
F-Droid, and in many other attempts like the age verification laws that
were tailored specifically for companies making nonfree software and/or
the surveillance capitalism business.

References:
https://f-droid.org/en/2026/02/24/open-letter-opposing-developer-verification.html
https://agelesslinux.org/lobbyists.html

> We believe it [stifles individual
> autonomy](https://ali-alkhatib.com/blog/defining-ai) at a fundamental
> level—replacing one’s ability to build up knowledge with a false
> sense of quick achievement, building up [cognitive
> debt](https://simonwillison.net/2026/Feb/15/cognitive-debt/)—while also
> [weakening communities and destroying labor
> power](https://tante.cc/2026/04/21/ai-as-a-fascist-artifact/).

I think the bigger picture also needs to be taken into account here. Two
things come to mind: (1) LLMs are a tool, and (2) LLMs are at the
moment inscrutable.

(1) The consequence of (1) is that the tool embeds knowledge, and so
    if the tool is nonfree it practically deprives humans that use the
    tool of the knowledge that is embedded in the tool.

(2) If we contrast (2) with programming languages or other free
    software automation tools, the vast majority of programming
    languages were made to be understood by humans, and some were made
    specifically to be understood by non-programmers (like FLOW-MATIC
    from Grace Hopper) which then influenced other languages like
    COBOL or Python ('or' instead of '||' is an example of
    that). People can also understand how graphical automation tools
    work if they're free software, documented, etc.

Both (1) and (2) combined make free software tools acceptable because
they can empower people, and they also give people freedom to modify
them, etc, and in the case of LLMs these freedoms are currently denied
to individuals, whereas individuals can still manage to get these
freedoms even with very big software like Linux, LibreOffice, Firefox,
etc, even with low end computers (it is possible to compile Linux on
very low end computers; Firefox and LibreOffice are more challenging
but it's probably possible).

As for 'https://tante.cc/2026/04/21/ai-as-a-fascist-artifact/', the very
way LLMs work combined with their use is at the very least
discriminatory: they can only work by having bias, and the less bias
they have, the less well they work. Though they can probably be used to
detect bias.

About:

> The huge ecological footprint of genAI is well documented, [...]

Personally I don't think it is. What is well documented is probably
a known lower bound of that footprint. The footprint is probably
bigger, and the details are also lacking.

I've been trying to understand the footprint of the training of models
that are shipped in Firefox (which have a very low runtime footprint),
but I've not been able to find the information. I only looked for a
day or two, probably around 6 months or a year ago.

All the information I found was on experimental models made by Mozilla
that were not the models that really shipped in Firefox and even that
was problematic (I don't recall exactly why though, but it was clearly
out of reach for an individual on a computer like a ThinkPad X200).

About:

> At the time of writing, only proposed interpretations of copyright
> law exist:

This looks good, but then we have a very broad statement:

> that depending on the level of human intervention, genAI output
> could be considered not copyrightable or at best “uncertain” in the
> [European
> Union](https://www.europarl.europa.eu/RegData/etudes/STUD/2025/774095/IUST_STU(2025)774095_EN.pdf)

This might also depend on the details, and I don't see how it shouldn't,
as LLMs can also print code that already exists somewhere else and
"depending on the level of human intervention" doesn't seem to capture
that.

But then states have an interest in using LLMs in wars or for
repression, so even if "it shouldn't", the future isn't set in any
direction (states can also make exceptions for themselves, do illegal
things without bothering about the law, and in practice states give
very limited protection to citizens against various parts of the state
agreeing to do things against those citizens).

And the fact that this is legally uncharted territory is also
problematic per se: free software adjusted to many laws and also
adjusted many laws to it, or had strategies to work around laws in
other cases (patents, DMCA) as best as it could.

And it is all tied together: the very fact that Guix is a distribution
in the same way Debian or Trisquel are has different legal implications
than pip has.

Assuming that we do have legal clarity at some point, we would still
need the laws and free software to be adjusted to fit our practices,
to have lawyers from several trusted organizations boil it down to
things we can understand and that are safe for most cases, etc.

So having a broad statement that implicitly applies to all LLM-generated
outputs like "could be considered not copyrightable" looks strange to
me.

So I'd rather insist on the fact that this is uncharted and maybe give
the same evidence without implying anything. For instance:

> At the time of writing, legally, this is uncharted territory as only
> proposed interpretations of copyright law exist, but that in itself
> is not enough. As an example the [European
> Union](https://www.europarl.europa.eu/RegData/etudes/STUD/2025/774095/IUST_STU(2025)774095_EN.pdf)
> doesn't have any idea if specific cases could be considered
> non-copyrightable, or "unknown". Once we have more legal clarity in
> the various jurisdictions around the world, we would also need to
> see how to or not to adapt laws and free software to each other, to
> have lawyers boil down the important actionable information for us,
> to understand how that works in practice, etc, like we previously
> had to do along the way with various laws before.

As for:
> This legal uncertainty is one reason for projects [such as
> Gnulib](https://lists.gnu.org/archive/html/bug-gnulib/2026-02/msg00064.html)
> to prohibit the inclusion of [“legally
> significant”](https://www.gnu.org/prep/maintain/html_node/Legally-Significant.html)
> portions of code (more than 10 lines).

This is misleading, and we shouldn't continue misleading people on
that. Binutils made the same mistake but they are now aware of it, so I
assume that this will be fixed if it's not already.

I really like that part, because in addition to showing the link between
the GCD and what Guix wants (safeguard its social fabric and its
legality), it also shows exactly what Guix wants (social fabric +
legal).

> This document proposes the adoption of a pledge to safeguard our
> production from a legal standpoint as well as the social fabric built
> over almost 15 years around the project.

> Instead, this proposal aims at setting a standard for what we do
> collectively within the project.

Conservancy published guidelines on LLMs, and they might be (re)used
in some form by many projects, so we could try to somehow align with
them and/or point to them. This could also save you a lot of work.

The benefit is that the more aligned we are, the easier it would be
for everybody (contributors, reviewers, etc). This way it would look
more like business as usual, with this set of projects having more or
less this set of rules, and this other group of projects having these
other rules, etc, with occasional mixes and matches (like you can use
the Linux DCO in a GNU project/package I think).

> Questioning the reasons that make genAI feel necessary for people
> using Guix, and finding ways to fill the gap.

This is too broad. Guix even has an AI team. Guix also has scientific
software that runs on supercomputers, etc.

If llama-cpp is legally okay and can be used with free
software on lower end computers, for instance with small models that
can easily be trained, I don't see why not to package it.

And having llama-cpp for instance could be used precisely to prove
that LLMs are currently not worth it, or some other properties about
LLMs. Though in that case llama-cpp would be worth it for that proof.

However for software like vim that openly allows vibecoded
contributions, I think it's another story as it puts redistributors of
its source code and binaries at risk, which then makes the
redistributors and users share a common interest in the legal
situation being clarified in a way that allows these contributions to
stay.

So I would try to insist on the software being redistributed by Guix
and/or the gaps left there by not packaging vibecoded software (though
I'm unsure that we'd have the majority for that, we might also need more
research to understand the consequences of that).

Here's an example:

> Ensuring that the binary and source substitutes as well as Guix
> source code are produced without [genAI] and if possible
> collaborating to find ways to fill the gap left by not packaging
> software or data produced by LLMs because they didn't match Guix
> requirements (free software, not taking too much space or time to
> build, etc).

This way it could allow extremely small LLMs if the training
requirements are not bigger than for compiling Firefox, LibreOffice,
etc. It would continue to allow llama-cpp if it's made by humans
(assuming one can train a tiny model and run it), etc.

And the result is clear: no software being packaged with vibecoded
code in it, and collaboration (for instance packaging vim-classic,
finding tips on the mailing list to not use this or that LLM).

It is also narrow enough to focus on Guix and its packages to avoid
broader societal issues that Guix is probably not well equipped to
solve.

> Strengthening support for craftspeople the project interacts
> with—translators, artists, developers, and so on.

How would that translate concretely? Is that wishful thinking, or are
there ways to show that this can have concrete results?

> Contributing to the public debate on these matters and creating ties
> with like-minded organizations and grassroots movements.

I think we badly need to do that. As a distribution, we could also
contribute to making sure upstream projects don't mess up too much in
the exact same way we do when upstream ships nonfree software.

For instance we could collaborate with rsync to have rsync label for
us what files we should remove (like tests) in a #~(begin ...) snippet
in (source [...]), making it possible to safely redistribute software
like rsync.

Right now there are no bad consequences for an upstream of using
vibe-coding, whereas down the road, if the legal situation turns bad in
practice, everybody could have to pay the cost of these decisions.

Many upstreams are probably aware of the risks as well, so this could
also give them a safer way to deal with the consequences (like removing
what we labeled as nonfree) if things turn bad, and forking (without
the nonfree bits) is always possible if the upstream project's legal
troubles prevent it from continuing to operate.

And if at some point things become okay for some reason (there is
more than just the legal side here, as the community is important as
well), the #~(begin ...) snippet could be removed.

The other way around (not caring about packages with code generated by
LLMs) carries the risk of making the problem too big to solve if we wait
too long, which would both put us in the camp of legalizing that and,
if it's not legalized, leave us to deal with the consequences, which
also include heated debates I guess, which could split the community.

> 1. The project (defined as maintainers, team members, and anyone with
>    write access to a Guix repository, including Weblate, or to Guix
>    resources such as the build farm) **will not use nor encourage use
>    of genAI** to author code or packages, to interact with other
>    participants (e.g., to explain code changes or to review code), to
>    produce artwork, translations, or any other artifact.

This is not enough. We should actively help people to not use LLMs for
that, according to our resources of course, like we already do for
nonfree software. This would also be consistent with Conservancy's
guidelines.


> 2. The project will keep working to **provide people of all levels
>        of experience with the resources to use Guix and to
>        contribute to Guix** without feeling the need to resort to
>        genAI:

I would add a bullet point about adding strong arguments against the
practices being forbidden. At the time of writing, this is also an
essential point I think.

This is also why I am trying to convey why writing this GCD well is
extremely important (and that's an understatement), because down the
road we'll have to convince people to do "the right thing" from Guix's
perspective.

And this would strengthen the cohesion of Guix contributors and users,
inspire other distributions to do the same, etc.

In contrast, a badly written justification would increase confrontation
in every possible way (in discussions about this GCD, in relationships
with upstream, with online newspapers, etc) because people would just
take sides and ignore the other side's concerns.

Also, not a lot of distributions are in Guix's position here and can
really limit the damage of code generated by LLMs (many distribute
nonfree software and don't have mechanisms for dealing with vibe-coding
in upstream source, or barely have enough resources to remove
nonfree software, etc).

Down the road, many years later maybe everybody will hate LLMs, or
maybe not, but right now we badly need good rationales.

Not doing that would be inconsistent with Conservancy's suggestions, or
put too much burden on the individual maintainers, which at the end
doesn't scale (and so it's unrealistic in practice to have each
maintainer try to convince contributors in their own ways without a
very good reference).

> 2. **Contribution acceptance.**  Contributions produced in whole or in
>    part by genAI MAY be accepted provided the changes are not
>    [“legally
>    significant”](https://www.gnu.org/prep/maintain/html_node/Legally-Significant.html),
>    to ensure the contributor has a valid copyright claim on the code.
>    As a rule of thumb, this includes code less than 15-line-long, or
>    package definitions that are evidently not creative, similar to
>    those that `guix import` and similar tools might produce.
>    GenAI-produced contributions that do not meet this criterion will
>    be rejected.

This "legally significant" has been misunderstood by GNU Binutils as
well, but they now know that, so I guess that it will be fixed if it
hasn't already.

Reference: the "LLMs and clarifications on < 15 lines and copyright"
thread in gnu-prog-discuss.

> 4. **Exploratory analysis.** Contributors are free to use genAI as
>    part of their exploratory process as long their final contribution
>    respects the above rules.  For instance, use of genAI to identify
>    the cause of a bug or the reason for a package build failure is
>    permitted.

It would be a good thing to ask about that on gnu-prog-discuss just to
make sure it's not something that has been overlooked, and given the
huge amount of mess that LLMs create, it's a non-zero probability.

Once that is cleared up, I think it makes sense, but I would stress that
nothing in the final code should be generated by the LLM to make
things clear.

There might also be good practices to avoid duplicates here (I vaguely
recall mentions of a process to do that in Linux on lwn.net but I don't
recall the details).

> What would be costly to revert is the *lack* of any form of regulation
> on genAI use in Guix.

Another thing that would be extremely costly is a policy like this one
that opens the door to LLM contributions (like the 15-line rule that is
taken out of context), which effectively makes reverting that part too
costly. Plus, that has the potential to increase the confrontation with
GNU, so that could be costly too.

Denis.
