(I wrote this markdown style, and I'm too lazy to convert it to text)

# The infamous LLM discussion
So, I'm starting this discussion publicly because a heated discussion started privately, and this is no private topic. The discussion started because of the DFSG team's new NEW queue website, which has been developed (to an extent I don't personally know) with the assistance of an agentic coding tool. I'd like to summarize where we all collectively are, where Debian currently is, and the different pros/cons/arguments I have read and heard over the past two years. This obviously won't be exhaustive; it's a starting point.

This is not an opinionated post. I am in an uncomfortable cognitive dissonance on the matter, so it's rather a snapshot of my brain on the topic. To be frank, I personally don't know where I stand. I think I'm neither for nor against AI-generated code, but I am aware that, currently, it's not possible to give a simple and trivial ruling. If some specific questions worth an answer are asked, I'll reply, but otherwise I fully intend not to post again after this mail. The topic and its ethical/sociological/technological ramifications are exhausting, and I'd rather spend my time doing fun stuff.

I might at some point in this text (I'm writing it linearly, so I don't know how long it will take to write or what the end will look like) offer an idea of a policy on the matter. But don't expect me to say whether it's a good idea or not.

I do ask everybody, DDs, DMs, DCs, bystanders, to refrain from flaming. I know this wish has little chance of success, but at least I will have tried.

## Kind of an intro

TL;DR: AI exists and is used everywhere already, and now it hits the project, some are for, some are against; you can go directly to "The brainstorm of pros and…"

### AI

AI stands for Artificial Intelligence, which means pretty much everything and nothing. A properly trained Bayesian filter is AI, your 0ad virtual opponent is AI, a fine-tuned chess algorithm is AI, and an LLM is AI. For most of us, AI means something that mimics intelligence without being intelligent itself. But what is "intelligence"? Well, a nice definition I read in a dictionary is **the ability to know, learn, understand and adapt easily**. It's vague, but from there one can expand and explain that intelligence can be "gathering and interconnecting facts efficiently", "the ability to deduce from partial information", etc. We all feel that we understand what intelligence is, and that it should only be applicable to humans and animals, but the truth is, if intelligence matches the definition I wrote above, then "artificial intelligence" fits the term.

This is the first source of friction. We all have our own view on what AI is, and in a room of 100 people, we could potentially get 100 different definitions that clash on some aspects. One thing on which we might all agree is that AI didn't start with the release of GPT (the model) and ChatGPT (the tool), and won't stop there. Another thing we will all have to accept, whether we like it or not, is that AI won't disappear from our lives.

### Where the world is right now

Here, I'd like to emphasise that this is my view of the current situation. I'm neither an economist nor an expert, I have no shares in any company, I'm not shorting nVidia, and to be fair, on these aspects, I've chosen my path, which is taking things as they come and trying to sort the garbage from the good stuff.

The IT world has, every year since the beginning of the internet era, had a hype train around some technology or other.
We all remember when "cloud" was the buzzword, or when "cryptocurrency" or "NFT" became the next one. Some trends died out, some are still around. AI is the latest one, and it seems to be of the same order of magnitude as the cloud or the internet have been, maybe bigger. The main reason, and it's essential to acknowledge it, is that it made some activities far easier and less tedious and, strictly speaking, allowed many humans to focus on the things they prefer to do. It also allows people with a lot of creativity in some field, but without the expertise, to start trying to achieve actual things there (coding, video, music, etc). I've spent the past year seeing posts on LinkedIn and Twitter from people with no development skills who are happy to try either learning with an AI as a teacher, or vibe-coding SaaS apps. Whether we like the idea of newbies being able to cargo-cult apps or not, we can't deny that this created huge leverage for productivity.

As usual in these kinds of situations, some companies are trying to get money out of the hype, and for this, the CEOs and their friends oversell. This is currently where we are. Be it Jensen Huang, who really needs to sell more GPUs, Sam Altman, who is probably seeing the winds changing (hello Microsoft taking a step back[4]), Oracle, which will probably die if OpenAI falls, or Anthropic's CEO, who has repeatedly predicted over the past year that AI would handle all coding within the next 12 months[1][2][3] (in his defense, he's not the only one), everybody goes with their take. It makes them visible (hey, they need to sell), and it's also the American-style "fake it until you make it". Let's be frank, this is at least reckless, and probably dishonest. Speaking for myself, it raises concerns, but it also disgusts me. It makes the market volatile and unreadable, it destabilizes big chunks of the economy, it wrecks plenty of markets (hello RAM shortages, hello production shifting between the mass market and the AI-dedicated market, hello floating point precision reduction on the latest architectures, …).

And, even though the claims about water consumption are debatable (depending on how the datacenters are architected), there is no doubt that in some countries (eg the USA) it puts a strong strain on water supplies, not to mention the water needed to manufacture the chips. Furthermore, it creates a heavy drain on energy consumption. In countries with clean energy, the main bad effect is more stress on the grid, but in countries running on oil, natural gas or coal, this is potentially disastrous.

All in all, the picture is as usual with technology leaps: there are great outcomes and good opportunities, but also strong drawbacks. This makes the topic as much a political topic as all the previous big changes the world has faced over the last two centuries (the industrial era, tractors for agriculture, the Internet, etc). To those who oppose LLMs or coding agents on ecological grounds, I'd point out that Debian and many FOSS projects rely on the Internet being the way it is, which had and still has a very strong ecological impact, one they seem able to live with.
Going from this global picture, let's try to envision the current situation for Debian (this probably applies to FOSS in general).

[1] https://www.entrepreneur.com/business-news/anthropic-ceo-predicts-ai-will-take-over-coding-in-12-months/488533
[2] https://www.darioamodei.com/essay/the-adolescence-of-technology
[3] https://www.businessinsider.com/google-deepmind-anthropic-ceos-ai-junior-roles-hiring-davos-2026-1
[4] https://www.windowscentral.com/artificial-intelligence/microsoft-confirms-plan-to-ditch-openai-as-the-chatgpt-firm-continues-to-beg-big-tech-for-cash

### AI in Debian (/FOSS)

Let's not lie to ourselves. In the past two years, we have seen changes. Some people started discussions about AI, the discussions were not simple, and we saw that, as usual with such strong changes, reaching consensus is either impossible or at least not really easy. In parallel, some software we provide has probably seen changes written directly by a coding AI, and a lot of mails have been written or reviewed (or a bit of both) by an LLM.

In the areas of the project I'm involved in, we have had multiple DD applicants who sent LLM-generated content for their AM step. This usually had negative consequences on their application, but maybe some applicants were savvy enough to alter the text enough for it not to be visible. The main concern I have in this specific case is that they don't really learn and might resort to an LLM every time they have a question. There, the productivity for the project becomes catastrophic, because they will use far more resources than they would if they actually tried to learn and remember. This could be extrapolated to any other field. While AI tends to make people more productive, it seems to work only to the extent that those using it actually learn something.

In FOSS in general, we have seen enough cases (eg [5][6]) to know that we have probably already let AI-written code be committed, or bug reports be submitted without any real reading by an author who simply copy/pasted the output of an AI agent.

That being said, on a more personal note, I always write my mails myself, and I tend to go with the flow of my mind. When writing in English (not my mother tongue, so I make mistakes), or when writing loaded mails, I try to reread myself, but also to ask relatives to do the review, but sometimes I have nobody around. When I'm convinced an external review is needed, I tend to default to asking ChatGPT or Claude, provided the content contains no personal data and no strategic corporate data. I'm not very proud of it, but I'm not really ashamed either. In a perfect world, I'd like an inferential model tiny enough to run on my dedicated server, to minimize consumption and potential leaks, but so far my tests with these have not been really satisfying, and I haven't had enough time to tweak and test. Recently, I have tried to send all my mails without an AI review, but this specific text was checked by Claude for the English, to ensure I don't say something inconsistent with my intent. To be clear, I wrote all the paragraphs, and didn't use the LLM in any way other than "English checking and intent checking", but for some purists in the room, this might make my mail worthless.

Now, as I said above, we realize that some bits of the infra do at least contain parts of AI-generated code. We don't know to what extent that code has been reviewed/modified, and this necessarily creates frustration and legitimate questions.
Some in the project want to purely and simply forbid any project contributor from using any AI-generated content to achieve their work within the project (be it website coding, app design, "debian-dir" generation for packaging, translation, etc). Some, on the contrary, seem to consider that AI is real progress and will benefit all of us, and that, anyway, FOSS is dead without AI. I can't and won't quote these mails because they were sent privately. In the middle, some are rather concerned by legal or ethical aspects.

After this very long and nonetheless partial intro, I'd like to try summarizing the points that seemed, to me, relevant, whether against or in favour of AI-generated content.

[5] https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me/
[6] https://daniel.haxx.se/blog/2026/01/26/the-end-of-the-curl-bug-bounty

## The brainstorm of pros and cons when it comes to LLM and agentic coding

This part is, as I wrote, a brainstorm; each subpart will cover one of the different axes we need to work through before thinking about what we want to do. Sorry, it might be a bit messy. I tried to dig up some figures and source the things I state, but please take it with a pinch of salt: I'm no expert, and I didn't want to spend 12 unpaid hours on each topic, especially on an average of 6 hours of sleep per night since early December.

### The ecological aspect

As I mentioned, we know that AI comes with a big ecological cost, as did Bitcoin, the Internet, and the industrial era. But one can't use these as a shield to ignore the specific issues AI poses: whataboutism is not an argument *per se*.

#### Electricity

According to [7][8], AI accounts for between 10 and 20% of worldwide datacenter electricity consumption. This DC consumption is itself about 1.5 to 2% of global electricity consumption. It means that, worst case (20% of 2%), AI represents 0.4% of the world's electricity consumption. This is not huge, but it is big (as in more than 100 TWh, roughly the consumption of the Netherlands). And there is a huge discrepancy between countries/states in the world[10] (eg 21% of Ireland's electricity is eaten by DCs, and 26% in Virginia, US). The IEA predicts that DC consumption could double by 2030, and the MIT Technology Review estimates that in 2028, AI could eat more than 50% of DC electricity consumption[9]. On the pollution side, DC CO2 emissions could be as much as 1% of total CO2 emissions in 2030[10].

We could also mention the strain this puts on already ageing or limited physical grids, which creates the need for new infrastructure, etc (in the long term, this could become a problem if the electric grid doesn't follow AI demand, limiting what datacentres can do, or forcing public authorities to choose between different industries).

[7] https://www.allaboutai.com/resources/ai-statistics/ai-environment/
[8] https://www.iea.org/reports/energy-and-ai/energy-demand-from-ai
[9] https://www.technologyreview.com/2025/05/20/1116327/ai-energy-usage-climate-footprint-big-tech/
[10] https://www.carbonbrief.org/ai-five-charts-that-put-data-centre-energy-use-and-emissions-into-context/

#### Water

According to IEEE[11], in 2023 US DCs consumed 17.5 billion gallons of water, which is around 0.3% of the public water supply in the US; this doesn't account for electricity production, whose DC share adds a staggering 211 billion gallons (see also [12]). According to IEEE, these amounts could increase two- to four-fold by 2028[11].
In the US, most DCs use cooling towers, which involve water evaporation. In places where water is already a limited resource, this creates additional strain. Some DCs use closed-circuit cooling, which reduces the problem, but they still require some water to be taken from the environment.

[11] https://spectrum.ieee.org/ai-water-usage
[12] https://www.eesi.org/articles/view/data-centers-and-water-consumption

#### But hey, it's not just AI

As I wrote above, while AI is a significant chunk of digital consumption, it's not all of it, and as of today, the digital sector already uses between 3 and 5% of global electricity production, with current growth around 12%. AI is booming, but the problem was already there and would still be there even if AI were not. We can surely be worried that AI's chunk seems to be increasing, and will likely increase faster than the rest of digital consumption, but the problem is that digital technology structurally has a big ecological impact. How are we supposed to draw the line? Is publishing videos on YouTube ok? Is posting on Bluesky ok? Can I put my kid in front of the TV one hour a week to watch Bluey? I know these questions could be perceived as a way to dodge the argument by pushing exaggerated whataboutist questions. What I'm trying to picture here is that while it's relevant to question each specific new usage, the current IT footprint is far bigger. Singling out AI is intellectually inconsistent if we aren't willing to sit down and try to think a bit more globally.

Also, the problem is essentially political, and the question we, as a civilization, should ask ourselves is "what ecological impact do we accept, and for what benefit?". And this question should be asked for every big social topic that has an ecological impact (public transportation, industry, agriculture, air travel).

### Legal/licensing aspects

One of the main questions I have been asking myself concerns the legal and licensing aspects.

#### The U.S. Case

In some other discussion, it was mentioned that the U.S. Congress had taken a position on this. In fact, it's the Congressional Research Service that issued a document for the benefit of Congress members (Congress has not produced any legislation on AI production and mixed content). The CRS produced this note[16] based on guidelines and decisions of the Copyright Office[13][14]. The USCO actually has a dedicated AI hub[15] with an additional preprint.

The takeaway from these documents is that, currently, in the U.S., AI-generated code is not eligible for copyright, as the USCO only recognizes copyright for human production. This means that without the ability to identify very precisely which parts of a production are AI-generated, the whole production (eg a piece of software) could be uncopyrightable. And even if the bits are clearly identified, this has direct implications when one wants to license one's code, as the way some FOSS licenses work doesn't allow parts of the software to be unlicensed. Let's take the GPL as an example. The GPL is what some outsiders call "contaminating". Essentially, if one wants to add an AI-generated contribution to GPL-licensed software, then these additions must also be licensed under the GPL, which is not possible in the U.S.! The CRS note concluded in particular that being the prompter does not make one the author, as being the author requires significant creativity and appropriation of the production. This tends to mean that only AI-generated content that has been significantly modified by a human could be deemed copyrightable.
The last part that matters is that the USCO concluded that it's not possible to evaluate whether the use of protected content to train models can be deemed "fair use" or not.

[13] https://www.copyright.gov/ai/ai_policy_guidance.pdf
[14] https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-2-Copyrightability-Report.pdf
[15] https://www.copyright.gov/ai/
[16] https://www.congress.gov/crs-product/LSB10922

#### The U.S. Case takeaways

From the U.S. example alone, we can infer that the copyright aspects are, at best, chaotic. If Debian starts delivering AI-generated content on its own platforms, then this content is currently not copyrightable in the U.S., where Debian is widely used. This has led to some discussions, eg elfutils[17], where the project simply decided to reject any LLM-generated content in contributions. This means that, best case scenario, if the project decides to accept AI-derived contributions, these contributions could only be indirect (either a human would need to modify them or integrate them in their own way, or they should be used as leverage to actually achieve the production itself).

[17] https://www.mail-archive.com/[email protected]/msg08882.html

#### And it's just for the U.S.

I focused this part on the U.S. situation, but things are not simpler in, eg, Europe. Let's cite some examples:

- For training: the EU AI Act allows the use of copyrighted content by default, except if the author explicitly opted out of the possibility[18]. Model providers must in return provide a sufficiently detailed summary of the content they used to train their model, and write a policy about copyright compliance (but until 2024, it was *Free Lunchware*);
- For output: it seems that purely AI-generated content is not copyrightable, same as in the U.S. - the content must be "human enough"[19].

(I'll note that this makes the EU particularly uncompetitive in the AI field, even though we still manage to produce some things - hello Mistral¹).

[18] https://iapp.org/news/a/the-eu-ai-act-and-copyrights-compliance
[19] https://www.europarl.europa.eu/thinktank/en/document/EPRS_BRI(2025)782585

¹ I hear in my earpiece that Mistral complained about European regulations?

### Consequences of the above: traceability, security, accountability

So, we saw that licensing is a can of worms (Claude suggested a minefield, pick your favourite comparison). Now, let's look at things from Debian's perspective. Let's assume we managed to write an AI policy we're proud of, something that accepts that the world changes, but tries to keep a focus on respecting licensing, ethics, etc. Even then, we're left with at least three intertwined questions to which I am unsure I have any relevant answer. All of these are classic cybersec questions.

The first main issue is to know who to yell at^W^W^W^Wwhere it comes from. Who is the author? How much of the code was actually written by a human? Did the contributor just use AI as a reviewer (as I did for this mail), did they ask it to produce code they then rewrote, edited and audited, or did they just prompt and copy the output? If we can't tell, we can't assess the licensing status of what we ship, and worse, we can't assess whether we can place any trust in the shipped content.

The second issue is the security of the code. AI-generated code tends to introduce (sometimes subtle) vulnerabilities, eg injections, poor memory handling, phantom dependencies.
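To make the "subtle" part concrete, here is a minimal, purely illustrative sketch (my own hypothetical example, not taken from any real Debian contribution): a string-built SQL query of the kind LLMs still happily produce, and which looks perfectly reasonable at a glance.

```python
import sqlite3

# Hypothetical helper, for illustration only. Building the query with an
# f-string makes it injectable: name = "x' OR '1'='1" returns every row,
# and UNION-based payloads can exfiltrate other tables.
def find_package(conn: sqlite3.Connection, name: str):
    query = f"SELECT name, version FROM packages WHERE name = '{name}'"
    return conn.execute(query).fetchall()

# The boring version a reviewer should insist on: a parametrized query.
def find_package_safe(conn: sqlite3.Connection, name: str):
    return conn.execute(
        "SELECT name, version FROM packages WHERE name = ?", (name,)
    ).fetchall()
```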
A competent human can catch these in review, but if neither the author nor the reviewer actually understands the code, we're shipping a black box with potentially big holes in it. Do we prefer insecure code written and pushed by humans, or insecure code pushed by Anthropic? (I KNOW, we prefer NEITHER.) But the question matters: when do we declare that we've lost control over the code we ship, and what do we tell our users?

Then there is accountability. I know the first question already contained some "who" in it, but that was merely to establish where the code comes from. The other part is what we can do if the thing explodes in our hands. Sure, we could say that if someone pushes code as their own work, they're responsible for it. It's sensible. But in practice, what will we do when this happens? We won't sue the model provider, but will we feel fine throwing it all on the person who had the AI generate the code?

None of this is new. In August, ZDNet published an article[20] about AI being used within the Linux kernel community, referring to a thread[21] that discusses these very auditability and accountability issues. The kernel community eventually adopted a policy[22]. If the kernel community felt the need for one, I would say one for Debian is probably long overdue.

[20] https://www.zdnet.com/article/ai-is-creeping-into-the-linux-kernel-and-official-policy-is-needed-asap/
[21] https://lore.kernel.org/ksummit/[email protected]/
[22] https://lore.kernel.org/ksummit/[email protected]/

### Dependence on private actors and ethical concerns

So, this one is probably more of a philosophical train of thought, but it matters, too. And I guess it's worth reading especially for those in favour of AI-generated code. Our common baseline for all being here is that we love FOSS. The thing is, currently, the most performant coding models are cloud-provided and closed. Therefore, some of us seem to be eager to depend on these proprietary tools to write actual FOSS. I know some of us use Windows, or play video games. I'm not trying to frame anyone as a hypocrite; we all try to reconcile our different needs and hobbies. But I wonder: is it sane to run Claude Code on your Debian laptop, on which some of you might have a private PGP key hanging around? Is it sane to promote FOSS and not try to deploy a platform relying on FOSS models (eg Deepseek Coder, Llama, Devstral 2) that would be able to write code? Is it sane, especially considering that the output of these private actors is mostly not copyrightable?

These questions echo the consistency arguments (far!) above, in the sense that we need to place a cursor (pun!) somewhere regarding what we accept and at what cost. I think that if and when the time to choose a policy comes, these questions should be in our heads, in particular because, ethically, using these tools implies endorsing their unfair use of a lot of protected content². Maybe part of this philosophical point is to consider whether we want "the best tool", or whether we accept that things will be a bit harder and try to recommend using "the most ethical tools".

² this reminds me of a funny discussion with an extreme libertarian acquaintance of mine, who explained to me the good these big AI companies were doing for the world, until I asked him how he reconciled his admiration with the fact that these companies only exist because they trampled on the intellectual property of millions by training their models with zero respect for copyrights. After all, the right to property is the cornerstone of libertarianism, isn't it?
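As a side note on the practical feasibility of the FOSS-model question above: pointing tooling at a locally hosted open-weight model is not hard. Here is a minimal sketch, assuming a local server (eg llama.cpp's server or Ollama) exposing an OpenAI-compatible chat endpoint; the URL, port and model name below are placeholders for whatever your own setup exposes.

```python
import json
import urllib.request

# Assumed local endpoint of an open-weight model server; adjust to your setup.
LOCAL_ENDPOINT = "http://localhost:8080/v1/chat/completions"

def ask_local_model(prompt: str, model: str = "devstral") -> str:
    # Send a single chat message and return the model's reply text.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    request = urllib.request.Request(
        LOCAL_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        answer = json.load(response)
    return answer["choices"][0]["message"]["content"]
```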
### Socio-economical aspects

AI tooling is currently deemed to boost productivity. This is partly hype pushed by the big AI sellers, but there is also truth to it: individuals with expertise currently manage to produce more, faster, with these tools. The main drawback is that inexperienced people produce garbage without knowing it, and that people tend to kick off irrelevant projects more easily, just because they can.

For Debian specifically, there are two concrete risks. First, well-meaning contributors deploying AI-powered tools or workflows that generate more code, more packages, more everything, while actually adding legacy and strain on our infrastructure and collective review capacity. Second, a flood of low-quality contributions from people who prompted but didn't review, increasing the burden on maintainers who are already stretched thin. And let's be clear, there will be a lot of these.

The broader societal question (do we produce five times as much for the sake of growth, or do we consider that reasonable use of collective resources matters?) is not Debian's to answer. But we should be aware that whatever policy we adopt sends a signal, and that signal matters.

### The political game of stability

In an ever-changing GNU/Linux world, Debian has something somewhat unique. Something that's also unique when we consider the IT world in general. We are slow. For some people it's a bad thing. But for many others, it's actually a good thing. Debian symbolizes stability. We take our time, we release "when it's ready", we take many months to integrate newcomers. This carries some risks (eg not getting enough new contributors), but it gives a lot of reassurance to our end users: they can go to sleep one day, come back the day after, and nothing has changed that much. Even simple things take some time with us.

If anything, the IT trend that represents the opposite (instability-and-what-the-hell-is-the-go-to-tool-this-week) was cloud for a long time, and now AI is clearly replacing it by a large margin. How can we reconcile AI-generated content and Debian? Would this be a betrayal of what makes Debian Debian? I understand that we regularly realize that we need to change, too, so this is a real question. I have no answer to give, but I'm happy to lay down the question, because it needs to be asked.

## AI is here - going forward within Debian

So, AI is here, including in Debian. I'd have preferred the question not to be asked, but now we can't avoid asking it: what do we do? How do we manage it? I wrote above that I might come up with a proposal, but I have none. I do have some preferences I'd like to see in a policy, if one were to be drafted:
- I'd really prefer it if those eager to use such tools refrained from using them when they don't really benefit from them, *id est*, made reasonable use of these tools and therefore of the resources these tools consume;
- I'd really prefer it if people used these tools only to achieve tasks in which they have expertise and which they could achieve themselves, so that they can review the work done;
- I'd really prefer it if any AI-generated content were identified as such;
- I'd really prefer it if such content were reappropriated and rewritten so that it can be copyrighted;
- I'd really prefer it if we could find a way to use a FOSS model that produces code of reasonable quality;
- Last but not least, I'd really prefer it if those totally against and those with an accelerationist position could stop caricaturing the other parties, and accept that nuance is the basis of a sane discussion. Because the day we stop being able to communicate is the day we will really be dead.

-- PEB