Hi,
I would like to propose a policy regarding LLM-generated code in Gnulib.
The reason is that, on the one hand,
- since the beginning of 2025, there has been a trend toward "vibe coding" [1],
- even famous people like Linus Torvalds make use of it,
and on the other hand, there are issues with it, in particular
- Copyright and license issues: How to identify regurgitated copyrighted
code?
- Maintainability: The code has never been reviewed by a human programmer,
and it is larger and possibly less well commented than what a human
programmer would produce.
[1] https://en.wikipedia.org/wiki/Vibe_coding
Outside of the scope of this proposal are uses of LLMs that don't generate
code. For these cases, it is already well-known that you need to fact-check
the LLM's answers.
Here's a proposed addition to the HACKING file.
================================================================================
Acceptable use of LLM generated code
====================================
General-purpose LLMs as well as LLMs specialized for software programming
can produce ready-to-use and, in many cases, actually working code.
We need to avoid two problems with that:
* Copyright and license issues: An LLM may regurgitate a piece of copyrighted
code without the copyright header, thus violating the code's license.
(Most code licenses require that the copyright header remains intact when
the code is copied or becomes the basis of derivative works.)
* Maintainability issues: Such generated code has not initially been
reviewed by a human programmer. It is often larger than what a
careful programmer would write. Sometimes it also lacks comments.
People who use "vibe coding" often also observe that the code is of
lower quality.
If software in general can be classified as intended for either long-term
or short-term use, vibe coding tends to be more suitable for software
intended for short-term use.
To this end:
1) Code included in this package that comes from a single LLM prompt
must be limited in size: it must be at most 5 lines long.
2) As a submitter, you assert that you have reviewed any such code that
you submit.
Rule 1 guarantees that the size of the LLM-generated code stays below the
"legally significant for copyright purposes" threshold; see
https://www.gnu.org/prep/maintain/html_node/Legally-Significant.html
Rule 2 encourages you not to submit unreviewed garbage.
================================================================================
Related policies:
* Linux
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/generated-content.rst
https://lwn.net/Articles/1032612/
* Asahi Linux
https://asahilinux.org/docs/project/policies/slop/
* FreeBSD
https://www.heise.de/en/news/FreeBSD-policy-AI-generated-source-code-No-thanks-10634141.html
* LLVM
https://github.com/llvm/llvm-project/blob/main/llvm/docs/AIToolPolicy.md
Let us know what you think.
Bruno