Hi,

I would like to propose a policy regarding LLM-generated code in Gnulib.

The reason is that, on one hand,
  - Since the beginning of 2025, there has been a trend toward
    "vibe coding" [1].
  - Even famous people like Linus Torvalds make use of it.
and on the other hand, there are issues with it, in particular
  - Copyright and license issues: How to identify regurgitated copyrighted
    code?
  - Maintainability: The code has never been reviewed by a human
    programmer, and it is larger and possibly less well commented than
    what a human programmer would produce.

[1] https://en.wikipedia.org/wiki/Vibe_coding

Outside of the scope of this proposal are uses of LLMs that don't generate
code. For these cases, it is already well-known that you need to fact-check
the LLM's answers.

Here's a proposed addition to the HACKING file.

================================================================================

Acceptable use of LLM-generated code
====================================

General-purpose LLMs as well as LLMs specialized for software programming
can produce ready-to-use and, in many cases, actually working code.

We need to avoid two problems with that:

  * Copyright and license issues: An LLM may regurgitate a piece of copyrighted
    code without the copyright header, thus violating the code's license.
    (Most code licenses require that the copyright header remains intact when
    the code is copied or becomes the basis of derivative works.)

  * Maintainability issues: Such generated code has initially not been
    reviewed by a human programmer. It is often greater in size than what a
    careful programmer would write. Sometimes it also lacks comments.
    People who use "vibe coding" often also observe that the code is of
    lower quality.
    If software is classified as being for either long-term or short-term
    use, vibe coding tends to be more suitable for short-term software.

To this end:

  1) Code included in this package that comes from a single LLM prompt
     must be limited in size: it must be at most 5 lines long.

  2) As a submitter, you assert that you have reviewed such code that you
     submit.

Rule 1 guarantees that the size of the LLM-generated code stays below the
"legally significant for copyright purposes" threshold; see
https://www.gnu.org/prep/maintain/html_node/Legally-Significant.html

Rule 2 encourages you to not submit unreviewed garbage.

================================================================================

Related policies:
* Linux
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/generated-content.rst
  https://lwn.net/Articles/1032612/
* Asahi Linux
  https://asahilinux.org/docs/project/policies/slop/
* FreeBSD
  https://www.heise.de/en/news/FreeBSD-policy-AI-generated-source-code-No-thanks-10634141.html
* LLVM
  https://github.com/llvm/llvm-project/blob/main/llvm/docs/AIToolPolicy.md

Let us know what you think.

Bruno



