Hi Bruno,

Bruno Haible via Gnulib discussion list <[email protected]> writes:

> I would like to propose a policy regarding LLM regenerated code in Gnulib.

+1, this sounds like a good idea. It might also be worth bringing up on
[email protected] at some point.

I think the topic of LLMs has come up a few times recently on
[email protected], but I have not followed it too closely.

> The reason is that on one hand
>   - Since the beginning of 2025, there has been a trend for "vibe coding" [1],
>   - Even famous people like Linus Torvalds make use of it.
> and on the other hand, there are issues with it, in particular
>
>   - Copyright and license issues: How to identify regurgitated copyrighted
>     code?

That is my main worry.

The copyrightability of outputs from an LLM is unclear, and I am
skeptical of the main argument in defense of LLMs, namely that typing a
prompt and then copying the output requires the same original creativity
as taking a photo with a camera.

> Outside of the scope of this proposal are uses of LLMs that don't generate
> code. For these cases, it is already well-known that you need to fact-check
> the LLM's answers.

Agreed.

> Here's a proposed addition to the HACKING file.
>
> ================================================================================
>
> Acceptable use of LLM generated code
> ====================================
>
> General-purpose LLMs as well as LLMs specialized for software programming
> can produce ready-to-use and, in many cases, actually working code.
>
> We need to avoid two problems with that:
>
>   * Copyright and license issue: An LLM may regurgitate a piece of copyrighted
>     code without the copyright header, thus violating the code's license.
>     (Most code licenses require that the copyright header remains intact when
>     the code is copied or becomes the basis of derivative works.)
>
>   * Maintainability issues: Such generated code has initially not been
>     reviewed by a human programmer. It is often greater in size than what a
>     careful programmer would write. Sometimes it also lacks comments.
>     People who use "vibe coding" often also observe that the code is of
>     lower quality.
>     Where software in general can be qualified as for long-term use vs.
>     short-term use, vibe coding tends to be more suitable for short-term used
>     software.
>
> To this end:
>
>   1) Code included in this package that comes from a single LLM prompt
>      must be limited in size: it must be at most 5 lines long.
>
>   2) As a submitter, you assert that you have reviewed such code that you
>      submit.

Looks good to me.

> Rule 1 guarantees that the LLM generated code size is smaller than the
> "legally significant for copyright purposes" threshold, see
> https://www.gnu.org/prep/maintain/html_node/Legally-Significant.html

Yep, Binutils uses that same reference [1].

Collin

[1] https://sourceware.org/binutils/wiki/LLM_Generated_Content
