Hi Bruno,

Bruno Haible via Gnulib discussion list <[email protected]> writes:
> I would like to propose a policy regarding LLM generated code in Gnulib.

+1, this sounds like a good idea.

It might also be worth bringing up on [email protected] at some
point. I think the topic of LLMs has come up a few times recently on
[email protected], but I have not followed it too closely.

> The reason is that on one hand
>   - Since the beginning of 2025, there has been a trend for "vibe coding" [1],
>   - Even famous people like Linus Torvalds make use of it.
> and on the other hand, there are issues with it, in particular
>
>   - Copyright and license issues: How to identify regurgitated copyrighted
>     code?

That is my main worry. The copyrightability of outputs from an LLM is
unclear, and I am skeptical of the main argument in defense of LLMs,
i.e., that typing a prompt and then copying the output requires the same
original creativity as taking a photo with a camera.

> Outside of the scope of this proposal are uses of LLMs that don't generate
> code. For these cases, it is already well-known that you need to fact-check
> the LLM's answers.

Agreed.

> Here's a proposed addition to the HACKING file.
>
> ================================================================================
>
> Acceptable use of LLM generated code
> ====================================
>
> General-purpose LLMs as well as LLMs specialized for software programming
> can produce ready-to-use and, in many cases, actually working code.
>
> We need to avoid two problems with that:
>
> * Copyright and license issue: An LLM may regurgitate a piece of copyrighted
>   code without the copyright header, thus violating the code's license.
>   (Most code licenses require that the copyright header remains intact when
>   the code is copied or becomes the basis of derivative works.)
>
> * Maintainability issues: Such generated code has initially not been
>   reviewed by a human programmer. It is often greater in size than what a
>   careful programmer would write. Sometimes it also lacks comments.
>   People who use "vibe coding" often also observe that the code is of
>   lower quality.
>   Where software in general can be qualified as for long-term use vs.
>   short-term use, vibe coding tends to be more suitable for short-term used
>   software.
>
> To this end:
>
> 1) Code included in this package that comes from a single LLM prompt
>    must be limited in size: it must be at most 5 lines long.
>
> 2) As a submitter, you assert that you have reviewed such code that you
>    submit.

Looks good to me.

> Rule 1 guarantees that the LLM generated code size is smaller than the
> "legally significant for copyright purposes" threshold, see
> https://www.gnu.org/prep/maintain/html_node/Legally-Significant.html

Yep, Binutils uses that same reference [1].

Collin

[1] https://sourceware.org/binutils/wiki/LLM_Generated_Content
