Hi Mark,

On Tue, Dec 16, 2025 at 1:29 PM Mark Wielaard <[email protected]> wrote:
>
> Hi Aaron,
>
> On Tue, 2025-12-16 at 01:05 -0500, Aaron Merey wrote:
> > On Mon, Dec 15, 2025 at 12:25 PM Mark Wielaard <[email protected]> wrote:
> > > On Thu, 2025-12-11 at 23:35 -0500, Aaron Merey wrote:
> > > > I'd like to propose an elfutils policy for contributions containing
> > > > content generated by LLM or AI tools (AI-assisted contributions). A
> > > > written policy will help clarify for contributors whether elfutils
> > > > accepts AI-assisted contributions and whether any special procedures
> > > > apply.
> > >
> > > I think it would be good to differentiate between LLM generated
> > > contributions and [AI] tool assisted contributions. The first seems
> > > easy to define and is about whether or not to accept such generated
> > > patches. While the later seems a very broad topic that mostly is what
> > > tools a developer might use personally, most of which we don't need a
> > > policy for.
> >
> > That's a fair distinction. The policy can be reworded so that it
> > addresses contributions containing LLM-generated content (beyond
> > accessibility aids) instead of AI tooling in general.
>
> I think that would lead to the most concise guidance to contributors.
>
> > > > There isn't a consensus across major open source projects on whether
> > > > AI-assisted contributions should be allowed. For example, Binutils
> > > > [1], Gentoo [2], and man-pages [3] have adopted policies rejecting
> > > > most or all AI-assisted contributions.
> > >
> > > There have also been discussions by glibc and gcc to adopt a similar
> > > policy as binutils has on LLM Generated Content.
> > >
> > > > Fedora [4] and the Linux Foundation [5] have policies permitting the
> > > > use of AI-assisted contributions. Contributors are expected to
> > > > disclose the use of any AI tools and take responsibility for the
> > > > contribution's quality and license compatibility.
> > >
> > > The Fedora one is for a large part not about using AI for code
> > > contributions. The Linux Foundation one lets each developer try to
> > > figure out if there are (legal) issues or not. Both feel like they are
> > > not really giving any real guidance but let every individual try to
> > > figure it out themselves.
> >
> > What stood out to me was that these policies do not unconditionally
> > ban contributions containing LLM content. This content may be
> > acceptable when there is disclosure, license compatibility, and
> > absence of incompatible third party content.
>
> And having the clear rights to sign off on the legal requirements of
> the project. Which I think is why these guidelines are not very
> practical. It pushes contributors to find some imaginary line where it
> is "still" OK to just copy LLM generated content.
>
> > > > In my opinion, elfutils should permit AI-assisted contributions. As
> > > > for specific policies, I suggest the following.
> > > >
> > > > (1) AI-assisted contributions should include a disclosure that some or
> > > > all of the contribution was generated using an AI tool. The git
> > > > commit tag "Assisted-by:" has been adopted for this purpose by Fedora,
> > > > for instance.
> > >
> > > I think this is too weak. The tag or comment should at least explain
> > > how to replicate the generated content. Which isn't very practical with
> > > the current generation of LLM chatbots. Or probably even impossible. I
> > > do think it is appropriate for deterministic tooling though, so as to
> > > have a recipe to replicate specific code changes.
> >
> > Reproduction steps for deterministic tools and prompts or conversation
> > summaries for LLMs are fine with me.
>
> The first are fine with me, the second not really.
>
> > I want to note that reproducibility isn't always required when we
> > accept a patch. Of course not all human-authored changes are based on
> > a process that's reproducible in practice and I don't think we need to
> > introduce this requirement just for LLM content.
>
> I like the idea of an Assisted-by tag, but only for tools that users,
> maintainers and reviewers can also actually use, and that come with
> exact instructions or a script that can be used to replicate the
> suggested changes. e.g. Assisted-by: emacs isn't very useful, but if
> you have an specific elisp script then please provide it (maybe just
> include it in the patch) so others can also use it.
>
> The issue with LLM generated content is that it is not
> practical/impossible to provide enough context for anyone else to
> recreate it.
>
> > > > (2) AI-assisted contributions should otherwise be treated like any
> > > > other contribution. The contributor vouches for the quality of their
> > > > contribution and verifies license compatibility with their DCO
> > > > "Signed-off-by:" tag while reviewers evaluate the technical merits of
> > > > the contribution.
> > >
> > > Yes, but I think this just says no such contributions can have a
> > > Signed-off-by tag since, at least for LLM chatbot like generated
> > > patches, have unclear copyright status and so a contributor cannot
> >
> > ChatGPT, for example, includes the following statement in its terms of use
> > [1]
> >
> > "Ownership of content. As between you and OpenAI, and to the extent
> > permitted by applicable law, you (a) retain your ownership rights in
> > Input and (b) own the Output. We hereby assign to you all our right,
> > title, and interest, if any, in and to Output. ... Our assignment
> > above does not extend to other users’ output or any Third Party
> > Output."
>
> Right, that tells me they might not actually have any rights to grant
> you and they acknowledge there are other right holders who don't give
> you any rights.
>
> > If a contributor uses ChatGPT to help prepare a patch and takes
> > reasonable care to avoid including third party content, I think the
> > contributor can reasonably sign the DCO in this case. There is valid
> > disagreement about this of course. Projects such as QEMU [2] have
> > policies rejecting LLM content due to the uncertainty of DCO claims.
> > On the other hand, Chris Wright and Richard Fontana [3] argue that the
> > DCO can be compatible with LLM content.
>
> I think the QEMU example is what we should follow. I see the argument
> made in that Red Hat blog post, but I am not really convinced by their
> arguments, it feels they are handwaving away valid concerns about
> attribution and legal (copy)rights. But I do like and agree with:
>
> None of this is to say that projects must allow AI-assisted
> contributions. Each project is entitled to make its own rules and set
> its own comfort level, and if a project decides to prohibit AI-assisted
> contributions for now, that decision deserves respect.
>
> I think they do bring up an important point of establishing trust. And
> that there are trust issues not just legally or technically, but also
> ethically. They claim you shouldn't stigmatize contributors that "try
> to use AI responsibly". But there is a genuine question if that is even
> possible when there are legal/ethical issues around the processing of
> training data and when there are even LLMs that are explicitly trained
> to act like white supremacists and attacking marginalized groups. Then
> there is the economic, energy costs and climate impact. There is a real
> trust issue here imho with the current generation of LLMs. Maybe one
> day there will be something like
> https://sfconservancy.org/activities/aspirational-statement-on-llm-generative-ai-for-programming.html
> Then we can maybe reexamine the trust issue.
>
> > > I would lean the other way and adopt a simple policy like the rest of
> > > the core toolchain projects are adopting to reject LLM generated
> > > contributions for which the provenance cannot be determined (because
> > > the training corpus and/or algorithm is unknown).
> >
> > These provenance concerns are fair, but can they be accommodated by
> > our existing practices?
>
> I think it can. We should provide guidance that contributors should not
> sign off on any (non-trivial) LLM generated code/docs. Just like you
> wouldn't sign off on code/docs you "find" somewhere without clear
> attribution, copyright and license terms. We can reuse some of the
> guidance given by the binutils and/or qemu project to make that clear.

Since we don't have a consensus to allow LLM-generated content, our
policy should be to reject it, at least until there's reason to
reevaluate these concerns.

I'll write a new draft of the proposal that will include the exceptions
mentioned in the binutils policy:

(1) LLMs can be used to assist with writing code as long as they do not
actually generate the contributed code (this permits
accessibility-related uses of LLMs).

(2) LLMs can be used to generate trivial, non-legally-significant
changes.

Aaron
