Hi Mark,

On Tue, Dec 16, 2025 at 1:29 PM Mark Wielaard <[email protected]> wrote:
>
> Hi Aaron,
>
> On Tue, 2025-12-16 at 01:05 -0500, Aaron Merey wrote:
> > On Mon, Dec 15, 2025 at 12:25 PM Mark Wielaard <[email protected]> wrote:
> > > On Thu, 2025-12-11 at 23:35 -0500, Aaron Merey wrote:
> > > > I'd like to propose an elfutils policy for contributions containing
> > > > content generated by LLM or AI tools (AI-assisted contributions).  A
> > > > written policy will help clarify for contributors whether elfutils
> > > > accepts AI-assisted contributions and whether any special procedures
> > > > apply.
> > >
> > > I think it would be good to differentiate between LLM generated
> > > contributions and [AI] tool assisted contributions. The first seems
> > > easy to define and is about whether or not to accept such generated
> > > patches, while the latter seems a very broad topic that is mostly
> > > about what tools a developer might use personally, most of which we
> > > don't need a policy for.
> >
> > That's a fair distinction.  The policy can be reworded so that it
> > addresses contributions containing LLM-generated content (beyond
> > accessibility aids) instead of AI tooling in general.
>
> I think that would lead to the most concise guidance to contributors.
>
> > > > There isn't a consensus across major open source projects on whether
> > > > AI-assisted contributions should be allowed.  For example, Binutils
> > > > [1], Gentoo [2], and man-pages [3] have adopted policies rejecting
> > > > most or all AI-assisted contributions.
> > >
> > > There have also been discussions by glibc and gcc to adopt a similar
> > > policy as binutils has on LLM Generated Content.
> > >
> > > > Fedora [4] and the Linux Foundation [5] have policies permitting the
> > > > use of AI-assisted contributions.  Contributors are expected to
> > > > disclose the use of any AI tools and take responsibility for the
> > > > contribution's quality and license compatibility.
> > >
> > > The Fedora one is for a large part not about using AI for code
> > > contributions. The Linux Foundation one lets each developer try to
> > > figure out if there are (legal) issues or not. Both feel like they
> > > are not really giving any guidance but instead leave every individual
> > > to figure it out themselves.
> >
> > What stood out to me was that these policies do not unconditionally
> > ban contributions containing LLM content.  This content may be
> > acceptable when there is disclosure, license compatibility, and
> > absence of incompatible third party content.
>
> And having the clear rights to sign off on the legal requirements of
> the project. Which I think is why these guidelines are not very
> practical. It pushes contributors to find some imaginary line where it
> is "still" OK to just copy LLM generated content.
>
> > > > In my opinion, elfutils should permit AI-assisted contributions.  As
> > > > for specific policies, I suggest the following.
> > > >
> > > > (1) AI-assisted contributions should include a disclosure that some or
> > > > all of the contribution was generated using an AI tool.  The git
> > > > commit tag "Assisted-by:" has been adopted for this purpose by Fedora,
> > > > for instance.
> > >
> > > I think this is too weak. The tag or comment should at least explain
> > > how to replicate the generated content. Which isn't very practical with
> > > the current generation of LLM chatbots. Or probably even impossible. I
> > > do think it is appropriate for deterministic tooling though, so as to
> > > have a recipe to replicate specific code changes.
> >
> > Reproduction steps for deterministic tools and prompts or conversation
> > summaries for LLMs are fine with me.
>
> The first are fine with me, the second not really.
>
> > I want to note that reproducibility isn't always required when we
> > accept a patch. Of course not all human-authored changes are based on
> > a process that's reproducible in practice and I don't think we need to
> > introduce this requirement just for LLM content.
>
> I like the idea of an Assisted-by tag, but only for tools that users,
> maintainers and reviewers can also actually use, and that come with
> exact instructions or a script that can be used to replicate the
> suggested changes. e.g. Assisted-by: emacs isn't very useful, but if
> you have a specific elisp script then please provide it (maybe just
> include it in the patch) so others can also use it.
>
> The issue with LLM generated content is that it is not practical, or
> even possible, to provide enough context for anyone else to
> recreate it.
>
> > > > (2) AI-assisted contributions should otherwise be treated like any
> > > > other contribution.  The contributor vouches for the quality of their
> > > > contribution and verifies license compatibility with their DCO
> > > > "Signed-off-by:" tag while reviewers evaluate the technical merits of
> > > > the contribution.
> > >
> > > Yes, but I think this just says no such contributions can have a
> > > Signed-off-by tag since, at least for LLM chatbot-like generated
> > > patches, they have unclear copyright status and so a contributor cannot
> >
> > ChatGPT, for example, includes the following statement in its terms of use 
> > [1]
> >
> > "Ownership of content. As between you and OpenAI, and to the extent
> > permitted by applicable law, you (a) retain your ownership rights in
> > Input and (b) own the Output. We hereby assign to you all our right,
> > title, and interest, if any, in and to Output. ... Our assignment
> > above does not extend to other users’ output or any Third Party
> > Output."
>
> Right, that tells me they might not actually have any rights to grant
> you, and they acknowledge there are other rights holders who don't give
> you any rights.
>
> > If a contributor uses ChatGPT to help prepare a patch and takes
> > reasonable care to avoid including third party content, I think the
> > contributor can reasonably sign the DCO in this case.  There is valid
> > disagreement about this of course.  Projects such as QEMU [2] have
> > policies rejecting LLM content due to the uncertainty of DCO claims.
> > On the other hand, Chris Wright and Richard Fontana [3] argue that the
> > DCO can be compatible with LLM content.
>
> I think the QEMU example is what we should follow. I see the argument
> made in that Red Hat blog post, but I am not really convinced by their
> arguments; it feels like they are handwaving away valid concerns about
> attribution and legal (copy)rights. But I do like and agree with:
>
>    None of this is to say that projects must allow AI-assisted
>    contributions. Each project is entitled to make its own rules and set
>    its own comfort level, and if a project decides to prohibit AI-assisted
>    contributions for now, that decision deserves respect.
>
> I think they do bring up an important point of establishing trust. And
> that there are trust issues not just legally or technically, but also
> ethically. They claim you shouldn't stigmatize contributors who "try
> to use AI responsibly". But there is a genuine question whether that is
> even possible when there are legal/ethical issues around the processing
> of training data and when there are even LLMs that are explicitly
> trained to act like white supremacists and attack marginalized groups.
> Then there are the economic and energy costs and the climate impact.
> There is a real trust issue here imho with the current generation of
> LLMs. Maybe one day there will be something like
> https://sfconservancy.org/activities/aspirational-statement-on-llm-generative-ai-for-programming.html
> and then we can maybe reexamine the trust issue.
>
> > > I would lean the other way and adopt a simple policy like the rest of
> > > the core toolchain projects are adopting to reject LLM generated
> > > contributions for which the provenance cannot be determined (because
> > > the training corpus and/or algorithm is unknown).
> >
> > These provenance concerns are fair, but can they be accommodated by
> > our existing practices?
>
> I think they can. We should provide guidance that contributors should not
> sign off on any (non-trivial) LLM generated code/docs. Just like you
> wouldn't sign off on code/docs you "find" somewhere without clear
> attribution, copyright and license terms. We can reuse some of the
> guidance given by the binutils and/or qemu projects to make that clear.

Since we don't have a consensus to allow LLM-generated content, our
policy should be to reject it, at least until there's reason to
reevaluate these concerns. I'll write a new draft of the proposal that
will include the exceptions mentioned in the binutils policy: (1) LLMs
may be used to help write code as long as they do not actually
generate code (this permits accessibility-related uses of LLMs), and
(2) LLMs may be used to generate trivial, non-legally-significant
changes.

Aaron
