On Wed, Jul 01, 2026 at 06:10:48PM +0200, David Hildenbrand (Arm) wrote:
> On 7/1/26 17:54, Christian Brauner wrote:
> > I remain very confused by our coding assistant contribution guidelines.
> > I'm going to be a bit polemic now but this is seriously in good faith.
> >
> > Why precisely do we require all this detailed information about what
> > specific coding assistant was used?
> >
> > I find it very irritating that our git history has effectively started
> > to function a bit like a free advertising platform for a bunch of AI
> > companies and their proprietary agents and models.
> >
> > And it remains unclear to me what exactly we do get out of this detailed
> > information: Do we want to run statistical analysis on what agent and
> > model is used the most and publish that on LWN at some point?
> >
> > I acknowledge that my stance is even more radical: imho we should just
> > stop it with any disclosure requirements completely. It's useless imho.
> > We already see that other than core contributors most people don't care
> > and will just not disclose their usage of AI. I think this is entirely
> > pointless and worse it brings in undefined legal status as well. It's
> > not like recent events of pulling certain models from the face of the
> > earth have made this any less concerning.
> >
> > But fine, if we want to do this can we please just dumb it down to
> >
> > Assisted-by: LLM
> >
> > or
> >
> > Assisted-by: Coding Assistant
>
> I'd prefer this.
Yeah I don't see any reason why we need to know precisely which model, or which
version of said model, was used.
>
> The doc states "proper attribution helps track the evolving role of AI in the
> development process". If there is another reason why we need the free
> advertisement, we should document it.
Yup.
Honestly I find the phrasing here quite vague.
While it is interesting to track the degree of AI involvement (where that's
disclosed) a really important part of this is how maintainers deal with AI
submissions.
Also we have a schism in the documentation anyway, there's [0] which is
literally indexed as 'AI Coding Assistants', which says NOTHING about how
people are supposed to use them etc. and there's [1] which DOES say
something about that, but which isn't linked to by [0], nor links to it.
Before I happened across this thread, I was thinking of sending a patch to
at least link one to the other. Now I think I definitely will.
>
> Side note: if someone instructs an LLM exactly what to do, and would have
> achieved the same thing just typing it in, the use of the tag is not any helpful
> to me. (similar to "Assisted-by: vim" would not be helpful).
>
> What would be much more relevant to know is to which degree LLMs were used.
As I mentioned off-list I do agree that this is key.
Having this information helps with the most important issue we face when it
comes to AI - an EXISTENTIAL issue actually IMO - the asymmetry between how
much code can be generated, and available maintainer/reviewer resource.
Being able to see, at a glance, that a series was both wholly generated and
seems substandard means we can quickly ask for more human attention.
And I know what the argument's going to be - 'bad faith people will lie
about it' - and sure, yes they will.
But now that there's been a huge surge of AI generated code in mm I can
speak from experience - many DO attribute, and for those that don't it's
very useful to have guidelines to point to.
Both aid in dealing with this asymmetry.
(as an example, I've had to push back quite strongly on an _attributed_
series ([2] and [3]) that appeared to be wholly generated. Having this
information would have helped there).
>
> Assisted-by: LLM # translate commit message
> Assisted-by: LLM # generate some test cases
> Assisted-by: LLM # cleanup logic
> Assisted-by: LLM # everything and I have no clue what any in here does
Yeah this format works I think!
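To illustrate why a uniform shape matters for the 'at a glance' part, here is a
rough sketch of how tooling might pull such tags out of a commit message. To be
clear, this is an assumption-laden example: the 'Assisted-by: <tool> # <scope>'
shape is only what was proposed above, not an agreed format, and the names
TRAILER_RE and assisted_by_trailers are made up for illustration.

```python
import re

# Hypothetical trailer shape from this thread: "Assisted-by: <tool> # <scope>".
# The "# <scope>" part is optional. Nothing here is an agreed kernel format.
TRAILER_RE = re.compile(
    r"^Assisted-by:[ \t]*(?P<tool>[^#\n]+?)[ \t]*"
    r"(?:#[ \t]*(?P<scope>[^\n]*?))?[ \t]*$",
    re.MULTILINE,
)


def assisted_by_trailers(commit_message):
    """Return (tool, scope) pairs; scope is None when no '#' comment is given."""
    return [
        (m.group("tool"), m.group("scope"))
        for m in TRAILER_RE.finditer(commit_message)
    ]


if __name__ == "__main__":
    # Made-up commit message, purely for demonstration.
    example = (
        "mm: fix thing\n"
        "\n"
        "Some body text.\n"
        "\n"
        "Assisted-by: LLM # generate some test cases\n"
        "Assisted-by: Coding Assistant\n"
        "Signed-off-by: A Dev <a@example.org>\n"
    )
    for tool, scope in assisted_by_trailers(example):
        print(tool, "->", scope or "(no scope given)")
```

A free-form comment after the '#' keeps the tag easy for humans to write, at
the cost of tooling only being able to display the scope rather than classify
it.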
>
> I thought we ask for that in some document, but couldn't immediately find it
> (and nobody does that).
Well you're probably thinking of [1], e.g.:
    Second, when making a contribution, be transparent about the origin
    of content in cover letters and changelogs. You can be more
    transparent by adding information like this:
    ...
    - Which portions of the content were affected by that tool?
    ...
And also from the same document:
    If tools permit you to generate a contribution automatically,
    expect additional scrutiny in proportion to how much of it was
    generated.
    As with the output of any tooling, the result may be incorrect or
    inappropriate. You are expected to understand and to be able to
    defend everything you submit. If you are unable to do so, then do
    not submit the resulting changes.
    If you do so anyway, maintainers are entitled to reject your series
    without detailed review.
This only speaks more to the need to link the two documents together. I'll
send a patch.
>
> --
> Cheers,
>
> David
Thanks, Lorenzo
[0]:https://docs.kernel.org/process/coding-assistants.html
[1]:https://docs.kernel.org/process/generated-content.html
[2]:https://lore.kernel.org/linux-mm/aj9yrlB0TrlYCLlf@lucifer/
[3]:https://lore.kernel.org/linux-mm/akIjA_dqh4OHAYo4@lucifer/