That sounds great to me, and it will indeed stop someone from
inadvertently clobbering the header. It fits naturally with how
licensing is noted in source files anyway, just piecemeal rather
than at the granularity of whole files.

On Thu, Dec 18, 2025 at 11:33 AM Peeyush Gupta
<[email protected]> wrote:
>
> The clarification makes sense. I would like to propose another option for
> discussion: adding Generated-by as a comment in the code itself.
> This would be more fine-grained and would let committers know which pieces
> of code are LLM-generated while updating or reviewing the code. It would also
> keep the Generated-by information from being lost in cherry-picks, etc.
>
> From: Ian Maxon <[email protected]>
> Date: Thursday, December 18, 2025 at 11:18 AM
> To: [email protected] <[email protected]>
> Subject: Re: [DISCUSS] Adding information about AI usage into commit messages
>
> Good points, Peeyush. I can't really think of a perfect answer. The
> best one I can think of is simply to leave it at the committer's
> discretion based on some general guidelines.
>
> For example, determining whether the output is simply a regurgitation of
> someone else's code is not easy. However, sometimes it certainly is
> obvious: very specific and unrelated comments, or unrelated code with
> method or variable names that give it away.
> For those cases, we can clearly say the code is not a synthesis of the
> prompt and training data as a whole. Basically, you know it when you
> see it, but if you don't, then it's fine.
>
> I think tab completions are also another case where judgement can
> apply. Certainly tab completion is an old tool, and technically
> completions done by non-LLM methods could fall into the same traps by
> using snippets and macros. However, an obvious next line of code is not
> something most people would call an original or creative work. So in
> those cases, I would judge it's not worth mentioning. If it's
> generating entire methods and classes, then it's probably worth
> mentioning, because that is more substantive output from the tool, not
> just an obvious addition.
>
> Committers are already trusted to give proper attribution to code they
> commit, so overall I think this is just a corollary to that.
>
> On Thu, Dec 18, 2025 at 10:08 AM Peeyush Gupta
> <[email protected]> wrote:
> >
> > Sounds like a good idea to me, but it needs more clarification.
> >
> >
> >   1.
> > How to find out if the output of the LLM tool is part of its training data.
> >   2.
> > I use, LLM based tools for almost all patches for code completion. 
> > Sometimes the code completion could be just a few words.
> > Do we need to include “Generated-by” in such cases as well? If yes, won’t 
> > it make almost all commits to have this field set.
> >
> > From: [email protected] <[email protected]>
> > Date: Thursday, December 18, 2025 at 10:01 AM
> > To: [email protected] <[email protected]>
> > Subject: Re: [DISCUSS] Adding information about AI usage into commit messages
> >
> > I agree. Good idea.
> >
> > On Thu, Dec 18, 2025 at 8:20 AM Mike Carey <[email protected]> wrote:
> >
> > > Sounds like something we kinda need to do - brave new world...
> > >
> > > On 12/17/25 11:10 AM, Ian Maxon wrote:
> > > > Hey folks,
> > > >
> > > > I wanted to propose an addition to the usual commit message header
> > > > that we use today, which looks like this:
> > > >
> > > >> [ASTERIXDB-$ISSUE][$AREA] $COMMIT_SUMMARY
> > > >>
> > > >> - user model changes: yes/no
> > > >> - storage format changes: yes/no
> > > >> - interface changes: yes/no
> > > >>
> > > >> Details:
> > > > I think that we should add a field called "generatively assisted" and
> > > > if it is yes, there should be a footer in the commit message called
> > > > "Generated-by:" that lists the tool(s) used. We should also check
> > > > that this tool's output isn't restricted in some way that would be
> > > > incompatible with the guidance in
> > > > https://www.apache.org/legal/generative-tooling.html. I think
> > > > generally there aren't many tools out there that would run afoul of
> > > > this. The main thing to be aware of is whether it's regurgitating code
> > > > that's clearly part of the training data (and that code doesn't have a
> > > > clear and compatible license) or if the tool itself somehow says the
> > > > code it outputs is not yours and can't be licensed as you wish. The
> > > > idea about the footer itself isn't mine, it's from that document. It
> > > > seems like a fine one to me.
> > > >
> > > > Thoughts?
> > > >
> > > > -Ian
