Hi,

During the holidays I also experimented with adding a feature (which
everyone, myself included, expected to be simpler than it turned out):
https://github.com/apache/arrow/pull/48391
I find the AI experience in Arrow a bit clunky, but on the other hand the
workflow itself is pretty straightforward if the contributor is willing to
go through the PR process.

One issue is that standards for AI tooling are not mature yet.
Still, we could use AGENTS.md and skills.md files to steer AI agents and
reduce the time wasted by reviewers.
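
For illustration, a repository-level AGENTS.md could encode some of the
expectations discussed in this thread. This is only a hypothetical sketch
(the file name follows the emerging AGENTS.md convention; nothing here is
agreed Arrow policy):

```
# AGENTS.md (hypothetical sketch, not actual Arrow policy)

## Before opening a PR
- Review every generated line; do not submit code you cannot debug yourself.
- Follow the coding conventions of the surrounding code.
- Keep PRs small and focused; split large changes into separate PRs.

## PR description
- Match the title and body style of recently merged PRs.
- Disclose which parts were AI-generated and how they were verified.

## Things to avoid
- Overly verbose comments and unnecessary test cases.
- "Fixing" test failures by weakening or deleting the tests.
```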

Best regards,
Adam Lippai


On Sun, Jan 18, 2026 at 10:54 PM Vignesh Siva <[email protected]>
wrote:

> Hi Nic, Gang, and all,
>
> Thanks for raising this — I agree this guidance would be very valuable. AI
> tools can be helpful, but only when contributors fully understand, own, and
> are willing to iterate on the generated changes.
>
> I strongly support the emphasis on transparency, code ownership, and active
> engagement during review. Gang’s additions around reviewing every generated
> line, avoiding unnecessary verbosity, and keeping PRs small also address
> common reviewer pain points.
> Having this documented (e.g., in the contributor guide) would set clear
> expectations and give maintainers a consistent reference when handling
> low-engagement or undisclosed AI-generated PRs.
> Happy to support moving this forward.
>
> Best regards,
> Vignesh
>
> On Mon, 19 Jan 2026, 9:11 am Gang Wu, <[email protected]> wrote:
>
> > Thanks Nic for raising this!
> >
> > I totally agree with your suggestions and would like to add a few more
> > based on my review experience:
> >
> > - Submitters should review all lines of generated code before creating
> >   the PR, to understand every detail just as if they had written it
> >   themselves.
> > - AI tools are notorious for generating overly verbose comments,
> >   unnecessary test cases, fixing test failures using wrong approaches,
> >   etc. Make sure these are checked and fixed.
> > - Reviewers are human, so please break large PRs down into smaller ones
> >   to make reviewers' lives easier and get PRs reviewed promptly.
> >
> > Best,
> > Gang
> >
> > On Mon, Jan 19, 2026 at 3:14 AM Nic Crane <[email protected]> wrote:
> >
> > > Hi folks,
> > >
> > > I'm just emailing to solicit opinions on adding a page about
> > > AI-generated contributions to the docs. The ASF has its own
> > > guidance[1] which is fairly high-level and is mainly concerned with
> > > licensing. However, we are seeing more AI-generated contributions in
> > > which the author doesn't seem to have engaged with the code at all and
> > > appears to have no intention of engaging with review comments, and I
> > > feel it would be beneficial to have somewhere in the docs to point to
> > > if we close the pull request.
> > >
> > > Having guidelines also makes it easier to tell whether a contributor
> > > has made any effort to follow them.
> > >
> > > I experimented with approaches to being transparent about AI use in my
> > > own PRs and have an example here, where the changes were needed but
> > > the subject matter was a little out of my comfort zone[2] - see
> > > resolved comments.
> > >
> > > I've made a rough draft[3] of what I think could constitute some
> > > guidelines, but I'm keen to hear what folks think. Happy to hear
> > > thoughts on the wording, whether this belongs in the contributor
> > > guide, or whether there are concerns I haven't considered.
> > >
> > > Nic
> > >
> > >
> > > [1] https://www.apache.org/legal/generative-tooling.html
> > >
> > > [2] https://github.com/apache/arrow/pull/48634
> > >
> > > [3]
> > > We recognise that AI coding assistants are now a regular part of many
> > > developers' workflows and can improve productivity. Thoughtful use of
> > > these tools can be beneficial, but AI-generated PRs can sometimes add
> > > an undesirable maintainer burden. Human-generated mistakes tend to be
> > > easier to spot and reason about, and code review often feels like a
> > > collaborative learning experience that benefits both submitter and
> > > reviewer. When a PR appears to have been generated without much
> > > engagement from the submitter, it can feel like work that the
> > > maintainer might as well have done themselves.
> > >
> > > We are not opposed to the use of AI tools in generating PRs, but
> > > recommend the following:
> > > - Only take on a PR if you are able to debug and own the changes
> > >   yourself
> > > - Make sure that the PR title and body match the style and length of
> > >   others in this repo
> > > - Follow coding conventions used in the rest of the codebase
> > > - Be upfront about AI usage and summarise what was AI-generated
> > > - If there are parts you don't fully understand, add inline comments
> > >   explaining what steps you took to verify correctness
> > >   - Reference any sources that guided your changes (e.g. "took a
> > >     similar approach to #123456")
> > >
> > > PR authors are also responsible for disclosing any copyrighted
> > > materials in submitted contributions, as discussed in the ASF
> > > generative tooling guidance:
> > > https://www.apache.org/legal/generative-tooling.html
> > >
> > > If a PR appears to be AI-generated, and the submitter hasn't engaged
> > > with the output, doesn't respond to review feedback, or hasn't
> > > disclosed AI usage, we may close it without further review.
> > >
> >
>
