Re: [DISCUSSION] documenting guidelines for AI-generated contributions

Gang Wu Mon, 26 Jan 2026 04:10:37 -0800

Thanks Junwang for raising this! I strongly agree with this proposal.

This aligns perfectly with some common issues I've recently
encountered in different projects. We have indeed observed a trend
where individuals who lack a deep understanding of Iceberg are
starting to use AI to generate PRs. This AI-produced code often looks
correct on the surface but contains numerous hidden issues.


For iceberg-cpp, which has limited reviewer resources, processing
these low-quality PRs consumes a significant amount of valuable time
and effort.

Therefore, a clear guidance document is crucial. It would effectively
communicate the project's expectations regarding PR quality and
ownership to contributors. If a contributor simply dumps a low-effort
PR that they do not deeply understand and cannot debug, the document
would set the expectation that maintainers are unlikely to review it,
thus avoiding unnecessary maintenance burden.

Best,
Gang

On Mon, Jan 26, 2026 at 6:43 PM Junwang Zhao <[email protected]> wrote:
>
> Hi folks,
>
> I'd like to start a discussion on whether we should add a page to the
> Iceberg documentation describing expectations around AI-generated
> contributions.
>
> This topic has recently been discussed on the Arrow dev mailing
> list[1]. In addition, the iceberg-cpp project has already taken a step
> in this direction by introducing AI-related contribution
> guidelines[2]. After a brief discussion on the iceberg-cpp PR with
> Fokko, Gang, and Kevin, we felt it would be worthwhile to raise this
> topic more broadly within the Iceberg community.
>
> The ASF already provides high-level guidance on the use of generative
> AI tools, primarily focused on licensing and IP considerations[3]. As
> AI-assisted development and so-called "vibe coding" become more
> common, thoughtful use of these tools can be beneficial; however, if
> the contributing author appears not to have engaged deeply with the
> code and/or cannot respond to review feedback, this can significantly
> increase maintainer burden and make the review process less
> collaborative.
>
> Having documented guidelines would give maintainers a clear reference
> point when evaluating such contributions (including when deciding to
> close a PR), and would also make it easier to assess whether a
> contributor has made a reasonable effort to meet project expectations.
>
> I've pulled together some guidelines from iceberg-cpp's PR and
> discussions on the Arrow dev ML, hoping to kick off a broader
> conversation about what should go into Iceberg's AI-generated
> contribution guidelines.
>
> -----
>
> We are not opposed to the use of AI tools in generating PRs, but we
> recommend that contributors adhere to the following principles:
>
> - The PR author should **understand the core ideas** behind the
> implementation **end-to-end**, and be able to justify the design and
> code during review.
> - **Call out unknowns and assumptions**. It's okay to not fully
> understand some bits of AI-generated code. You should comment on these
> cases and point them out to reviewers so that they can use their
> knowledge of the codebase to clear up any concerns. For example, you
> might comment "calling this function here seems to work but I'm not
> familiar with how it works internally, I wonder if there's a race
> condition if it is called concurrently".
> - Only submit a PR if you are able to debug, explain, and take
> ownership of the changes.
> - Ensure the PR title and description match the style, level of
> detail, and tone of other Iceberg PRs.
> - Follow coding conventions used in the rest of the codebase.
> - Be upfront about AI usage, including a brief summary of which parts
> were AI-generated.
> - Reference any sources that guided your changes (e.g. "took a similar
> approach to #XXXX").
>
> -----
>
> Looking forward to hearing your thoughts.
>
> [1] https://lists.apache.org/thread/fyn1r3hjd3cs48n2svxg7lj0zps52bvr
> [2] https://github.com/apache/iceberg-cpp/pull/531
> [3] https://www.apache.org/legal/generative-tooling.html
>
> --
> Regards
> Junwang Zhao
