Hi folks,

I'd like to start a discussion on whether we should add a page to the
Iceberg documentation describing expectations around AI-generated
contributions.

This topic has recently been discussed on the Arrow dev mailing
list[1]. In addition, the iceberg-cpp project has already taken a step
in this direction by introducing AI-related contribution
guidelines[2]. After a brief discussion on that PR with Fokko, Gang,
and Kevin, we felt it would be worthwhile to raise this topic more
broadly within the Iceberg community.

The ASF already provides high-level guidance on the use of generative
AI tools, primarily focused on licensing and IP considerations[3]. As
AI-assisted development and so-called "vibe coding" become more
common, thoughtful use of these tools can be beneficial; however, if
the contributing author appears not to have engaged deeply with the
code and/or cannot respond to review feedback, this can significantly
increase maintainer burden and make the review process less
collaborative.

Having documented guidelines would give maintainers a clear reference
point when evaluating such contributions (including when deciding to
close a PR), and would also make it easier to assess whether a
contributor has made a reasonable effort to meet project expectations.

I've pulled together some candidate guidelines from the iceberg-cpp PR
and the Arrow dev ML discussion, hoping to kick off a broader
conversation about what Iceberg's guidelines for AI-generated
contributions should include.

-----

We are not opposed to the use of AI tools in generating PRs, but we
recommend that contributors adhere to the following principles:

- The PR author should **understand the core ideas** behind the
implementation **end-to-end**, and be able to justify the design and
code during review.
- **Call out unknowns and assumptions**. It's okay not to fully
understand some parts of AI-generated code. Comment on these cases and
point them out to reviewers so that they can use their knowledge of
the codebase to clear up any concerns. For example, you might comment
"calling this function here seems to work, but I'm not familiar with
how it works internally; I wonder if there's a race condition if it is
called concurrently".
- Only submit a PR if you are able to debug, explain, and take
ownership of the changes.
- Ensure the PR title and description match the style, level of
detail, and tone of other Iceberg PRs.
- Follow coding conventions used in the rest of the codebase.
- Be upfront about AI usage, including a brief summary of which parts
were AI-generated.
- Reference any sources that guided your changes (e.g. "took a similar
approach to #XXXX").

-----

Looking forward to hearing your thoughts.

[1] https://lists.apache.org/thread/fyn1r3hjd3cs48n2svxg7lj0zps52bvr
[2] https://github.com/apache/iceberg-cpp/pull/531
[3] https://www.apache.org/legal/generative-tooling.html

-- 
Regards
Junwang Zhao