Re: [DISCUSS] Enforcing AI contribution guidelines

2026-03-18 Thread Shane Glass via dev
Hey all,

Just to confirm I understand properly: is our goal to reduce the number
of low-quality PRs submitted? If so, our team has been thinking a lot about
how we can help projects reduce the burden of low-quality PRs on
maintainers. I have been testing different structures for various agent
instruction docs (AGENTS.md, llms.txt, a net-new file, etc.) to see if we
can identify the most effective way to summarize existing AI policies so
that agents can best process and understand them. I'm happy to test
some of our ideas here as well. If it works, I'll compile it into a PR and
submit it so we can evaluate how well it works to reduce the burden on
maintainers. If it doesn't work, I'll report back here so we know what
didn't work and can go a different direction.

Is there a canonical set of 3–5 example PRs that represent the type of pull
requests we see often and want to discourage? If not, I can go through the
recently closed PRs and identify what I think are low-quality
contributions for the testing.

Cheers,
Shane


On Mon, Mar 16, 2026 at 6:02 PM huaxin gao  wrote:

> Hi everyone,
>
> Thank you all for the discussion. There is broad agreement that we need
> clearer rules around contribution quality and what happens when those rules
> are not followed.
>
> A few key points that came up:
>
>
>    - Focus on contribution quality and readiness, not on trying to detect
>    AI usage itself
>    - Close PRs that the author clearly did not review before submitting
>    - Add a PR template so contributors see the guidelines when they open
>    a PR
>    - Add an AGENTS.md to set rules for AI tools to follow
>    - Add clear consequences to the guidelines for contributors who
>    repeatedly ignore them
>    - We should find ways to prevent fully automated agent PRs
>
> Kevin has added "Agent automated PRs" to the next Iceberg sync agenda so
> we can continue the discussion there.
>
> Thanks,
>
> Huaxin
>
> On Tue, Mar 10, 2026 at 5:33 AM Steve Loughran 
> wrote:
>
>>
>> First, anyone who is an active committer on a project with >5k GitHub
>> stars gets 6 months of Claude Max free
>> https://claude.com/contact-sales/claude-for-oss
>>
>> Which means: many more ASF committers will be experiencing what it can
>> and can't do.
>>
>> I'm still learning what it can do, especially on any large body of code,
>> and am happy with the blocking of pure/overly AI generated content as it
>> will only create issues downstream. That's production code, tests, etc.
>> Documentation is an interesting one though, as the tools are good for tasks
>> like "review all links and flag broken ones" as well as "read the docs and
>> highlight inconsistencies".
>>
>> One thing which may be good for any OSS project is to have official
>> CLAUDE.md, GEMINI.md and the Copilot equivalents to provide strict
>> instructions to the AI tooling which it doesn't auto-infer from the simple
>> /init commands (attached: those two for Iceberg)
>>
>> I'm thinking of extra style and process rules, but also instructions to
>> the AI to stop it getting over-enthusiastic:
>>
>>    1. always use slf4j logging (had a bad experience with Gemini
>>    replacing every log statement with System.out in my two-file project as
>>    it couldn't see the output to debug test setup)
>>    2. thread safety requirements
>>    3. tests to go with the code to explore all branches and failure
>>    conditions
>>    4. use no content outside this directory tree
>>    5. add a /* begin: AI */ and /* end: AI */ around changes of a given
>>    size (ASF policy after all)
>>    6. do not touch anything under /format
>> + add the various .gemini/.copilot/.claude dirs with .gitignore set up to
>> ignore customisations.
>>
>>
>>
>>
>>
>> On Tue, 10 Mar 2026 at 03:32, vaquar khan  wrote:
>>
>>> Hi Huaxin, Junwang,
>>>
>>> I’ve been following this thread and I feel the same pain. Reviewing "AI
>>> slop" is the fastest way to burn out a committer, and Junwang is right:
>>> manual closing is just extra work we don't need.
>>>
>>> I've been working on a small utility called AIV (Automated Integrity
>>> Validation) to help with this exact problem at my day job. Instead of
>>> trying to "detect" AI, which is a losing battle, it focuses on Logic Density.
>>> Essentially, it checks the ratio of real functional changes to boilerplate.
>>> If someone submits 300 lines of scaffolding but only 2 lines of actual
>>> logic, AIV flags it as "Low Substance." This directly addresses Sung’s
>>> point about "readiness": it forces the author to prove there’s actual work
>>> in the PR before a human ever looks at it.
>>>
>>> I’ve already put together a few Iceberg-specific Design Rules for
>>> testing. For example, it can catch when a PR tries to bypass the
>>> ExpireSnapshots API or ignores the new V4 metadata constraints, patterns
>>> that AI agents miss 100% of the time.
>>>
>>> It runs 100% locally or in a CI step with no API keys needed. If the
>>> community is interested, I’m happy to share the 

Re: [DISCUSS] Enforcing AI contribution guidelines

2026-03-16 Thread huaxin gao
Hi everyone,

Thank you all for the discussion. There is broad agreement that we need
clearer rules around contribution quality and what happens when those rules
are not followed.

A few key points that came up:


   - Focus on contribution quality and readiness, not on trying to detect
   AI usage itself
   - Close PRs that the author clearly did not review before submitting
   - Add a PR template so contributors see the guidelines when they open a
   PR
   - Add an AGENTS.md to set rules for AI tools to follow
   - Add clear consequences to the guidelines for contributors who
   repeatedly ignore them
   - We should find ways to prevent fully automated agent PRs

Kevin has added "Agent automated PRs" to the next Iceberg sync agenda so we
can continue the discussion there.

Thanks,

Huaxin

On Tue, Mar 10, 2026 at 5:33 AM Steve Loughran  wrote:

>
> First, anyone who is an active committer on a project with >5k GitHub
> stars gets 6 months of Claude Max free
> https://claude.com/contact-sales/claude-for-oss
>
> Which means: many more ASF committers will be experiencing what it can and
> can't do.
>
> I'm still learning what it can do, especially on any large body of code,
> and am happy with the blocking of pure/overly AI generated content as it
> will only create issues downstream. That's production code, tests, etc.
> Documentation is an interesting one though, as the tools are good for tasks
> like "review all links and flag broken ones" as well as "read the docs and
> highlight inconsistencies".
>
> One thing which may be good for any OSS project is to have official
> CLAUDE.md, GEMINI.md and the Copilot equivalents to provide strict
> instructions to the AI tooling which it doesn't auto-infer from the simple
> /init commands (attached: those two for Iceberg)
>
> I'm thinking of extra style and process rules, but also instructions to the
> AI to stop it getting over-enthusiastic:
>
>    1. always use slf4j logging (had a bad experience with Gemini
>    replacing every log statement with System.out in my two-file project as
>    it couldn't see the output to debug test setup)
>    2. thread safety requirements
>    3. tests to go with the code to explore all branches and failure
>    conditions
>    4. use no content outside this directory tree
>    5. add a /* begin: AI */ and /* end: AI */ around changes of a given
>    size (ASF policy after all)
>    6. do not touch anything under /format
>
> + add the various .gemini/.copilot/.claude dirs with .gitignore set up to
> ignore customisations.
>
>
>
>
>
> On Tue, 10 Mar 2026 at 03:32, vaquar khan  wrote:
>
>> Hi Huaxin, Junwang,
>>
>> I’ve been following this thread and I feel the same pain. Reviewing "AI
>> slop" is the fastest way to burn out a committer, and Junwang is right:
>> manual closing is just extra work we don't need.
>>
>> I've been working on a small utility called AIV (Automated Integrity
>> Validation) to help with this exact problem at my day job. Instead of
>> trying to "detect" AI, which is a losing battle, it focuses on Logic Density.
>> Essentially, it checks the ratio of real functional changes to boilerplate.
>> If someone submits 300 lines of scaffolding but only 2 lines of actual
>> logic, AIV flags it as "Low Substance." This directly addresses Sung’s
>> point about "readiness": it forces the author to prove there’s actual work
>> in the PR before a human ever looks at it.
>>
>> I’ve already put together a few Iceberg-specific Design Rules for
>> testing. For example, it can catch when a PR tries to bypass the
>> ExpireSnapshots API or ignores the new V4 metadata constraints, patterns
>> that AI agents miss 100% of the time.
>>
>> It runs 100% locally or in a CI step with no API keys needed. If the
>> community is interested, I’m happy to share the code. It's already under an
>> Apache licence, and we could look at a non-blocking trial to help triage the
>> incoming queue.
>>
>> Regards,
>> Viquar Khan
>>
>> On Mon, 9 Mar 2026 at 22:13, Kevin Liu  wrote:
>>
>>> Thank you for bringing this up. I also feel like I've interacted with a
>>> few of these PRs recently. My suspicion is that these PRs are created by an
>>> "openclaw"-like agent that is automatically finding issues, creating PRs,
>>> and responding to reviews. This is slightly different from our previous
>>> conversation, which was centered around AI-generated PRs with
>>> human-in-the-loop. I've just pinged the author of one of the suspected PRs and
>>> linked to the guidelines.
>>>
>>> I'm in favor of adding some more to the "Guidelines for AI-assisted
>>> Contributions" section [1]. I want to especially call out the burden on the
>>> reviewers and the limited reviewer resources.
>>>
>>> A wild idea: if we add an AGENTS.md to the Iceberg repo, maybe the agent
>>> will respect it?
>>>
>>> Best,
>>> Kevin Liu
>>>
>>>
>>> [1]
>>> https://iceberg.apache.org/contribute/#guidelines-for-ai-assisted-contributions
>>>
>>> On Mon, Mar 9, 2026 at 8:05 PM Alex Stephen via dev <
>>> [email protected]

Re: [DISCUSS] Enforcing AI contribution guidelines

2026-03-10 Thread Steve Loughran
First, anyone who is an active committer on a project with >5k GitHub
stars gets 6 months of Claude Max free
https://claude.com/contact-sales/claude-for-oss

Which means: many more ASF committers will be experiencing what it can and
can't do.

I'm still learning what it can do, especially on any large body of code,
and am happy with the blocking of pure/overly AI-generated content as it
will only create issues downstream. That's production code, tests, etc.
Documentation is an interesting one though, as the tools are good for tasks
like "review all links and flag broken ones" as well as "read the docs and
highlight inconsistencies".

One thing which may be good for any OSS project is to have official
CLAUDE.md, GEMINI.md and the Copilot equivalents to provide strict
instructions to the AI tooling which it doesn't auto-infer from the simple
/init commands (attached: those two for Iceberg)

I'm thinking of extra style and process rules, but also instructions to the
AI to stop it getting over-enthusiastic:

   1. always use slf4j logging (had a bad experience with Gemini replacing
   every log statement with System.out in my two-file project as it couldn't
   see the output to debug test setup)
   2. thread safety requirements
   3. tests to go with the code to explore all branches and failure
   conditions
   4. use no content outside this directory tree
   5. add a /* begin: AI */ and /* end: AI */ around changes of a given
   size (ASF policy after all)
   6. do not touch anything under /format

+ add the various .gemini/.copilot/.claude dirs with .gitignore set up to
ignore customisations.
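
A minimal sketch of how a few of the rules above (slf4j logging, the
begin/end AI markers, nothing under /format) might be checked mechanically
against a unified diff. This is an illustration only: the function name
check_diff, the regex, and the messages are hypothetical and are not part of
any existing Iceberg tooling.

```python
import re

# Assumption: direct console printing is what rule 1 wants to forbid.
SYSTEM_OUT = re.compile(r"\bSystem\.(out|err)\.print")
AI_BEGIN = "/* begin: AI */"
AI_END = "/* end: AI */"


def check_diff(diff_text: str) -> list[str]:
    """Return a list of rule violations found in the added lines of a diff."""
    problems = []
    current_file = None
    open_markers = 0
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # "+++ b/path/to/File.java" names the file being changed.
            current_file = line[4:].removeprefix("b/")
            if current_file.startswith("format/"):
                problems.append(f"{current_file}: changes under /format are not allowed")
            continue
        if not line.startswith("+"):
            continue  # only added lines are inspected
        added = line[1:]
        if SYSTEM_OUT.search(added):
            problems.append(f"{current_file}: use slf4j logging instead of System.out")
        # Track whether every begin marker has a matching end marker.
        open_markers += added.count(AI_BEGIN) - added.count(AI_END)
    if open_markers != 0:
        problems.append("unbalanced /* begin: AI */ and /* end: AI */ markers")
    return problems
```

A check like this could only ever be advisory; rules such as thread safety
or branch coverage in tests still need a human reviewer.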





On Tue, 10 Mar 2026 at 03:32, vaquar khan  wrote:

> Hi Huaxin, Junwang,
>
> I’ve been following this thread and I feel the same pain. Reviewing "AI
> slop" is the fastest way to burn out a committer, and Junwang is right:
> manual closing is just extra work we don't need.
>
> I've been working on a small utility called AIV (Automated Integrity
> Validation) to help with this exact problem at my day job. Instead of
> trying to "detect" AI, which is a losing battle, it focuses on Logic Density.
> Essentially, it checks the ratio of real functional changes to boilerplate.
> If someone submits 300 lines of scaffolding but only 2 lines of actual
> logic, AIV flags it as "Low Substance." This directly addresses Sung’s
> point about "readiness": it forces the author to prove there’s actual work
> in the PR before a human ever looks at it.
>
> I’ve already put together a few Iceberg-specific Design Rules for testing.
> For example, it can catch when a PR tries to bypass the ExpireSnapshots API
> or ignores the new V4 metadata constraints, patterns that AI agents miss
> 100% of the time.
>
> It runs 100% locally or in a CI step with no API keys needed. If the
> community is interested, I’m happy to share the code. It's already under an
> Apache licence, and we could look at a non-blocking trial to help triage the
> incoming queue.
>
> Regards,
> Viquar Khan
>
> On Mon, 9 Mar 2026 at 22:13, Kevin Liu  wrote:
>
>> Thank you for bringing this up. I also feel like I've interacted with a
>> few of these PRs recently. My suspicion is that these PRs are created by an
>> "openclaw"-like agent that is automatically finding issues, creating PRs,
>> and responding to reviews. This is slightly different from our previous
>> conversation, which was centered around AI-generated PRs with
>> human-in-the-loop. I've just pinged the author of one of the suspected PRs and
>> linked to the guidelines.
>>
>> I'm in favor of adding some more to the "Guidelines for AI-assisted
>> Contributions" section [1]. I want to especially call out the burden on the
>> reviewers and the limited reviewer resources.
>>
>> A wild idea: if we add an AGENTS.md to the Iceberg repo, maybe the agent
>> will respect it?
>>
>> Best,
>> Kevin Liu
>>
>>
>> [1]
>> https://iceberg.apache.org/contribute/#guidelines-for-ai-assisted-contributions
>>
>> On Mon, Mar 9, 2026 at 8:05 PM Alex Stephen via dev <
>> [email protected]> wrote:
>>
>>> One thing worth considering is a .github/PULL_REQUEST_TEMPLATE.md file.
>>>
>>> If somebody isn’t looking over their PR, they probably aren’t going to
>>> look over the guidelines around contributing. Especially if they’re located
>>> over in a docs page.
>>>
>>> A Pull Request Template forces them to see the community’s guidelines
>>> before they formally make the PR.
>>>
>>> On Mon, Mar 9, 2026 at 7:55 PM Sung Yun  wrote:
>>>
>>>> Thanks for raising this Huaxin. I do think this is very much worth
>>>> discussing.
>>>>
>>>> I also want to acknowledge that we recently updated the contribution
>>>> guide here [1], so there is already some baseline guidance in place around
>>>> AI-assisted contributions.
>>>>
>>>> My instinct is that we should be careful not to make this too much
>>>> about AI itself, even though I agree that AI is what has made this issue
>>>> much more pronounced. It is now much easier to generate PRs that look ready
>>>> for review

Re: [DISCUSS] Enforcing AI contribution guidelines

2026-03-09 Thread vaquar khan
Hi Huaxin, Junwang,

I’ve been following this thread and I feel the same pain. Reviewing "AI
slop" is the fastest way to burn out a committer, and Junwang is right:
manual closing is just extra work we don't need.

I've been working on a small utility called AIV (Automated Integrity
Validation) to help with this exact problem at my day job. Instead of
trying to "detect" AI, which is a losing battle, it focuses on Logic Density.
Essentially, it checks the ratio of real functional changes to boilerplate.
If someone submits 300 lines of scaffolding but only 2 lines of actual
logic, AIV flags it as "Low Substance." This directly addresses Sung’s
point about "readiness": it forces the author to prove there’s actual work
in the PR before a human ever looks at it.
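
As a rough illustration of the ratio idea only (AIV's code has not been
shared in this thread, so nothing here reflects its actual rules), a
logic-density check over a unified diff might look like the sketch below.
The boilerplate patterns, the function names logic_density and classify, and
the 0.1 threshold are all assumptions made for the example.

```python
import re

# Assumption: blank lines, comments, imports, package declarations, bare
# annotations, and lines holding only brackets count as boilerplate.
BOILERPLATE = re.compile(
    r"^\s*($|//|/\*|\*|import\s|package\s|@\w+\s*$|[{}()\[\];]+\s*$)"
)


def logic_density(diff_text: str) -> float:
    """Return the fraction of added lines that are not boilerplate."""
    added = [
        line[1:]
        for line in diff_text.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]
    if not added:
        return 0.0
    functional = [line for line in added if not BOILERPLATE.match(line)]
    return len(functional) / len(added)


def classify(diff_text: str, threshold: float = 0.1) -> str:
    """Label a diff "Low Substance" when its density is below the threshold."""
    return "Low Substance" if logic_density(diff_text) < threshold else "OK"
```

With these assumed patterns, a diff that adds four lines of imports,
comments, and braces plus one real statement scores 0.2. Where to put the
threshold, and what counts as boilerplate, would need tuning against real
PRs.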

I’ve already put together a few Iceberg-specific Design Rules for testing.
For example, it can catch when a PR tries to bypass the ExpireSnapshots API
or ignores the new V4 metadata constraints, patterns that AI agents miss
100% of the time.

It runs 100% locally or in a CI step with no API keys needed. If the
community is interested, I’m happy to share the code. It's already under an
Apache licence, and we could look at a non-blocking trial to help triage the
incoming queue.

Regards,
Viquar Khan

On Mon, 9 Mar 2026 at 22:13, Kevin Liu  wrote:

> Thank you for bringing this up. I also feel like I've interacted with a
> few of these PRs recently. My suspicion is that these PRs are created by an
> "openclaw"-like agent that is automatically finding issues, creating PRs,
> and responding to reviews. This is slightly different from our previous
> conversation, which was centered around AI-generated PRs with
> human-in-the-loop. I've just pinged the author of one of the suspected PRs and
> linked to the guidelines.
>
> I'm in favor of adding some more to the "Guidelines for AI-assisted
> Contributions" section [1]. I want to especially call out the burden on the
> reviewers and the limited reviewer resources.
>
> A wild idea: if we add an AGENTS.md to the Iceberg repo, maybe the agent
> will respect it?
>
> Best,
> Kevin Liu
>
>
> [1]
> https://iceberg.apache.org/contribute/#guidelines-for-ai-assisted-contributions
>
> On Mon, Mar 9, 2026 at 8:05 PM Alex Stephen via dev <
> [email protected]> wrote:
>
>> One thing worth considering is a .github/PULL_REQUEST_TEMPLATE.md file.
>>
>> If somebody isn’t looking over their PR, they probably aren’t going to
>> look over the guidelines around contributing. Especially if they’re located
>> over in a docs page.
>>
>> A Pull Request Template forces them to see the community’s guidelines
>> before they formally make the PR.
>>
>> On Mon, Mar 9, 2026 at 7:55 PM Sung Yun  wrote:
>>
>>> Thanks for raising this Huaxin. I do think this is very much worth
>>> discussing.
>>>
>>> I also want to acknowledge that we recently updated the contribution
>>> guide here [1], so there is already some baseline guidance in place around
>>> AI-assisted contributions.
>>>
>>> My instinct is that we should be careful not to make this too much about
>>> AI itself, even though I agree that AI is what has made this issue much
>>> more pronounced. It is now much easier to generate PRs that look ready for
>>> review on the surface, even when the author has not really gone through the
>>> content carefully themselves.
>>>
>>> Because of that, I think it may be more useful to frame any additional
>>> guidance around the quality and readiness of the contribution, rather than
>>> around AI use by itself. That feels like a more durable way to set the
>>> standard, since it focuses on things we can actually assess consistently in
>>> review, rather than trying to determine how the content was produced.
>>>
>>> On that note, one practical place to start might be to have a more
>>> formal guideline around when a PR should be marked draft versus ready for
>>> review. I think a positive direction for the community would be to
>>> strengthen contributor judgment around what it means for a PR to actually
>>> be ready for reviewer attention, even if the change looks substantial on
>>> the surface. We already have a fairly simple mention of the draft PR
>>> process [2], and maybe that is a natural place to clarify our standard for
>>> what should be labeled ready for review.
>>>
>>> I also think that kind of guideline would be constructive for someone
>>> who is misreading the readiness of generated code. It gives them a clear
>>> way to adjust their behavior going forward, without making the first
>>> response a punishing one. If we start from an assumption of good intent,
>>> that seems like a better way to help contributors build stronger judgment
>>> over time.
>>>
>>> If the same pattern keeps repeating after that, then I think it makes
>>> sense to handle it as a contribution-process issue, regardless of whether
>>> generative tooling was involved. That may also be worth clarifying, and it
>>> aligns with your question about limiting contributions from p

Re: [DISCUSS] Enforcing AI contribution guidelines

2026-03-09 Thread Kevin Liu
Thank you for bringing this up. I also feel like I've interacted with a few
of these PRs recently. My suspicion is that these PRs are created by an
"openclaw"-like agent that is automatically finding issues, creating PRs,
and responding to reviews. This is slightly different from our previous
conversation, which was centered around AI-generated PRs with
human-in-the-loop. I've just pinged the author of one of the suspected PRs and
linked to the guidelines.

I'm in favor of adding some more to the "Guidelines for AI-assisted
Contributions" section [1]. I want to especially call out the burden on the
reviewers and the limited reviewer resources.

A wild idea: if we add an AGENTS.md to the Iceberg repo, maybe the agent
will respect it?

Best,
Kevin Liu


[1]
https://iceberg.apache.org/contribute/#guidelines-for-ai-assisted-contributions

On Mon, Mar 9, 2026 at 8:05 PM Alex Stephen via dev 
wrote:

> One thing worth considering is a .github/PULL_REQUEST_TEMPLATE.md file.
>
> If somebody isn’t looking over their PR, they probably aren’t going to
> look over the guidelines around contributing. Especially if they’re located
> over in a docs page.
>
> A Pull Request Template forces them to see the community’s guidelines
> before they formally make the PR.
>
> On Mon, Mar 9, 2026 at 7:55 PM Sung Yun  wrote:
>
>> Thanks for raising this Huaxin. I do think this is very much worth
>> discussing.
>>
>> I also want to acknowledge that we recently updated the contribution
>> guide here [1], so there is already some baseline guidance in place around
>> AI-assisted contributions.
>>
>> My instinct is that we should be careful not to make this too much about
>> AI itself, even though I agree that AI is what has made this issue much
>> more pronounced. It is now much easier to generate PRs that look ready for
>> review on the surface, even when the author has not really gone through the
>> content carefully themselves.
>>
>> Because of that, I think it may be more useful to frame any additional
>> guidance around the quality and readiness of the contribution, rather than
>> around AI use by itself. That feels like a more durable way to set the
>> standard, since it focuses on things we can actually assess consistently in
>> review, rather than trying to determine how the content was produced.
>>
>> On that note, one practical place to start might be to have a more formal
>> guideline around when a PR should be marked draft versus ready for review.
>> I think a positive direction for the community would be to strengthen
>> contributor judgment around what it means for a PR to actually be ready for
>> reviewer attention, even if the change looks substantial on the surface. We
>> already have a fairly simple mention of the draft PR process [2], and maybe
>> that is a natural place to clarify our standard for what should be labeled
>> ready for review.
>>
>> I also think that kind of guideline would be constructive for someone who
>> is misreading the readiness of generated code. It gives them a clear way to
>> adjust their behavior going forward, without making the first response a
>> punishing one. If we start from an assumption of good intent, that seems
>> like a better way to help contributors build stronger judgment over time.
>>
>> If the same pattern keeps repeating after that, then I think it makes
>> sense to handle it as a contribution-process issue, regardless of whether
>> generative tooling was involved. That may also be worth clarifying, and it
>> aligns with your question about limiting contributions from people who
>> repeatedly ignore these guidelines, although I hope clearer standards help
>> avoid getting to that point.
>>
>> Cheers,
>> Sung
>>
>> [1] https://github.com/apache/iceberg/pull/15213
>> [2] https://iceberg.apache.org/contribute/#pull-request-process
>>
>> On 2026/03/10 00:52:43 huaxin gao wrote:
>> > Hi everyone,
>> >
>> > Some recent PRs look like they were made entirely by AI: finding issues,
>> > writing code, opening PRs, and replying to review comments, with no human
>> > review and no disclosure.
>> >
>> > Our guidelines already say contributors are expected to understand their
>> > code, verify AI output before submitting, and disclose AI usage. The
>> > problem is there's nothing about what happens when someone ignores them.
>> >
>> > Should we define consequences? For example:
>> >
>> >
>> >    - Closing PRs that were clearly not reviewed by a human before
>> >    submitting
>> >    - Limiting contributions from people who repeatedly ignore these
>> >    guidelines
>> >
>> > It's OK to use AI to help write code, but submitting AI output without
>> > looking at it and leaving it to maintainers to catch the problems is not
>> > OK.
>> >
>> > What do you all think?
>> >
>> > Thanks,
>> >
>> > Huaxin
>> >
>>
>


Re: [DISCUSS] Enforcing AI contribution guidelines

2026-03-09 Thread Alex Stephen via dev
One thing worth considering is a .github/PULL_REQUEST_TEMPLATE.md file.

If somebody isn’t looking over their PR, they probably aren’t going to look
over the guidelines around contributing. Especially if they’re located over
in a docs page.

A Pull Request Template forces them to see the community’s guidelines
before they formally make the PR.

On Mon, Mar 9, 2026 at 7:55 PM Sung Yun  wrote:

> Thanks for raising this Huaxin. I do think this is very much worth
> discussing.
>
> I also want to acknowledge that we recently updated the contribution guide
> here [1], so there is already some baseline guidance in place around
> AI-assisted contributions.
>
> My instinct is that we should be careful not to make this too much about
> AI itself, even though I agree that AI is what has made this issue much
> more pronounced. It is now much easier to generate PRs that look ready for
> review on the surface, even when the author has not really gone through the
> content carefully themselves.
>
> Because of that, I think it may be more useful to frame any additional
> guidance around the quality and readiness of the contribution, rather than
> around AI use by itself. That feels like a more durable way to set the
> standard, since it focuses on things we can actually assess consistently in
> review, rather than trying to determine how the content was produced.
>
> On that note, one practical place to start might be to have a more formal
> guideline around when a PR should be marked draft versus ready for review.
> I think a positive direction for the community would be to strengthen
> contributor judgment around what it means for a PR to actually be ready for
> reviewer attention, even if the change looks substantial on the surface. We
> already have a fairly simple mention of the draft PR process [2], and maybe
> that is a natural place to clarify our standard for what should be labeled
> ready for review.
>
> I also think that kind of guideline would be constructive for someone who
> is misreading the readiness of generated code. It gives them a clear way to
> adjust their behavior going forward, without making the first response a
> punishing one. If we start from an assumption of good intent, that seems
> like a better way to help contributors build stronger judgment over time.
>
> If the same pattern keeps repeating after that, then I think it makes
> sense to handle it as a contribution-process issue, regardless of whether
> generative tooling was involved. That may also be worth clarifying, and it
> aligns with your question about limiting contributions from people who
> repeatedly ignore these guidelines, although I hope clearer standards help
> avoid getting to that point.
>
> Cheers,
> Sung
>
> [1] https://github.com/apache/iceberg/pull/15213
> [2] https://iceberg.apache.org/contribute/#pull-request-process
>
> On 2026/03/10 00:52:43 huaxin gao wrote:
> > Hi everyone,
> >
> > Some recent PRs look like they were made entirely by AI: finding issues,
> > writing code, opening PRs, and replying to review comments, with no human
> > review and no disclosure.
> >
> > Our guidelines already say contributors are expected to understand their
> > code, verify AI output before submitting, and disclose AI usage. The
> > problem is there's nothing about what happens when someone ignores them.
> >
> > Should we define consequences? For example:
> >
> >
> >    - Closing PRs that were clearly not reviewed by a human before
> >    submitting
> >    - Limiting contributions from people who repeatedly ignore these
> >    guidelines
> >
> > It's OK to use AI to help write code, but submitting AI output without
> > looking at it and leaving it to maintainers to catch the problems is not
> > OK.
> >
> > What do you all think?
> >
> > Thanks,
> >
> > Huaxin
> >
>


Re: [DISCUSS] Enforcing AI contribution guidelines

2026-03-09 Thread Sung Yun
Thanks for raising this Huaxin. I do think this is very much worth discussing.

I also want to acknowledge that we recently updated the contribution guide here 
[1], so there is already some baseline guidance in place around AI-assisted 
contributions.

My instinct is that we should be careful not to make this too much about AI 
itself, even though I agree that AI is what has made this issue much more 
pronounced. It is now much easier to generate PRs that look ready for review on 
the surface, even when the author has not really gone through the content 
carefully themselves.

Because of that, I think it may be more useful to frame any additional guidance 
around the quality and readiness of the contribution, rather than around AI use 
by itself. That feels like a more durable way to set the standard, since it 
focuses on things we can actually assess consistently in review, rather than 
trying to determine how the content was produced.

On that note, one practical place to start might be to have a more formal 
guideline around when a PR should be marked draft versus ready for review. I 
think a positive direction for the community would be to strengthen contributor 
judgment around what it means for a PR to actually be ready for reviewer 
attention, even if the change looks substantial on the surface. We already have 
a fairly simple mention of the draft PR process [2], and maybe that is a 
natural place to clarify our standard for what should be labeled ready for 
review.

I also think that kind of guideline would be constructive for someone who is 
misreading the readiness of generated code. It gives them a clear way to adjust 
their behavior going forward, without making the first response a punishing 
one. If we start from an assumption of good intent, that seems like a better 
way to help contributors build stronger judgment over time.

If the same pattern keeps repeating after that, then I think it makes sense to 
handle it as a contribution-process issue, regardless of whether generative 
tooling was involved. That may also be worth clarifying, and it aligns with 
your question about limiting contributions from people who repeatedly ignore 
these guidelines, although I hope clearer standards help avoid getting to that 
point.

Cheers,
Sung

[1] https://github.com/apache/iceberg/pull/15213
[2] https://iceberg.apache.org/contribute/#pull-request-process

On 2026/03/10 00:52:43 huaxin gao wrote:
> Hi everyone,
> 
> Some recent PRs look like they were made entirely by AI: finding issues,
> writing code, opening PRs, and replying to review comments, with no human
> review and no disclosure.
> 
> Our guidelines already say contributors are expected to understand their
> code, verify AI output before submitting, and disclose AI usage. The
> problem is there's nothing about what happens when someone ignores them.
> 
> Should we define consequences? For example:
> 
> 
>    - Closing PRs that were clearly not reviewed by a human before submitting
>    - Limiting contributions from people who repeatedly ignore these
>    guidelines
> 
> It's OK to use AI to help write code, but submitting AI output without
> looking at it and leaving it to maintainers to catch the problems is not
> OK.
> 
> What do you all think?
> 
> Thanks,
> 
> Huaxin
> 


Re: [DISCUSS] Enforcing AI contribution guidelines

2026-03-09 Thread Neelesh Salian
I'm glad this was raised. Thank you, Huaxin.
I originally suggested adding a comment in the AI Guidelines PR to address
this issue. We should have a version of this, and/or the consequences the
community decides for contributors who don't align, added explicitly to the
guidelines.

On Mon, Mar 9, 2026 at 5:53 PM huaxin gao wrote:

> Hi everyone,
>
> Some recent PRs look like they were made entirely by AI: finding issues,
> writing code, opening PRs, and replying to review comments, with no human
> review and no disclosure.
>
> Our guidelines already say contributors are expected to understand their
> code, verify AI output before submitting, and disclose AI usage. The
> problem is there's nothing about what happens when someone ignores them.
>
> Should we define consequences? For example:
>
>
> - Closing PRs that were clearly not reviewed by a human before
>   submitting
> - Limiting contributions from people who repeatedly ignore these
>   guidelines
>
> It's OK to use AI to help write code, but submitting AI output without
> looking at it and leaving it to maintainers to catch the problems is not
> OK.
>
> What do you all think?
>
> Thanks,
>
> Huaxin
>


Re: [DISCUSS] Enforcing AI contribution guidelines

2026-03-09 Thread Anurag Mantripragada
Hi Huaxin,

Thank you for bringing up this important topic. Since reviewer time is a
limited resource, it is crucial that we protect it from low-quality AI
submissions.

Committers should have the authority to close PRs if guidelines are not
followed or if it becomes clear that the author does not deeply understand
the changes they are submitting. In such cases, the reason for closing the
PR should be clearly stated.

Regarding limiting contributions from those who repeatedly ignore these
guidelines, this may be difficult to manage at scale. However, we should
definitely find a way to prevent bots from flooding the repository with
automated PRs. I wonder if GitHub or other services have mechanisms to help
with this.

~ Anurag

On Mon, Mar 9, 2026 at 5:53 PM huaxin gao wrote:

> Hi everyone,
>
> Some recent PRs look like they were made entirely by AI: finding issues,
> writing code, opening PRs, and replying to review comments, with no human
> review and no disclosure.
>
> Our guidelines already say contributors are expected to understand their
> code, verify AI output before submitting, and disclose AI usage. The
> problem is there's nothing about what happens when someone ignores them.
>
> Should we define consequences? For example:
>
>
> - Closing PRs that were clearly not reviewed by a human before
>   submitting
> - Limiting contributions from people who repeatedly ignore these
>   guidelines
>
> It's OK to use AI to help write code, but submitting AI output without
> looking at it and leaving it to maintainers to catch the problems is not
> OK.
>
> What do you all think?
>
> Thanks,
>
> Huaxin
>


Re: [DISCUSS] Enforcing AI contribution guidelines

2026-03-09 Thread Holden Karau
+1 to closing PRs from bots, or from humans wrapping bots they don’t understand.

Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her


On Mon, Mar 9, 2026 at 6:47 PM Junwang Zhao wrote:

> On Tue, Mar 10, 2026 at 8:53 AM huaxin gao wrote:
> >
> > Hi everyone,
> >
> > Some recent PRs look like they were made entirely by AI: finding issues,
> > writing code, opening PRs, and replying to review comments, with no human
> > review and no disclosure.
> >
> > Our guidelines already say contributors are expected to understand their
> > code, verify AI output before submitting, and disclose AI usage. The
> > problem is there's nothing about what happens when someone ignores them.
> >
> > Should we define consequences? For example:
> >
> > - Closing PRs that were clearly not reviewed by a human before submitting
> > - Limiting contributions from people who repeatedly ignore these guidelines
> >
> > It's OK to use AI to help write code, but submitting AI output without
> > looking at it and leaving it to maintainers to catch the problems is not OK.
> >
> > What do you all think?
>
> Agreed. I'm not sure whether we could use a bot to detect and close such
> PRs automatically. Having maintainers close them manually can be annoying,
> but anyway, we should enforce the AI contribution guidelines.
>
> >
> > Thanks,
> >
> > Huaxin
>
>
>
> --
> Regards
> Junwang Zhao
>


Re: [DISCUSS] Enforcing AI contribution guidelines

2026-03-09 Thread Junwang Zhao
On Tue, Mar 10, 2026 at 8:53 AM huaxin gao wrote:
>
> Hi everyone,
>
> Some recent PRs look like they were made entirely by AI: finding issues, 
> writing code, opening PRs, and replying to review comments, with no human 
> review and no disclosure.
>
> Our guidelines already say contributors are expected to understand their 
> code, verify AI output before submitting, and disclose AI usage. The problem 
> is there's nothing about what happens when someone ignores them.
>
> Should we define consequences? For example:
>
> - Closing PRs that were clearly not reviewed by a human before submitting
> - Limiting contributions from people who repeatedly ignore these guidelines
>
> It's OK to use AI to help write code, but submitting AI output without 
> looking at it and leaving it to maintainers to catch the problems is not OK.
>
> What do you all think?

Agreed. I'm not sure whether we could use a bot to detect and close such
PRs automatically. Having maintainers close them manually can be annoying,
but anyway, we should enforce the AI contribution guidelines.

>
> Thanks,
>
> Huaxin



-- 
Regards
Junwang Zhao