I like the idea and also assume that we can adjust and improve rules and expectations over time.

I just fear that if AI costs are (soon) raised to realistic price levels, we will need to check whether contributors still get free AI bot access; otherwise the idea melts away fast. (Low risk though; if this happens, we just need to change the approach... or look for funding.)

On 04.03.26 08:13, Jarek Potiuk wrote:
  Another manual step (and bottleneck) in triaging PRs is that maintainers
will still need to approve CI runs on GitHub.

Great point ... and ... it's already handled :)  - look at my PR.

When, during triage, the triager sees that workflow approval is needed,
my nice little tool prints the diff of the incoming PR in the terminal
and asks the triager to confirm that there is nothing suspicious; after
they answer "y", the workflow run is approved.
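The confirm-then-approve step could be sketched roughly like this. This is
a hypothetical illustration, not the actual breeze code: the approve
endpoint is GitHub's real "approve a workflow run for a fork pull request"
REST API, but the function names and the way the diff and the confirmation
are obtained are assumptions.

```python
# Hypothetical sketch of the "show diff, ask, approve" step.
# Only the REST endpoint path is real GitHub API; everything else is
# illustrative and not the actual `breeze pr auto-triage` implementation.
import subprocess


def triager_confirms(diff: str, ask=input) -> bool:
    """Show the incoming PR diff and ask the triager to confirm it is safe."""
    print(diff)
    answer = ask("Nothing suspicious - approve the workflow run? [y/N] ")
    return answer.strip().lower() == "y"


def approve_workflow_run(repo: str, run_id: int, post=None) -> str:
    """Approve a pending workflow run via GitHub's REST API."""
    url = f"repos/{repo}/actions/runs/{run_id}/approve"
    if post is None:
        # Delegate the authenticated POST to the gh CLI.
        subprocess.run(["gh", "api", "-X", "POST", url], check=True)
    else:
        # Injected callable, useful for testing without network access.
        post(url)
    return url
```

With a fake `ask` callback and an injected `post`, both steps are easy to
unit-test; in a real tool the gh CLI (or an authenticated HTTP client)
would perform the POST.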

J.


On Wed, Mar 4, 2026 at 3:35 AM Zhe-You Liu <[email protected]> wrote:

Hi all,

Thanks Jarek for bringing up the auto-triage idea!
Big +1 from me on the “let’s try” decision.

I really like this feature; it can help avoid copy‑pasting or repeatedly
writing similar instructions for contributors to fix baseline test
failures.

I had the same thoughts as Wei regarding flaky tests. Having deterministic
checks or automated comments should be enough to handle flaky test issues,
and contributors can still reach out on Slack to get their PRs reviewed, so
this should not be a problem.

Another manual step (and bottleneck) in triaging PRs is that maintainers
will still need to approve CI runs on GitHub. It doesn’t seem safe to fully
automate CI approval, as there could still be rare cases where an attacker
creates a malicious PR that logs environment variables during tests. Even
though we could use an LLM to check for these kinds of vulnerabilities
before approving a CI run, it is still not as safe as a manual review in
most cases (e.g. prompt injection attack). I’m not sure whether anyone has
a good idea for fully automated PR triaging -- for example, automatically
approving CI, periodically checking test baselines for quality (via
`breeze pr auto-triage`), re-approving CI as needed, and continuing this
loop until all CI checks are green.
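That fully automated loop could be sketched roughly like this. Purely a
hypothetical sketch: `approve_ci`, `run_auto_triage`, and `checks_green`
are placeholder callables standing in for real GitHub/breeze operations,
not actual APIs.

```python
# Hypothetical sketch of the fully automated triage loop described above.
# approve_ci, run_auto_triage, and checks_green are placeholder callables,
# not real breeze or GitHub APIs.
def auto_triage_loop(pr, approve_ci, run_auto_triage, checks_green,
                     max_rounds: int = 5) -> bool:
    """Approve CI, triage, and repeat until all checks are green.

    Returns True if the PR went green within max_rounds, False otherwise.
    A bounded loop (rather than looping forever) keeps a malicious or
    hopeless PR from consuming CI resources indefinitely.
    """
    for _ in range(max_rounds):
        approve_ci(pr)       # the step that is unsafe to fully automate
        run_auto_triage(pr)  # e.g. post baseline-failure comments
        if checks_green(pr):
            return True
    return False
```

Bounding the rounds is one way to address the safety concern above: even if
CI approval were automated, an attacker's PR could only trigger a limited
number of runs before a human has to step in.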

Best regards,
Jason

On Tue, Mar 3, 2026 at 10:48 PM Vincent Beck <[email protected]> wrote:

I like the overall strategy; for sure the tool will need continuous
iteration to handle all the different scenarios. But this is definitely
needed - the number of open PRs has skyrocketed in the last few months,
and it is very hard, if not impossible, to keep track of everything.

On 2026/03/03 14:39:41 Jarek Potiuk wrote:

Thanks for bringing this up! Overall, I like this idea, but it's
worth
testing it for a bit before we enforce it, especially the LLM-verify
part.
Oh absolutely. My plan to introduce it is (after the community
hopefully
makes an overall "let's try" decision):

* The human triager is always in the loop, quickly reviewing comments
just
before they are posted to the user (until we achieve high confidence)
* I plan to run it myself as the sole triager for some time to perfect
it
and to pay much more attention initially. I will start with smaller
groups/areas of code and expand as we go - possibly adding more
maintainers
willing to participate in triaging and testing/improving the tool
* See how quickly we can do it on a regular basis - whether we need
several
triagers or perhaps one rotational triager handling all PRs from all
areas
at a time.
* Possibly further automate it. My assessment is that 90% of failures will
be deterministic "fails" - those we can easily automate without hesitation
once the process and expectations are in place. The LLM part is a bit more
nuanced, and we can decide after we try.

* The author ensures the PR passes ALL the checks and tests (i.e. green).
It might sometimes mean we have to react even more quickly to `main`
breakages, and probably provide some "status" info and exceptions when we
know main is broken.
Probably, we should exempt some checks that might be flaky?

Yeah - this part is a bit problematic - but we can likely also add an
easy, automated, deterministic check to see whether the failure is
happening for others.
Sending an automated comment like, "Please rebase now, the issue is
fixed,"
to the authors would be super useful when they see unrelated failures.
This
is something we **should** figure out during testing. There will be
plenty
of opportunities :D
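Such a deterministic "is this failure happening for others?" check might
look roughly like this. Purely illustrative: the shape of `main_runs` (a
list of recent main-branch CI runs, each with a set of failed check names)
and the function names are assumptions, not the real tool.

```python
# Hypothetical sketch of a deterministic "failure happening for others"
# check. The data shape of `main_runs` is an assumption for illustration.
def failure_is_preexisting(check_name: str, main_runs: list,
                           min_hits: int = 2) -> bool:
    """Return True if the same check failed in at least `min_hits` of the
    recent main-branch runs - i.e. the author should rebase, not fix."""
    hits = sum(1 for run in main_runs if check_name in run["failed_checks"])
    return hits >= min_hits


def triage_comment(check_name: str, main_runs: list) -> str:
    """Produce the automated comment for the PR author."""
    if failure_is_preexisting(check_name, main_runs):
        return (f"`{check_name}` is also failing on main - "
                "please rebase once it is fixed.")
    return f"`{check_name}` looks specific to this PR - please take a look."
```

Requiring at least two recent main-branch failures (rather than one) is a
small guard against a single flaky run on main triggering misleading
"please rebase" comments.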


* All PRs that do not meet this requirement will be converted to
Drafts
with automated suggestions (reviewed quickly and efficiently by a
triager) provided to the author on the next steps.
This will be super helpful! I also do it manually from time to time.

Yes. I believe converting to Draft is an extremely strong (but fair)
signal
to the author: "Hey, you have work to do."

Also when this is accompanied by an actionable comment like, "Here is
what
you should do and here is the link describing it," it immediately
filters
out people who submit PRs without much work.

Surely - they might feed the comment into their agent anyway (or it can
read it automatically and act). But if our tool is faster, cheaper, and
more accurate (because of the smart human in the driver's seat) than their
tools, we gain the upper hand. And it should be faster - because only
checking expectations is much quicker than figuring out what to do.

Then in the worst case we will have continuous ping-pong (Draft ->
Undraft
-> Draft), but we will control how fast this loop runs. Generally, our
goal
should be to slow it down rather than respond immediately; for example,
running it daily or every two days is a good idea.

Effectively, if the PR is in the "ready for maintainer review" state, the
maintainer should be quite certain that the code quality, tests, etc., are
all good. Only then should they take a look (and they can immediately say,
"No, this is not what we want") - and this is absolutely fine as well. We
should not optimize for saving contributors from spending time on work we
might not accept; avoiding that waste is deliberately not a goal for me.
This will automatically mean that new contributors who want to contribute
significant changes will mostly waste a lot of time and have their PRs
rejected.

This is largely what we are already doing, mostly because those PRs do not
follow our "tribal knowledge," which the agent cannot easily derive.
Naturally, new contributors should start with small, easy-to-complete tasks
that can be easily discarded if reviewers reject them. This is what we have
always asked people to start with. So this approach, with the triage tool,
also largely supports it: someone new rewriting the proverbial scheduler
will have to spend significant time ensuring "auto-triage" passes, only to
have the idea completely rejected by the reviewer or be asked for a
complete rewrite. And this is perfectly fine. We always encouraged
newcomers to start with small tasks, learn the basics, and "grow" until
they were ready to propose bigger changes or split them into much smaller
chunks. With "auto-triage" this will be natural and expected, requiring
authors to invest more time and effort to reach the "ready for review"
status.

And I think it's absolutely fair and restores the balance we so badly need
now.



Best,
Wei

On Mar 3, 2026, at 9:34 PM, Jarek Potiuk <[email protected]> wrote:

*TL;DR; I propose a stricter (automation-assisted) approach for the
"ready
for review" state and clearer expectations for contributors
regarding
when
maintainers review PRs of non-collaborators.*

Following the
https://lists.apache.org/thread/8tzwwwd7jmtmfo4j9pzg27704g10vpr4
where I
showcased a tool that I claude-coded, I would like to have a
(possibly
short) discussion on this subject and reach a stage where I can
attempt
to
try the tool out.

*Why? *

Because we maintainers are overwhelmed and burning out, we no
longer
see
how our time invested in Airflow can bring significant returns to
us
(personally) and the community.

While some of us spend a lot of time reviewing, commenting on, and
merging
code, with the current rate of AI-generated PRs and other things we
do,
this is not sustainable. There is also a mismatch - or lack of clarity -
regarding the quality expectations for the PRs we want to review.

*Social Contract Issue*

We are a good (I think) open source project with a thriving community and
a great group of maintainers who are also friends and like working with
each other, but are also very open to bringing new community members in. As
other but also are very open to bringing new community members in.
As
maintainers, we are willing to help new contributors grow and
generally
willing to spend some of our time doing so. This is the social
contract
we
signed up for as OSS maintainers and as committers for the Apache
Software
Foundation PMC. Community Over Code.

However, this social contract - this community-building aspect - is
currently heavily imbalanced, because AI-generated content takes away
time, focus, and energy from the maintainers. Instead of having meaningful
discussions in PRs about whether changes are needed and communicating with
people, we start losing time talking to - effectively - AI agents about
hundreds of smaller and bigger things that should not be there in the
first place.
Currently - collaboration and community building suffer. Even if
real
people submit code generated by agents (which is becoming really
good,
fast
and cheap to produce), we simply lack the time as maintainers to
have
meaningful conversations with the people behind those agents.

Sometimes we lose time talking to agents. Sometimes we lose time talking
to people who have zero understanding of what they are doing and submit
continuous crap, and we should not be having that conversation at
all. Sometimes, we just look at the number of PRs opened in a given
day
in
despair, dreading even trying to bring order to them.

And many of us also have some "work" to do or a "feature" to work on top
of that.

I think we need to reclaim the maintainers' collective time to focus on
what matters: delegating more responsibility to authors so they meet our
expected quality bar (and efficiently verifying it with tools without
losing time and focus).

*What do we have now?*

We have already done a lot to help with it - AGENTS.md. The PR guidelines,
overhauled by Kaxil and updated by others, will certainly help clarify
expectations for agents in the future. I know Kaxil is also exploring a way
exploring a
way
to enable automated copilot code reviews in a manner that will not
be too
"dehumanizing" and will work well. This is all good. The better the
agents
people use and the more closely they follow those instructions, the
higher
the quality of incoming PRs will be. But we also need to help
maintainers
easily identify what to focus on - distinguishing work-in-progress and
unfinished PRs that need work from those truly "Ready for (human) review."

*How?*

My proposal has two parts:

* Define and communicate expectations for PRs that maintainers can
manage.
* Relentlessly automate this to ensure expectations are met and that
maintainers can easily focus on those PRs that are "Ready for review."

My tool (which needs a bit more fine-tuning and refinement):
https://github.com/apache/airflow/pull/62682 `*breeze pr auto-triage*` is
designed to do exactly this: automate those expectations by auto-triaging
the PRs. It not only converts them to Draft when they are not yet "Ready
For Review," but also provides actionable, automated (deterministic + LLM)
comments to the authors. A concrete maintainer (the current triager) is
using the tool very efficiently.

*Proposed expectations (for non-collaborators):*

These are not "new" expectations. Really, I'm proposing we completely
delegate the responsibility for fulfilling them to the author (with
helpful, automated comments - reviewed and confirmed by a human triager
for now) - and simply be very clear that generally no maintainer will look
at a PR until:

* The author ensures the PR passes ALL the checks and tests (i.e. green).
It might sometimes mean we have to react even more quickly to `main`
breakages, and probably provide some "status" info and exceptions when we
know main is broken.

* The author follows all PR guidelines (LLM-verified) regarding
description, content, quality, and presence of tests.

* All PRs that do not meet this requirement will be converted to
Drafts
with automated suggestions (reviewed quickly and efficiently by a
triager) provided to the author on the next steps.

* Drafts with no activity will be more aggressively pruned by our stalebot.
The triager is there mostly to quickly assess and generate comments - with
tool/AI assistance. The triager won't be the one who actually reviews
those PRs when they are "ready for review."

* Only after that do we mark the PR as "*ready for maintainer
review*"
(label)

* Only such PRs should be reviewed and it is entirely up to the
author to
make them ready.

Note: This approach is only for non-collaborators. For
collaborators: we
might have just one expectation - mark your PR with "ready for
maintainer
review" when you think it's ready.
We accept people as committers and collaborators because we already know
they generally know and follow the rules; automating this step isn't
necessary.

This is nothing new; we've already been doing this with humans handling
all the heavy lifting, without much strictness or organization, but this
is no longer sustainable.

I propose we make the expectations explicit, communicate them
clearly,
and
relentlessly automate their execution.

I would love to hear what y'all think.

J.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

