> Thanks for bringing this up! Overall, I like this idea, but it's worth
> testing it for a bit before we enforce it, especially the LLM-verify part.
Oh absolutely. My plan to introduce it is (after the community hopefully
makes an overall "let's try" decision):

* The human triager is always in the loop, quickly reviewing comments just
before they are posted to the user (until we achieve high confidence).

* I plan to run it myself as the sole triager for some time to perfect it
and to pay much more attention initially. I will start with smaller
groups/areas of code and expand as we go - possibly adding more maintainers
willing to participate in triaging and testing/improving the tool.

* See how quickly we can do it on a regular basis - whether we need several
triagers or perhaps one rotational triager handling all PRs from all areas
at a time.

* Possibly further automate it. My assessment is that we will have 90%
deterministic "fails" - those we can easily automate without hesitation
once the process and expectations are in place. The LLM part is a bit more
nuanced, and we can decide after we try.

> > * The author ensures the PR passes ALL the checks and tests (i.e.
> > green). It might sometimes mean we have to react even more quickly to
> > `main` breakages, and probably provide some "status" info and exceptions
> > when we know main is broken.
>
> Probably, we should exempt some checks that might be flaky?

Yeah - this part is a bit problematic - but we can likely also add an easy,
automated, deterministic check of whether the failure is happening for
others too. Sending an automated comment like, "Please rebase now, the
issue is fixed," to the authors would be super useful when they see
unrelated failures. This is something we **should** figure out during
testing. There will be plenty of opportunities :D

> > * All PRs that do not meet these requirements will be converted to
> > Drafts, with automated suggestions (reviewed quickly and efficiently by
> > a triager) provided to the author on the next steps.
>
> This will be super helpful! I also do it manually from time to time.

Yes.
I believe converting to Draft is an extremely strong (but fair) signal to
the author: "Hey, you have work to do." Also, when this is accompanied by
an actionable comment like, "Here is what you should do and here is the
link describing it," it immediately filters out people who submit PRs
without much work. Surely, they might feed the comment into their agent
anyway (or it can read it automatically and act). But if our tool is
faster, cheaper, and more accurate (because of the smart human in the
driver's seat) than their tools, we gain the upper hand. And it should be
faster, because we only check against the expectations rather than figure
out what to do.

In the worst case, we will have a continuous ping-pong (Draft -> Undraft ->
Draft), but we will control how fast this loop runs. Generally, our goal
should be to slow it down rather than respond immediately; for example,
running it daily or every two days is a good idea.

Effectively, if the PR is in the "ready for maintainer review" state, the
maintainer should be quite certain that the code quality, tests, etc., are
all good. Only then should they take a look (and they can immediately say,
"No, this is not what we want") - and this is absolutely fine as well.

We should not optimize for minimizing the time contributors spend on work
we might not accept - this is deliberately not a goal for me. This will
automatically mean that new contributors who want to contribute significant
changes will mostly waste a lot of time and have their PRs rejected. This
is largely what we are already doing, mostly because those PRs do not
follow our "tribal knowledge," which the agent cannot easily derive.
Naturally, new contributors should start with small, easy-to-complete tasks
that can be easily discarded if reviewers reject them. This is what we have
always asked people to start with.
So this approach with the triage tool also largely supports this: someone
new rewriting the proverbial scheduler will have to spend significant time
ensuring "auto-triage" passes, only to have the idea completely rejected by
the reviewer or be asked for a complete rewrite. And this is perfectly
fine. We always encouraged newcomers to start with small tasks, learn the
basics, and "grow" until they were ready to propose bigger changes or split
them into much smaller chunks. With "auto-triage" this will be natural and
expected, requiring authors to invest more time and effort to reach the
"ready for review" status. And I think it's absolutely fair and restores
the balance we so badly need now.

> Best,
> Wei
>
> > On Mar 3, 2026, at 9:34 PM, Jarek Potiuk <[email protected]> wrote:
> >
> > *TL;DR; I propose a stricter (automation-assisted) approach for the
> > "ready for review" state and clearer expectations for contributors
> > regarding when maintainers review PRs of non-collaborators.*
> >
> > Following the
> > https://lists.apache.org/thread/8tzwwwd7jmtmfo4j9pzg27704g10vpr4 where I
> > showcased a tool that I claude-coded, I would like to have a (possibly
> > short) discussion on this subject and reach a stage where I can attempt
> > to try the tool out.
> >
> > *Why?*
> >
> > Because we maintainers are overwhelmed and burning out, we no longer see
> > how our time invested in Airflow can bring significant returns to us
> > (personally) and to the community.
> >
> > While some of us spend a lot of time reviewing, commenting on, and
> > merging code, with the current rate of AI-generated PRs and the other
> > things we do, this is not sustainable. There is also a mismatch - or
> > lack of clarity - regarding the quality expectations for the PRs we want
> > to review.
> > *Social Contract Issue*
> >
> > We are a good (I think) open source project with a thriving community
> > and a great group of maintainers who are also friends, like to work with
> > each other, and are very open to bringing new community members in. As
> > maintainers, we are willing to help new contributors grow and generally
> > willing to spend some of our time doing so. This is the social contract
> > we signed up for as OSS maintainers and as committers for the Apache
> > Software Foundation PMC. Community Over Code.
> >
> > However, this social contract - this community-building aspect - is
> > currently heavily imbalanced, because AI-generated content takes away
> > time, focus, and energy from the maintainers. Instead of having
> > meaningful discussions in PRs about whether changes are needed and
> > communicating with people, we start losing time talking to -
> > effectively - AI agents about hundreds of smaller and bigger things that
> > should not be there in the first place.
> >
> > Currently, collaboration and community building suffer. Even if real
> > people submit code generated by agents (which is becoming really good,
> > fast, and cheap to produce), we simply lack the time as maintainers to
> > have meaningful conversations with the people behind those agents.
> >
> > Sometimes we lose time talking to agents. Sometimes we lose time talking
> > to people who have zero understanding of what they are doing and submit
> > continuous crap, and we should not be having that conversation at all.
> > Sometimes, we just look at the number of PRs opened on a given day in
> > despair, dreading even trying to bring order to them.
> >
> > And many of us also have some "work" to do or a "feature" to work on, on
> > top of that.
> > I think we need to reclaim the maintainers' collective time to focus on
> > what matters: delegating more responsibility to authors so they meet our
> > expected quality bar (and efficiently verifying it with tools, without
> > losing time and focus).
> >
> > *What do we have now?*
> >
> > We have already done a lot to help with this - the AGENTS.md / PR
> > guidelines, overhauled by Kaxil and updated by others, will certainly
> > help clarify expectations for agents in the future. I know Kaxil is also
> > exploring a way to enable automated copilot code reviews in a manner
> > that will not be too "dehumanizing" and will work well. This is all
> > good. The better the agents people use and the more closely they follow
> > those instructions, the higher the quality of incoming PRs will be. But
> > we also need to help maintainers easily identify what to focus on -
> > distinguishing work-in-progress and unfinished PRs that need work from
> > those truly "Ready for (human) review."
> >
> > *How?*
> >
> > My proposal has two parts:
> >
> > * Define and communicate expectations for PRs that maintainers can
> > manage.
> >
> > * Relentlessly automate it to ensure expectations are met and that
> > maintainers can easily focus on those PRs that are "Ready for review."
> >
> > My tool (which needs a bit more fine-tuning and refinement),
> > https://github.com/apache/airflow/pull/62682 `*breeze pr auto-triage*`,
> > is designed to do exactly this: automate those expectations by
> > auto-triaging the PRs. It not only converts PRs to Draft when they are
> > not yet "Ready For Review," but also provides actionable, automated
> > (deterministic + LLM) comments to the authors. A concrete maintainer
> > (the current triager) is using the tool very efficiently.
> >
> > *Proposed expectations (for non-collaborators):*
> >
> > These are not "new" expectations. Really, I'm proposing we completely
> > delegate the responsibility for fulfilling those expectations to the
> > author (with helpful, automated comments - reviewed and confirmed by a
> > human triager for now), and simply be very clear that generally no
> > maintainer will look at a PR until:
> >
> > * The author ensures the PR passes ALL the checks and tests (i.e.
> > green). It might sometimes mean we have to react even more quickly to
> > `main` breakages, and probably provide some "status" info and exceptions
> > when we know main is broken.
> >
> > * The author follows all PR guidelines (LLM-verified) regarding
> > description, content, quality, and presence of tests.
> >
> > * All PRs that do not meet these requirements will be converted to
> > Drafts, with automated suggestions on the next steps (reviewed quickly
> > and efficiently by a triager) provided to the author.
> >
> > * Drafts with no activity will be more aggressively pruned by our
> > stalebot.
> >
> > The triager is there mostly to quickly assess and generate comments -
> > with tool/AI assistance. The triager won't be the one who actually
> > reviews those PRs when they are "ready for review."
> >
> > * Only after that do we mark the PR as "*ready for maintainer review*"
> > (label).
> >
> > * Only such PRs should be reviewed, and it is entirely up to the author
> > to make them ready.
> >
> > Note: This approach is only for non-collaborators. For collaborators, we
> > might have just one expectation - mark your PR with "ready for
> > maintainer review" when you think it's ready. We accept people as
> > committers and collaborators because we already know they generally know
> > and follow the rules; automating this step isn't necessary.
> >
> > This is nothing new; we've already been doing this with humans handling
> > all the heavy lifting, without much strictness or organization, but this
> > is no longer sustainable.
> > I propose we make the expectations explicit, communicate them clearly,
> > and relentlessly automate their execution.
> >
> > I would love to hear what y'all think.
> >
> > J.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
