Subject: Re: [DISCUSS] Agents opening PRs

Hi Jarek,

Thanks for the thoughtful response — the point about agents instantly
re-requesting assignment is a real concern.

That said, I think the key distinction is who controls the assignee
slot. Rather than contributors (or agents) requesting assignment, what
if maintainers were the ones to grant it — based on their own current
capacity? Each maintainer could self-regulate how many issues they're
actively triaging at a given time. Even if agents flood the queue with
requests, nothing moves forward without a maintainer actively choosing
to open a slot.

This shifts the bottleneck to maintainer bandwidth, which is already
the real constraint anyway. And it naturally filters signal from noise
— maintainers would prioritize issues worth acting on.
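
To make the idea a bit more concrete, here is a rough sketch in Python. It is only an illustration of the flow, not a proposal for actual tooling: the names `Maintainer`, `AssignmentQueue`, `request`, `grant` and `release` are all hypothetical, and nothing here reflects existing Airflow or GitHub automation. The point it tries to show is that requests are merely recorded, and an issue only becomes assigned when a maintainer with a free slot chooses to grant it.

```python
from dataclasses import dataclass, field


@dataclass
class Maintainer:
    """Hypothetical maintainer who picks their own triage capacity."""
    name: str
    capacity: int  # self-chosen limit on concurrently granted issues
    granted: set[int] = field(default_factory=set)  # issue numbers granted so far

    def has_open_slot(self) -> bool:
        return len(self.granted) < self.capacity


class AssignmentQueue:
    """Requests are only recorded; only a maintainer's grant assigns an issue."""

    def __init__(self) -> None:
        self.requests: dict[int, list[str]] = {}  # issue -> requesters, in arrival order
        self.assignee: dict[int, str] = {}        # issue -> assigned contributor

    def request(self, issue: int, contributor: str) -> None:
        # A request never blocks the issue, so repeated requests change nothing.
        waiting = self.requests.setdefault(issue, [])
        if contributor not in waiting:
            waiting.append(contributor)

    def grant(self, maintainer: Maintainer, issue: int, contributor: str) -> bool:
        # Refuse if the maintainer is at capacity or the issue is already taken.
        if not maintainer.has_open_slot() or issue in self.assignee:
            return False
        # Refuse if the contributor never asked for this issue.
        if contributor not in self.requests.get(issue, []):
            return False
        maintainer.granted.add(issue)
        self.assignee[issue] = contributor
        return True

    def release(self, maintainer: Maintainer, issue: int) -> None:
        # Frees the maintainer's slot and makes the issue available again.
        maintainer.granted.discard(issue)
        self.assignee.pop(issue, None)
```

In this sketch an agent could call `request` on every open issue and nothing would be blocked: `assignee` stays empty until a maintainer calls `grant`, and a maintainer with `capacity=1` can hold only one granted issue until they call `release`.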

Could that be a workable middle ground?

Junyeong Kim

On Thu, Jun 11, 2026 at 9:07 PM, Jarek Potiuk <[email protected]> wrote:
>
> Hi everyone,
>
> Just a quick update that’s quite relevant to this discussion and Ash’s
> concerns about AGENTS.md. I had a great call yesterday with Jason and our
> GSoC intern, Roy. We’ve decided to focus his internship on optimizing
> AGENTS.md by extracting key sections and defining evals for them, inspired
> by the mini-eval framework in Magpie. This should help make our agentic
> instructions much more deterministic. Since agents can struggle with very
> long instructions, splitting these into smaller, focused "skills" should
> really help them follow our guidelines more reliably.
>
> We’ll share a formal announcement on the devlist soon. I’d love for us all
> to jump in on the reviews—it’s a great chance for us to learn together
> about agent limitations and how to better manage them.
>
> Junyeong, thanks for the suggestion on reintroducing assignments. While I
> understand the intent, I'm a little worried it might backfire. In the past,
> "assign and disappear" was a challenge, but my bigger concern now is that
> agents can "request assignment" almost instantly after de-assigning and
> practically for free (deterministically). Previously, requesting
> assignments created a lot of noise and required maintainers to act.
> However, even if we automate this, as some other projects do, agents could
> effectively block issues indefinitely, making it much harder for real human
> contributors to find an opening.
>
> But - looking forward to hearing more thoughts.
>
> Best regards,
>
> Jarek
>
>
> On Thu, Jun 11, 2026 at 1:39 AM 김준영 <[email protected]> wrote:
>
> > Hi all,
> >
> > Thanks for the discussion — as a contributor, I've found it really
> > helpful to understand how maintainers are thinking about this.
> >
> > One thing I've noticed from the contributor side: without an assignee
> > system, there's no clear signal at the issue level that someone is
> > already working on something. That lower friction might be part of
> > what's making it easier for agent-driven PRs to slip through without
> > prior discussion.
> >
> > I'm not sure of the full history behind removing assignees, but I
> > wonder if the original "assign and abandon" problem could have been
> > addressed with an auto-unassign policy (e.g. 2 weeks of inactivity)
> > rather than removing the system entirely. Reintroducing assignees with
> > that kind of timeout might act as an upstream complement to the
> > PR-level checks being discussed here.
> >
> > Could that be worth revisiting alongside Jarek's proposal?
> >
> > Junyeong Kim
> >
> > On Thu, Jun 11, 2026 at 8:20 AM, Jarek Potiuk <[email protected]> wrote:
> > >
> > > > I was watching the mail train and I think that sounds good. Hope the
> > > > check can be made early e.g. during build info and if possible can we
> > > > (once setting to DRAFT) kill all successor steps to save CI capacity?
> > >
> > > Excellent idea - absolutely, we can build it into "selective-checks" to
> > > "fail" and make a clear statement during failure. I hadn't thought of that.
> > > There were some ideas about "pull_request_target", but yes, you are
> > > completely right - all those checks are deterministic and can be part of the
> > > "build info" job that we use to determine what to do with the PR. Should be
> > > very simple.
> > >
> > >
> > > On Wed, Jun 10, 2026 at 8:43 PM Jens Scheffler <[email protected]>
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > I was watching the mail train and I think that sounds good. Hope the
> > > > check can be made early e.g. during build info and if possible can we
> > > > (once setting to DRAFT) kill all successor steps to save CI capacity?
> > > >
> > > > Otherwise I hope we can be most constructive, not "Fighting fire with fire"
> > > > but rather aim to improve agent descriptions to optimize others' token
> > > > budgets in favor of our requirements. We cannot turn back time and need
> > > > to assume that this level of agent contributions is here to stay.
> > > >
> > > > Jens
> > > >
> > > > On 10.06.26 08:55, Jarek Potiuk wrote:
> > > > > Hi everyone,
> > > > >
> > > > > I’ve spent some time reflecting on all the great points raised here.
> > Our
> > > > > shared goals are to ensure human ownership and review, keep agents as
> > > > > helpful assistants rather than sole authors, and reduce the cognitive
> > > > load
> > > > > from long AI-generated descriptions.
> > > > >
> > > > > I really like Shahar's proposal and would love to build on it with a
> > few
> > > > > suggestions to make our expectations clear and supportive for our
> > human
> > > > > contributors:
> > > > >
> > > > >    - Explicit Instructions: Let’s be very open in our templates and
> > > > > AGENTS.md. We can instruct agents to pause and ask the human to
> > write the
> > > > > description, noting that this personal touch is essential for the PR
> > to
> > > > > stay open.
> > > > >    - Human Review Checkbox: I suggest adding a checkbox: "- [ ] I
> > have
> > > > > reviewed this code myself." We’ll instruct agents to leave this for
> > the
> > > > > human to check, ensuring that vital moment of reflection.
> > > > >    - Instead of copy-pasting (which I find awkward), we can instruct
> > the
> > > > > agents to use `gh --web`, `--template` (to include the template), and
> > > > > `--draft` (following Pierre's idea). This creates natural
> > > > > checkpoints—filling the description, checking the box, clicking
> > submit,
> > > > and
> > > > > undrafting—that encourage human involvement.
> > > > >
> > > > > We should also state the consequences for non-compliance: To keep our
> > > > queue
> > > > > healthy, we should use an automated process to close PRs that miss
> > these
> > > > > steps, with a note explaining how to resubmit them with human input.
> > > > >
> > > > > All those expectations and closing etc. should be equally applied to
> > all
> > > > > PRs, including maintainer PRs. This will also allow those of us who
> > use
> > > > > agents to monitor the process and refine the instructions if we see
> > any
> > > > > loopholes that agents try to bypass or learn how to circumvent. This
> > will
> > > > > allow us to continuously improve the process.
> > > > >
> > > > > I believe this approach balances productivity with the high-quality
> > human
> > > > > collaboration we all value.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Jarek
> > > > >
> > > > >
> > > > > On Tue, Jun 9, 2026 at 5:00 PM Shahar Epstein <[email protected]>
> > wrote:
> > > > >
> > > > >> Here's a more concrete suggestion:
> > > > >>
> > > > >> Updating the PR template in such a way that:
> > > > >> 1. Human summary is now a MUST - at least a oneliner* (or more,
> > > > depending
> > > > >> on the scope - TBD) that describes the suggested changes written by
> > the
> > > > >> PR's author themselves (without AI assistance).
> > > > >> 2. AI summary is optional. However, when included - it MUST be bound
> > > > within
> > > > >> a collapsible box, mainly to save cognitive load for maintainers and
> > > > >> contributors, but also to encourage human interaction like we used
> > to do
> > > > >> before it all started.
> > > > >> 3. PR's author (human) should be the one declaring the AI usage
> > > > checkbox -
> > > > >> added a short statement of ownership.
> > > > >>
> > > > >> Contributors will be instructed to use this template and adhere to
> > the
> > > > >> instructions when creating a PR.
> > > > >> Agents may push branches to forks, but they will be instructed to
> > avoid
> > > > >> creating PRs on their own to the upstream repository, and instead
> > > > provide
> > > > >> the link for creating the PR using this template (they could
> > suggest an
> > > > AI
> > > > >> summary, but the contributor should copy and paste it manually to
> > the
> > > > >> collapsible box). Trying to work around that might result in an M&M
> > test
> > > > >> directly in the PR's description (TBD).
> > > > >>
> > > > >> Example is available here <
> > https://github.com/apache/airflow/pull/68055>
> > > > -
> > > > >> I've made HTML comments visible, they will be hidden in the real
> > thing.
> > > > >>
> > > > >> Took inspiration for this idea from https://tenbluelinks.org/ ,
> > that
> > > > hides
> > > > >> the AI overview on Google if you're not interested
> > (highly-recommended
> > > > >> btw).
> > > > >>
> > > > >> Can we live with that?
> > > > >>
> > > > >>
> > > > >> Shahar
> > > > >>
> > > > >> On Tue, Jun 9, 2026 at 3:30 PM Ash Berlin-Taylor <[email protected]>
> > > > wrote:
> > > > >>
> > > > >>> I don’t care one way or another about using AI as a tool in CI,
> > that is
> > > > >>> secondary to my goal which is to try and do something to make it
> > clear
> > > > >> what
> > > > >>> we expect from people wanting to contribute to Airflow, namely:
> > > > >>>
> > > > >>> 1. Human involvement.
> > > > >>>
> > > > >>> By submitting a PR you are saying “yes I want to be a member of the
> > > > >>> community”. Agents submitting without human interaction go against
> > > > this.
> > > > >>>
> > > > >>> 2. Human ownership.
> > > > >>>
> > > > >>> It is _your responsibility_ as the PR author to follow up on it,
> > > > address
> > > > >>> comments, and request reviews.
> > > > >>>
> > > > >>>
> > > > >>> I frankly find the AI generated triage comments verbose, and a waste of
> > > > >>> time and pure noise even without the `@` spam.
> > > > >>>
> > > > >>> If the user doesn’t care enough about their own PR to follow up on
> > it:
> > > > >>> close it after some time. We don’t need to baby sit them. Nor do I
> > need
> > > > >> yet
> > > > >>> more commit email messages to read through.
> > > > >>>
> > > > >>>
> > > > >>> So how does it sound: It sounds like hell to me and an even bigger
> > > > waste
> > > > >>> of electricity in a climate crisis.
> > > > >>>
> > > > >>> I want to be involved in a community of humans working to build
> > > > software.
> > > > >>> I do not want to see LLMs producing so much output that other
> > people
> > > > need
> > > > >>> LLMs to summarise it, with no humans looking at things.
> > > > >>>
> > > > >>> -ash
> > > > >>>
> > > > >>>> On 9 Jun 2026, at 13:18, Jarek Potiuk <[email protected]> wrote:
> > > > >>>>
> > > > >>>>> Why? Because AI “instructions” cannot be trusted. And I am after
> > a
> > > > >>> signal
> > > > >>>> that people are blindly using LLMs without enough human
> > introversion.
> > > > >>>>
> > > > >>>> But is not that what you are doing? This proposal is about adding
> > > > >> another
> > > > >>>> AI instruction (just hidden in HTML) - how is that going to help?
> > > > >>>>
> > > > >>>>> You already updated the instructions to not `@` the reviewer here
> > > > >>>> Indeed, LLMs are not deterministic by nature. But they are
> > improvable.
> > > > >>>> Through iterations of refinement and adding more guardrails we can
> > > > >>> improve
> > > > >>>> it—and this is exactly why I am running it manually to make it
> > better.
> > > > >>> This
> > > > >>>> is the same as in regular breeze development in the past.
> > Initially,
> > > > >>> there
> > > > >>>> were many small issues - and I remember how you complained about
> > them
> > > > >> and
> > > > >>>> how unnecessary they seemed, yet we have now perfected it over time.
> > Now, it
> > > > >>>> allows all contributors and maintainers to work much more
> > efficiently
> > > > >> and
> > > > >>>> lose less time. BTW. Thanks for notifying me; I must strengthen
> > this
> > > > >> one
> > > > >>>> and see why, as there might be another improvement to implement.
> > This
> > > > >> is
> > > > >>>> also why we are not "yet" doing CI analysis by AI - because I
> > want to
> > > > >>>> iterate on it and fix it in the way to know which parts are
> > > > >>> deterministic.
> > > > >>>>> I want to do anything and everything to reduce the drive by
> > > > >> contribution
> > > > >>>> with no human activity. I’m happy to spend my time helping
> > humans, but
> > > > >> if
> > > > >>>> they are just going to feed that back to an LLM and burn an
> > egregious
> > > > >>>> amount of carbon: no thank you.
> > > > >>>>
> > > > >>>> And again I am not sure how the proposal to add that instruction
> > would
> > > > >>>> address this particular issue? Are you just proposing to add
> > another
> > > > >>>> instruction for the LLM (or am I wrong?). How does it solve the
> > > > >> problem?
> > > > >>>> From what I understand we have two basic proposals here that contradict
> > > > >>>> each other:
> > > > >>>>
> > > > >>>> * Ash - do not use AI to fight with AI at all
> > > > >>>> * Amogh, Shahar - use AI in CI
> > > > >>>>
> > > > >>>> But I think, the triage I am running now shows a third way:
> > > > >>>>
> > > > >>>> * we use AI to try out and generate triage action and figure out
> > which
> > > > >>>> parts are practically 100% deterministic and can help with triage
> > > > (this
> > > > >>> is
> > > > >>>> the stats I am gathering now)
> > > > >>>> * we use AI to convert the SKILLS we have into deterministic CI
> > code
> > > > >> that
> > > > >>>> does those triage steps (no AI used at all at runtime)
> > > > >>>> * we continue perfecting the manually-triggered AI SKILLS to get
> > more
> > > > >> AI
> > > > >>>> heuristics that we can turn into deterministic CI code
> > > > >>>>
> > > > >>>> This seems to fulfill seemingly contradictory expectations that
> > > > >> different
> > > > >>>> people have in a nice way. I am about to produce stats from the
> > last
> > > > >> run
> > > > >>>> and was just about to propose this approach.
> > > > >>>>
> > > > >>>> How does it sound Ash, Amogh, Shahar and others ?
> > > > >>>>
> > > > >>>> J.
> > > > >>>>
> > > > >>>>
> > > > >>>> On Tue, Jun 9, 2026 at 12:55 PM Ash Berlin-Taylor <[email protected]
> > >
> > > > >>> wrote:
> > > > >>>>> Why? Because AI “instructions” cannot be trusted. And I am after
> > a
> > > > >>> signal
> > > > >>>>> that people are blindly using LLMs without enough human
> > introversion.
> > > > >>>>>
> > > > >>>>> Want a prime example?
> > > > >>>>>
> > > > >>>>> The pr triage skill.
> > > > >>>>>
> > > > >>>>> You already updated the instructions to not `@` the reviewer here
> > > > >>>>>
> > > > >>
> > > >
> > https://github.com/apache/airflow-steward/blob/76cfa5e1d2e682b88df5205e9cda396df51a66b6/skills/pr-management-triage/comment-templates.md#reviewer-mention-policy
> > > > >>>>>> When a comment's only addressee is the PR author (the
> > > > >>>>> request-author-confirmation, reviewer-ping author-primary, and
> > > > >>> review-nudge
> > > > >>>>> author-primary templates), the body references the reviewer
> > without
> > > > >>>>> @-mentioning them
> > > > >>>>>
> > > > >>>>> And yet the LLM did it again:
> > > > >>>>>
> > https://github.com/apache/airflow/pull/66633#discussion_r3344849352
> > > > >>>>>
> > > > >>>>>> @korex-f — A reviewer (@ashb) has requested changes on this PR,
> > so
> > > > >> I've
> > > > >>>>> removed the ready for maintainer review label — the next step is
> > on
> > > > >> your
> > > > >>>>> side. Could you address the review comments (push a fix, or reply
> > > > >>> in-thread
> > > > >>>>> explaining why the feedback doesn't apply)? Once addressed,
> > > > re-request
> > > > >>>>> review from @ashb or re-mark the PR ready and it returns to the
> > > > >>> maintainer
> > > > >>>>> queue. Thank you.
> > > > >>>>>
> > > > >>>>> And frankly I’m tired of all this shit.
> > > > >>>>>
> > > > >>>>> I want to do anything and everything to reduce the drive by
> > > > >> contribution
> > > > >>>>> with no human activity. I’m happy to spend my time helping
> > humans,
> > > > but
> > > > >>> if
> > > > >>>>> they are just going to feed that back to an LLM and burn an
> > egregious
> > > > >>>>> amount of carbon: no thank you.
> > > > >>>>>
> > > > >>>>> -ash
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>> On 9 Jun 2026, at 10:38, Jarek Potiuk <[email protected]> wrote:
> > > > >>>>>>
> > > > >>>>>> Hi Ash, Amogh, and Shahar,
> > > > >>>>>>
> > > > >>>>>> Ash, I'm curious to learn more about how the "brown m&m test"
> > > > differs
> > > > >>>>> from
> > > > >>>>>> our current request for agents to identify themselves. Could you
> > > > help
> > > > >>> me
> > > > >>>>>> understand the flow and the specific benefits you see? It feels
> > > > >> similar
> > > > >>>>> to
> > > > >>>>>> me, but I'd love to hear your perspective in case I'm missing a
> > > > >> nuance.
> > > > >>>>>> Regarding the gh pr create --web approach, we included those
> > > > >>> instructions
> > > > >>>>>> to ensure we meet ASF legal guidelines for Gen-AI headers, and
> > to
> > > > >>> support
> > > > >>>>>> contributors who might not have Copilot. That said, if you have
> > > > ideas
> > > > >>> on
> > > > >>>>>> how to trim the context or improve the templates, we truly
> > > > appreciate
> > > > >>> PRs
> > > > >>>>>> that improve them—and many people already have. AGENTS.md is a
> > team
> > > > >>>>> effort,
> > > > >>>>>> and we’re always looking for ways to make it better. Let's keep
> > our
> > > > >>>>>> collaboration positive as we refine these processes together.
> > > > >>>>>>
> > > > >>>>>> Amogh and Shahar, yep the idea of a validation step in the CI
> > for
> > > > >>>>>> first-time contributions is something we should implement
> > sooner or
> > > > >>>>> later.
> > > > >>>>>> I have actually been gathering stats on this for the last two
> > weeks.
> > > > >>> I’ve
> > > > >>>>>> been preparing to see how manually triggered triage tasks can
> > turn
> > > > >> into
> > > > >>>>>> automated ones—I'm gathering stats on when human judgment is
> > needed.
> > > > >> I
> > > > >>>>>> shared some stats about this recently and will continue
> > gathering
> > > > >> them.
> > > > >>>>> The
> > > > >>>>>> next step is discussing here what and how we can automate.
> > > > >>>>>>
> > > > >>>>>> Also, the current triage process already uses our Pull Request
> > > > >> criteria
> > > > >>>>> to
> > > > >>>>>> pre-classify the PRs and only marks them with "ready for
> > maintainer
> > > > >>>>> review"
> > > > >>>>>> if those criteria are met. So, if there are any specific
> > criteria
> > > > >> you’d
> > > > >>>>>> like to see added to our "Pull request criteria," PRs are most
> > > > >> welcome
> > > > >>>>>> there as well.
> > > > >>>>>>
> > > > >>>>>> Best regards,
> > > > >>>>>>
> > > > >>>>>> Jarek
> > > > >>>>>
> > > > >>>
> > > > >>>
> > ---------------------------------------------------------------------
> > > > >>> To unsubscribe, e-mail: [email protected]
> > > > >>> For additional commands, e-mail: [email protected]
> > > > >>>
> > > > >>>