Subject: Re: [DISCUSS] Agents opening PRs

Hi Jarek,

Thanks for the thoughtful response. The point about agents instantly
re-requesting assignment is a real concern.

That said, I think the key distinction is who controls the assignee slot.
Rather than contributors (or agents) requesting assignment, what if
maintainers were the ones to grant it, based on their own current capacity?
Each maintainer could self-regulate how many issues they're actively
triaging at a given time. Even if agents flood the queue with requests,
nothing moves forward without a maintainer actively choosing to open a slot.

This shifts the bottleneck to maintainer bandwidth, which is already the
real constraint anyway. And it naturally filters signal from noise, since
maintainers would prioritize issues worth acting on.

Could that be a workable middle ground?

Junyeong Kim

On Thu, Jun 11, 2026 at 9:07 PM Jarek Potiuk <[email protected]> wrote:
>
> Hi everyone,
>
> Just a quick update that’s quite relevant to this discussion and Ash’s
> concerns about AGENTS.md. I had a great call yesterday with Jason and our
> GSoC intern, Roy. We’ve decided to focus his internship on optimizing
> AGENTS.md by extracting key sections and defining evals for them, inspired
> by the mini-eval framework in Magpie. This should help make our agentic
> instructions much more deterministic. Since agents can struggle with very
> long instructions, splitting these into smaller, focused "skills" should
> really help them follow our guidelines more reliably.
>
> We’ll share a formal announcement on the devlist soon. I’d love for us all
> to jump in on the reviews; it’s a great chance for us to learn together
> about agent limitations and how to better manage them.
>
> Junyeong, thanks for the suggestion on reintroducing assignments. While I
> understand the intent, I'm a little worried it might backfire.
> In the past, "assign and disappear" was a challenge, but my bigger
> concern now is that agents can "request assignment" almost instantly after
> de-assigning and practically for free (deterministically). Previously,
> requesting assignments created a lot of noise and required maintainers to
> act. However, even if we automate this, as some other projects do, agents
> could effectively block issues indefinitely, making it much harder for real
> human contributors to find an opening.
>
> But - looking forward to hearing more thoughts.
>
> Best regards,
>
> Jarek
>
>
> On Thu, Jun 11, 2026 at 1:39 AM 김준영 <[email protected]> wrote:
>
> > Hi all,
> >
> > Thanks for the discussion. As a contributor, I've found it really
> > helpful to understand how maintainers are thinking about this.
> >
> > One thing I've noticed from the contributor side: without an assignee
> > system, there's no clear signal at the issue level that someone is
> > already working on something. That lower friction might be part of
> > what's making it easier for agent-driven PRs to slip through without
> > prior discussion.
> >
> > I'm not sure of the full history behind removing assignees, but I
> > wonder if the original "assign and abandon" problem could have been
> > addressed with an auto-unassign policy (e.g. 2 weeks of inactivity)
> > rather than removing the system entirely. Reintroducing assignees with
> > that kind of timeout might act as an upstream complement to the
> > PR-level checks being discussed here.
> >
> > Could that be worth revisiting alongside Jarek's proposal?
> >
> > Junyeong Kim
> >
> > On Thu, Jun 11, 2026 at 8:20 AM Jarek Potiuk <[email protected]> wrote:
> > >
> > > > I was watching the mail train and I think that sounds good. Hope the
> > > > check can be made early e.g. during build info and if possible can we
> > > > (once setting to DRAFT) kill all successor steps to save CI capacity?
> > >
> > > Excellent idea - absolutely, we can build it into "selective-checks" to
> > > "fail" and make a clear statement during failure. I hadn't thought of
> > > that. There were some ideas about "pull_request_target", but yes, you
> > > are completely right - all those checks are deterministic and can be
> > > part of the "build info" job that we use to determine what to do with
> > > the PR. Should be very simple.
> > >
> > >
> > > On Wed, Jun 10, 2026 at 8:43 PM Jens Scheffler <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I was watching the mail train and I think that sounds good. Hope the
> > > > check can be made early e.g. during build info and if possible can we
> > > > (once setting to DRAFT) kill all successor steps to save CI capacity?
> > > >
> > > > Otherwise I hope we can be most constructive, not "Fighting fire with
> > > > fire" but rather aim to improve agent descriptions to optimize others'
> > > > token budgets in favor of our requirements. We cannot turn back time
> > > > and need to assume this level of agent contributions is here to stay.
> > > >
> > > > Jens
> > > >
> > > > On 10.06.26 08:55, Jarek Potiuk wrote:
> > > > > Hi everyone,
> > > > >
> > > > > I’ve spent some time reflecting on all the great points raised here.
> > > > > Our shared goals are to ensure human ownership and review, keep
> > > > > agents as helpful assistants rather than sole authors, and reduce the
> > > > > cognitive load from long AI-generated descriptions.
> > > > >
> > > > > I really like Shahar's proposal and would love to build on it with a
> > > > > few suggestions to make our expectations clear and supportive for our
> > > > > human contributors:
> > > > >
> > > > > - Explicit Instructions: Let’s be very open in our templates and
> > > > >   AGENTS.md.
> > > > >   We can instruct agents to pause and ask the human to write the
> > > > >   description, noting that this personal touch is essential for the
> > > > >   PR to stay open.
> > > > > - Human Review Checkbox: I suggest adding a checkbox: "- [ ] I have
> > > > >   reviewed this code myself." We’ll instruct agents to leave this
> > > > >   for the human to check, ensuring that vital moment of reflection.
> > > > > - Instead of copy-pasting (which I find awkward), we can instruct
> > > > >   the agents to use `gh pr create --web`, `--template` (to include
> > > > >   the template), and `--draft` (following Pierre's idea). This
> > > > >   creates natural checkpoints (filling the description, checking the
> > > > >   box, clicking submit, and undrafting) that encourage human
> > > > >   involvement.
> > > > >
> > > > > We should also state the consequences for non-compliance: To keep our
> > > > > queue healthy, we should use an automated process to close PRs that
> > > > > miss these steps, with a note explaining how to resubmit them with
> > > > > human input.
> > > > >
> > > > > All those expectations, closing etc. should be applied equally to
> > > > > all PRs, including maintainer PRs. This will also allow those of us
> > > > > who use agents to monitor the process and refine the instructions if
> > > > > we see agents exploiting loopholes or learning how to circumvent the
> > > > > steps. This will allow us to continuously improve the process.
> > > > >
> > > > > I believe this approach balances productivity with the high-quality
> > > > > human collaboration we all value.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Jarek
> > > > >
> > > > >
> > > > > On Tue, Jun 9, 2026 at 5:00 PM Shahar Epstein <[email protected]> wrote:
> > > > >
> > > > >> Here's a more concrete suggestion:
> > > > >>
> > > > >> Updating the PR template in such a way that:
> > > > >> 1. Human summary is now a MUST - at least a one-liner* (or more,
> > > > >>    depending on the scope - TBD) that describes the suggested
> > > > >>    changes, written by the PR's author themselves (without AI
> > > > >>    assistance).
> > > > >> 2. AI summary is optional. However, when included, it MUST be bound
> > > > >>    within a collapsible box, mainly to save cognitive load for
> > > > >>    maintainers and contributors, but also to encourage human
> > > > >>    interaction like we used to do before it all started.
> > > > >> 3. The PR's author (human) should be the one declaring AI usage via
> > > > >>    the checkbox - I added a short statement of ownership.
> > > > >>
> > > > >> Contributors will be instructed to use this template and adhere to
> > > > >> the instructions when creating a PR.
> > > > >> Agents may push branches to forks, but they will be instructed to
> > > > >> avoid creating PRs on their own to the upstream repository, and
> > > > >> instead provide the link for creating the PR using this template
> > > > >> (they could suggest an AI summary, but the contributor should copy
> > > > >> and paste it manually to the collapsible box). Trying to work around
> > > > >> that might result in an M&M test directly in the PR's description
> > > > >> (TBD).
> > > > >>
> > > > >> Example is available here
> > > > >> <https://github.com/apache/airflow/pull/68055> - I've made HTML
> > > > >> comments visible; they will be hidden in the real thing.
> > > > >>
> > > > >> Took inspiration for this idea from https://tenbluelinks.org/, which
> > > > >> hides the AI overview on Google if you're not interested (highly
> > > > >> recommended, btw).
> > > > >>
> > > > >> Can we live with that?
> > > > >>
> > > > >>
> > > > >> Shahar
> > > > >>
> > > > >> On Tue, Jun 9, 2026 at 3:30 PM Ash Berlin-Taylor <[email protected]> wrote:
> > > > >>
> > > > >>> I don’t care one way or another about using AI as a tool in CI,
> > > > >>> that is secondary to my goal which is to try and do something to
> > > > >>> make it clear what we expect from people wanting to contribute to
> > > > >>> Airflow, namely:
> > > > >>>
> > > > >>> 1. Human involvement.
> > > > >>>
> > > > >>> By submitting a PR you are saying “yes I want to be a member of the
> > > > >>> community”. Agents submitting without human interaction go against
> > > > >>> this.
> > > > >>>
> > > > >>> 2. Human ownership.
> > > > >>>
> > > > >>> It is _your responsibility_ as the PR author to follow up on it,
> > > > >>> address comments, and request reviews.
> > > > >>>
> > > > >>>
> > > > >>> I frankly find the AI generated triage comments verbose, and a
> > > > >>> waste of time and pure noise even without the `@` spam.
> > > > >>>
> > > > >>> If the user doesn’t care enough about their own PR to follow up on
> > > > >>> it: close it after some time. We don’t need to baby sit them. Nor
> > > > >>> do I need yet more commit email messages to read through.
> > > > >>>
> > > > >>>
> > > > >>> So how does it sound: It sounds like hell to me and an even bigger
> > > > >>> waste of electricity in a climate crisis.
> > > > >>>
> > > > >>> I want to be involved in a community of humans working to build
> > > > >>> software. I do not want to see LLMs producing so much output that
> > > > >>> other people need LLMs to summarise it, with no humans looking at
> > > > >>> things.
> > > > >>>
> > > > >>> -ash
> > > > >>>
> > > > >>>> On 9 Jun 2026, at 13:18, Jarek Potiuk <[email protected]> wrote:
> > > > >>>>
> > > > >>>>> Why? Because AI “instructions” cannot be trusted.
> > > > >>>>> And I am after a signal that people are blindly using LLMs
> > > > >>>>> without enough human introversion.
> > > > >>>>
> > > > >>>> But isn't that what you are doing? This proposal is about adding
> > > > >>>> another AI instruction (just hidden in HTML) - how is that going
> > > > >>>> to help?
> > > > >>>>
> > > > >>>>> You already updated the instructions to not `@` the reviewer here
> > > > >>>>
> > > > >>>> Indeed, LLMs are not deterministic by nature. But they are
> > > > >>>> improvable. Through iterations of refinement and adding more
> > > > >>>> guardrails we can improve it, and this is exactly why I am running
> > > > >>>> it manually to make it better. This is the same as in regular
> > > > >>>> breeze development in the past. Initially, there were many small
> > > > >>>> issues - and I remember how you complained about them and how
> > > > >>>> unnecessary they seemed - yet we have now perfected it over time.
> > > > >>>> Now, it allows all contributors and maintainers to work much more
> > > > >>>> efficiently and lose less time. BTW, thanks for notifying me; I
> > > > >>>> must strengthen this one and see why, as there might be another
> > > > >>>> improvement to implement. This is also why we are not "yet" doing
> > > > >>>> CI analysis by AI - because I want to iterate on it and fix it in
> > > > >>>> a way that tells us which parts are deterministic.
> > > > >>>>
> > > > >>>>> I want to do anything and everything to reduce the drive by
> > > > >>>>> contribution with no human activity. I’m happy to spend my time
> > > > >>>>> helping humans, but if they are just going to feed that back to
> > > > >>>>> an LLM and burn an egregious amount of carbon: no thank you.
> > > > >>>>
> > > > >>>> And again I am not sure how the proposal to add that instruction
> > > > >>>> would address this particular issue?
> > > > >>>> Are you just proposing to add another instruction for the LLM (or
> > > > >>>> am I wrong)? How does it solve the problem?
> > > > >>>> From what I understand we have two basic proposals here that
> > > > >>>> contradict each other:
> > > > >>>>
> > > > >>>> * Ash - do not use AI to fight with AI at all
> > > > >>>> * Amogh, Shahar - use AI in CI
> > > > >>>>
> > > > >>>> But I think the triage I am running now shows a third way:
> > > > >>>>
> > > > >>>> * we use AI to try out and generate triage actions and figure out
> > > > >>>>   which parts are practically 100% deterministic and can help with
> > > > >>>>   triage (these are the stats I am gathering now)
> > > > >>>> * we use AI to convert the SKILLS we have into deterministic CI
> > > > >>>>   code that does those triage steps (no AI used at all at runtime)
> > > > >>>> * we continue perfecting the manually-triggered AI SKILLS to get
> > > > >>>>   more AI heuristics that we can turn into deterministic CI code
> > > > >>>>
> > > > >>>> This seems to fulfill, in a nice way, the seemingly contradictory
> > > > >>>> expectations that different people have. I am about to produce
> > > > >>>> stats from the last run and was just about to propose this
> > > > >>>> approach.
> > > > >>>>
> > > > >>>> How does it sound, Ash, Amogh, Shahar and others?
> > > > >>>>
> > > > >>>> J.
> > > > >>>>
> > > > >>>>
> > > > >>>> On Tue, Jun 9, 2026 at 12:55 PM Ash Berlin-Taylor <[email protected]> wrote:
> > > > >>>>> Why? Because AI “instructions” cannot be trusted. And I am after
> > > > >>>>> a signal that people are blindly using LLMs without enough human
> > > > >>>>> introversion.
> > > > >>>>>
> > > > >>>>> Want a prime example?
> > > > >>>>>
> > > > >>>>> The pr triage skill.
> > > > >>>>>
> > > > >>>>> You already updated the instructions to not `@` the reviewer here
> > > > >>>>>
> > > > >>>>> https://github.com/apache/airflow-steward/blob/76cfa5e1d2e682b88df5205e9cda396df51a66b6/skills/pr-management-triage/comment-templates.md#reviewer-mention-policy
> > > > >>>>>
> > > > >>>>>> When a comment's only addressee is the PR author (the
> > > > >>>>>> request-author-confirmation, reviewer-ping author-primary, and
> > > > >>>>>> review-nudge author-primary templates), the body references the
> > > > >>>>>> reviewer without @-mentioning them
> > > > >>>>>
> > > > >>>>> And yet the LLM did it again:
> > > > >>>>>
> > > > >>>>> https://github.com/apache/airflow/pull/66633#discussion_r3344849352
> > > > >>>>>
> > > > >>>>>> @korex-f - A reviewer (@ashb) has requested changes on this PR,
> > > > >>>>>> so I've removed the ready for maintainer review label - the next
> > > > >>>>>> step is on your side. Could you address the review comments
> > > > >>>>>> (push a fix, or reply in-thread explaining why the feedback
> > > > >>>>>> doesn't apply)? Once addressed, re-request review from @ashb or
> > > > >>>>>> re-mark the PR ready and it returns to the maintainer queue.
> > > > >>>>>> Thank you.
> > > > >>>>>
> > > > >>>>> And frankly I’m tired of all this shit.
> > > > >>>>>
> > > > >>>>> I want to do anything and everything to reduce the drive by
> > > > >>>>> contribution with no human activity. I’m happy to spend my time
> > > > >>>>> helping humans, but if they are just going to feed that back to
> > > > >>>>> an LLM and burn an egregious amount of carbon: no thank you.
> > > > >>>>>
> > > > >>>>> -ash
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>> On 9 Jun 2026, at 10:38, Jarek Potiuk <[email protected]> wrote:
> > > > >>>>>>
> > > > >>>>>> Hi Ash, Amogh, and Shahar,
> > > > >>>>>>
> > > > >>>>>> Ash, I'm curious to learn more about how the "brown m&m test"
> > > > >>>>>> differs from our current request for agents to identify
> > > > >>>>>> themselves. Could you help me understand the flow and the
> > > > >>>>>> specific benefits you see? It feels similar to me, but I'd love
> > > > >>>>>> to hear your perspective in case I'm missing a nuance.
> > > > >>>>>>
> > > > >>>>>> Regarding the `gh pr create --web` approach, we included those
> > > > >>>>>> instructions to ensure we meet ASF legal guidelines for Gen-AI
> > > > >>>>>> headers, and to support contributors who might not have Copilot.
> > > > >>>>>> That said, if you have ideas on how to trim the context or
> > > > >>>>>> improve the templates, we truly appreciate PRs that improve
> > > > >>>>>> them, and many people have already submitted some. AGENTS.md is
> > > > >>>>>> a team effort, and we’re always looking for ways to make it
> > > > >>>>>> better. Let's keep our collaboration positive as we refine these
> > > > >>>>>> processes together.
> > > > >>>>>>
> > > > >>>>>> Amogh and Shahar, yep, the idea of a validation step in the CI
> > > > >>>>>> for first-time contributions is something we should implement
> > > > >>>>>> sooner or later. I have actually been gathering stats on this
> > > > >>>>>> for the last two weeks. I’ve been preparing to see how manually
> > > > >>>>>> triggered triage tasks can turn into automated ones - I'm
> > > > >>>>>> gathering stats on when human judgment is needed. I shared some
> > > > >>>>>> stats about this recently and will continue gathering them. The
> > > > >>>>>> next step is discussing here what and how we can automate.
> > > > >>>>>>
> > > > >>>>>> Also, the current triage process already uses our Pull Request
> > > > >>>>>> criteria to pre-classify the PRs and only marks them with "ready
> > > > >>>>>> for maintainer review" if those criteria are met. So, if there
> > > > >>>>>> are any specific criteria you’d like to see added to our "Pull
> > > > >>>>>> request criteria," PRs are most welcome there as well.
> > > > >>>>>>
> > > > >>>>>> Best regards,
> > > > >>>>>>
> > > > >>>>>> Jarek
> > > > >>>>>
> > > > >>>
> > > > >>>
> > > > >>> ---------------------------------------------------------------------
> > > > >>> To unsubscribe, e-mail: [email protected]
> > > > >>> For additional commands, e-mail: [email protected]
