Indeed -  but this does not reflect the final numbers yet. It also misses
the fact that 200 of those are already draft (I converted a 100 of them are
least) and each of them has detailed instructions for the authors what to
do. At least 10 people thanked me for providing such detailed and guidance.


Some 30-40 so far are `ready for maintainer review` (label). But I have not
run the tool on all of the open PRs only on mine, dev-area and providers.

i am still iterating and I proving the tool (with Yeongook help - he also
run some of the triage - not only me.

There are still some things to complete - I implemented a very strong
security layer to completely sandbox and isolate the LLMs and prevent PR
prompt injection attacks (those are real things nowadays)
https://github.com/apache/airflow/pull/63422

And I am working on a separate `review` mode - which will allow maintainers
equally efficiently review already good PRs (the ones marked with 'ready
for maintainer review` label  With the same approach - deterministic checks
for speed + very targeted LLM assistance - but keep that human in the loop
and maintainers in the driving seat. No comment, no message, no assessment
posted to the contributor without conscious decision of the maintainer.

I am looking at responses of people and have some small improvements on the
way.

I am also implementing some of the small workflows I see as current
patterns in reviews of others - and I hope by early next week I will have a
completely working and battle tested solution.

I think that with the tool we will be able to handle easily (and I am not
exaggerating at all) at the very least two orders of magnitude more PR
traffic that we see right now - especially when more of us start using it
and when we share the triage/review burden (very, very low for triage
part). among more maintainers.

I was hoping to demo it today at dev call - but I did not realize I am
getting back to Warsaw from Slovakia today - and it is unlikely I will be
able to share demo  - istll might be on/off at the call but in demoable
circumstances likely - I might try but It's not likely - but I will create
a detailed description of the tool, how to use it and proposed process and
will record a screencast (likely weekend) demoing how it works and will
share with everyone.

I am super optimistic that wei will be able to solve the PR problem this
way, and that we will be able to apply similar approach to issues and later
also security reports. Smartly combining humans as driver's, deterministic
(though AI-generated) code + LLMs as the additional 'intelligent
assistant's for things that cannot be done deterministically seems to be
working beautifully.

J

On Thu, Mar 12, 2026, 16:28 Vincent Beck <[email protected]> wrote:

> Pretty impressive results, we were at 500+ open PRs 2 days ago and now we
> are at ~430 open PRs. Bravo!
>
> On 2026/03/11 14:51:36 Kevin Yang wrote:
> > Thanks for the feedback! More than happy if I could implement these
> options
> > and integrations. I will look into the current implementation and draft
> PRs
> > by the upcoming week.
> >
> > Best,
> > Kevin Yang
> >
> > On Wed, Mar 11, 2026 at 4:05 AM Jarek Potiuk <[email protected]> wrote:
> >
> > > You can absolutely add the  option to use any agent or model to the
> tool I
> > > created. Currently it can use copilot, Claude, codex - but you can add
> PR
> > > to use any model - it is build for that purpose.
> > >
> > > This is integrated with breeze uatctually even automatically stores
> which
> > > model you use and continue using it. The interface to LLm ia super
> Simple.
> > > It does not even use Pydantic AI - it just generates prompt and parses
> the
> > > output. so by all means - adding a way to use any other LLM.
> > >
> > > 90% of the work done by the tool is deterministic; it only asks the LLM
> > > when it is in doubt.
> > >
> > > So - by all means, PRs to use any other LLMs - whether local or remote
> -
> > > are most welcome. Also we can add opencode and ollama integration
> > >
> > > [image: image.png]
> > >
> > > J.
> > >
> > > On Wed, Mar 11, 2026, 03:32 Kevin Yang <[email protected]> wrote:
> > >
> > >> Hi Jarek,
> > >>
> > >> Thank you very much for all the efforts in building the solutions. I
> > >> recently also read through the following discussions [1,2,3], and
> think
> > >> about whether there is a good approach on tackling the challenge.
> > >>
> > >> I believe integrating with LLM is a good approach, especially can
> leverage
> > >> its reasoning capabilities to provide a better triage. Existing
> products
> > >> such as Copilot Code Review can also provide insightful triage as
> > >> previously proposed by Kaxil.
> > >>
> > >> I also find another direction that also looks promising to me is to
> > >> use a *small
> > >> language model (SLM)*, a model with 2-4 B parameters, which can be
> run on
> > >> standard Github runners, using CPU-only, to triage issues and PRs.
> I've
> > >> built a github action *SLM Triage* (
> > >> https://github.com/marketplace/actions/slm-triage).
> > >>
> > >> What advantages does SLM offer?
> > >> * It can be run on a standard GitHub runner, on CPU, and finish
> execution
> > >> in around 3 - 5 minutes
> > >> * There is no API cost, billing set up with LLM service
> > >> * It runs on GitHub events, when an issue or PR is opened, and
> capable to
> > >> triage issues as long as there are GitHub runners available
> > >> * It can be simply integrated into GitHub Actions without
> infrastructure,
> > >> or local setup.
> > >>
> > >> What are the current limitations?
> > >> * It doesn't have enough domain knowledge about a specific codebase,
> so it
> > >> can only triage based on high-level context, and relevancy between
> context
> > >> information and code changes
> > >> * It has limited reasoning capability
> > >> * It has limited context window (128k context window size, some might
> have
> > >> ~256k)
> > >>
> > >> Why I think it can be a potential direction
> > >> * I feel some issues or PRs can be triage based on some basic
> heuristics
> > >> and rules
> > >> * Even though context window is limited, if the process is triggered
> when
> > >> issue opened, the context window is good enough to capture issue
> > >> description, pr description, and even code change
> > >> * It is easier to set up for a broader open-source community, and
> probably
> > >> more cost efficient, it can scale based on workflow adoption
> > >> * It can take action through API such as comment on an issue, add
> label,
> > >> close an issue or PR, etc. based on the triage result.
> > >>
> > >> I also attempted to triage multiple issues and PRs on airflow
> repository,
> > >> and check the actual issues/PRs (I created a script to dry-run and
> inspect
> > >> the triage result and reasoning). The result looks promising, but
> > >> sometimes
> > >> I found it is "a bit strict" and needs some improvements in terms of
> > >> prompting.
> > >>
> > >> I wonder if this is a valid idea, but it would be great if the idea
> can
> > >> potentially help.
> > >>
> > >> Thanks,
> > >> Kevin Yang
> > >>
> > >> [1] https://github.com/orgs/community/discussions/185387
> > >> [2] https://github.com/ossf/wg-vulnerability-disclosures/issues/178
> > >> [3]
> > >>
> > >>
> https://www.reddit.com/r/opensource/comments/1q3f89b/open_source_is_being_ddosed_by_ai_slop_and_github/#:~:text=FunBrilliant5713-,Open%20source%20is%20being%20DDoSed%20by%20AI%20slop%20and%20GitHub,which%20submissions%20came%20from%20Copilot
> > >> .
> > >>
> > >> On Tue, Mar 10, 2026 at 9:13 PM Jarek Potiuk <[email protected]>
> wrote:
> > >>
> > >> > Just to update everyone: I've auto-triaged a bunch of PRs—the tool
> works
> > >> > very well IMHO, but we will know after the authors see them and
> review
> > >> >
> > >> > Some stats (I will gather more in the next days as I am adding
> timing
> > >> and
> > >> > further improvements):
> > >> >
> > >> > * I triaged about 100 PRs in under an hour of elapsed time (I
> > >> > also corrected, improved and noted some fixes, so it will be faster)
> > >> > * I converted 30 of those into Drafts and closed a few
> > >> > * I have not marked any as ready to review yet, but I will do that
> > >> tomorrow
> > >> > * The LLM (Claude) assessment is quite fast - faster than I thought.
> > >> > Parallelizing it also helps. LLM assessment takes between 20 s and 2
> > >> > minutes (elapsed), but usually, only a few pull requests (15% or
> less)
> > >> are
> > >> > LLM assessed  in a batch, so this is not a bottleneck. I will also
> > >> modify
> > >> > the tool to start reviewing deterministic things before LLMs
> complete -
> > >> > which should speed up the whole process even more
> > >> > * The LLM assessments are pretty good - but a few were significantly
> > >> wrong
> > >> > and I would not post them. It's good we have Human-In-The-Loop and
> in
> > >> the
> > >> > driver's seat.
> > >> >
> > >> > Overall - I think the tool is doing very well what I wanted. But
> let's
> > >> see
> > >> > the improvements over the next few days, observe how authors react,
> and
> > >> > determine if it can actually help maintainers
> > >> >
> > >> > I added a few PRs as improvements; looking forward to reviews, :
> > >> >
> > >> > * https://github.com/apache/airflow/pull/63318
> > >> > * https://github.com/apache/airflow/pull/63317
> > >> > * https://github.com/apache/airflow/pull/63315
> > >> > * https://github.com/apache/airflow/pull/63319
> > >> > * https://github.com/apache/airflow/pull/63320
> > >> >
> > >> > J.
> > >> >
> > >> >
> > >> >
> > >> > On Tue, Mar 10, 2026 at 10:18 PM Jarek Potiuk <[email protected]>
> wrote:
> > >> >
> > >> > > Lazy consensus reached. I will try it out tonight. I added more
> > >> signals
> > >> > > (unresolved review comments)  and filtering options (
> > >> > > https://github.com/apache/airflow/pull/63300) that will be useful
> > >> during
> > >> > > this phase.
> > >> > >
> > >> > > On Fri, Mar 6, 2026 at 9:08 PM Jarek Potiuk <[email protected]>
> wrote:
> > >> > >
> > >> > >> Hello here,
> > >> > >>
> > >> > >> I am asking a lazy consensus on the approach proposed in
> > >> > >> https://lists.apache.org/thread/ly6lrm2gc4p7p54vomr8621nmb1pvlsk
> > >> > >> regarding our approach to triaging PRs.
> > >> > >>
> > >> > >> The lazy consensus will last till  Tuesday 10 pm CEST (
> > >> > >>
> > >> >
> > >>
> https://www.timeanddate.com/countdown/generic?iso=20260310T22&p0=262&font=cursive
> > >> > >> )
> > >> > >>
> > >> > >> Summary of the proposal
> > >> > >>
> > >> > >> This is the proposed update to the PR contributing guidelines:
> > >> > >>
> > >> > >> > Start with **Draft**: Until you are sure that your PR passes
> all
> > >> the
> > >> > >> quality checks and tests, keep it in **Draft** status. This will
> > >> signal
> > >> > to
> > >> > >> maintainers that the PR is not yet ready for review and it will
> > >> prevent
> > >> > >> maintainers from accidentally merging it before it's ready. Once
> you
> > >> are
> > >> > >> sure that your PR is ready for review, you can mark it as "Ready
> for
> > >> > >> review" in the GitHub UI. Our regular check will convert all PRs
> from
> > >> > >> non-collaborators that do not pass our quality gates to Draft
> status,
> > >> > so if
> > >> > >> you see that your PR is in Draft status and you haven't set it to
> > >> Draft.
> > >> > >> Check the comments to see what needs to be fixed.
> > >> > >>
> > >> > >> That's a "broad" description of the process; details will be
> worked
> > >> out
> > >> > >> while testing the solution.
> > >> > >>
> > >> > >> The PR: https://github.com/apache/airflow/pull/62682
> > >> > >>
> > >> > >> My testing approach is to start with individual areas, update and
> > >> > perfect
> > >> > >> the tool, gradually increase the reach of it and engage others -
> > >> then we
> > >> > >> might think about more regular process involving more
> maintainers.
> > >> > >>
> > >> > >> J.
> > >> > >>
> > >> > >
> > >> >
> > >>
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

Reply via email to