Hi all, I want to share how Apache Airflow is handling this, since they're dealing with the same volume problem.
Rather than building detection for AI-generated PRs specifically, they've focused on raising the quality bar for all non-collaborator contributions and automating the enforcement. The discussion and tooling could provide inspiration:

https://lists.apache.org/thread/8tzwwwd7jmtmfo4j9pzg27704g10vpr4
https://github.com/apache/airflow/pull/62682

PRs from non-collaborators must pass all checks, follow the PR guidelines (LLM-verified), and include proper descriptions and tests before any maintainer looks at them. PRs that don't meet the bar get converted to drafts automatically, with actionable comments. A human triager reviews the automated output, but the responsibility sits entirely with the author. I don't see it as all that different from the goals of this SPIP.

Their Gen-AI disclosure policy layers on top of this:
https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions

Could be a useful model as the community weighs what levels of enforcement are available.

-- LNC

On Wed, Mar 18, 2026, 1:48 AM Jungtaek Lim <[email protected]> wrote:

> Hi Vaquar,
>
> I do not see value in coupling this with Apache Spark. If this is useful for Apache Spark, why would it be useful only for Apache Spark? It shouldn't be too hard for you to run the prototype against existing/new PRs across various OSS projects. The Apache Spark project is too restricted to prove your project on, because nowadays our code contributors are a largely fixed group - we are not running a new and shiny project trying to gain traction from random contributors. I don't feel we should take the shadow-mode approach while it is not really necessary. There is an existing way to prove the value: go with a faster loop on your own project first.
>
> There is no actual relation between this and Apache Spark "from the product point of view". You will be more successful once you prove the value with the project.
> Please incubate properly and in the right direction.
>
> On Wed, Mar 18, 2026 at 5:26 PM vaquar khan <[email protected]> wrote:
>
>> Hi Jungtaek,
>>
>> Thank you for these points. Your concern regarding *accuracy and reviewer overhead* is perhaps the most impactful feedback I’ve received so far. I completely agree: if an automated tool has a high false-positive rate, it creates a "validation tax" that makes a reviewer's job harder, not easier.
>>
>> Because your questions get to the heart of the proposal’s viability, I have documented the answers and data regarding accuracy, along with your "validate before integrate" suggestion, directly in the SIP: *[Link to SIP: PR Quality & AI-Generated Content Policy]*.
>>
>> To summarize the strategy I've outlined there to address your concerns:
>>
>> 1. *The "Linter" Strategy:* We are not using subjective "guesses" to identify AI. We are looking for objective metadata violations, such as missing JIRA IDs, ignored PR templates, and specific automated signatures. These are "binary" failures with a near-zero false-positive rate, much like a code linter.
>>
>> 2. *Shadow Mode (Validation without Integration):* To your point about figuring out the value first, I propose we run this logic in *Shadow Mode*. It would run as a non-blocking background process, collecting accurate data on Spark PRs for a set period (e.g., 4 weeks). This lets us prove the value and measure the false-positive rate without adding a single second of overhead to your current review process.
>>
>> 3. *Proactive vs. Reactive:* While testing on other projects is possible, Spark’s unique standards mean we need Spark-specific data. This proactive approach ensures we have the tools ready before the volume of "AI slop" becomes a crisis.
>>
>> I’ve made sure the SIP now reflects that the goal of this tool is to act as a *shield* for committers, not a new hurdle.
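To make the "Linter" strategy in point 1 above concrete, here is a minimal sketch of such a binary metadata check. This is an illustration only: the `[SPARK-XXXXX]` title convention and the template headings are assumptions about Spark's PR template, and the function name is made up, not taken from the SIP.

```python
import re

# Sketch of a "binary" metadata check, as described in point 1 above.
# Assumptions: Spark PR titles start with "[SPARK-NNNNN]" and the PR
# template contains the section headings below. Names are illustrative.

JIRA_RE = re.compile(r"^\[SPARK-\d+\]")
REQUIRED_SECTIONS = [
    "### What changes were proposed in this pull request?",
    "### Why are the changes needed?",
]

def metadata_violations(title: str, body: str) -> list[str]:
    """Return objective, linter-style failures; no subjective AI guessing."""
    problems = []
    if not JIRA_RE.match(title):
        problems.append("missing JIRA ID in title")
    for section in REQUIRED_SECTIONS:
        if section not in body:
            problems.append(f"PR template section missing: {section!r}")
    return problems
```

Each check either passes or fails deterministically, which is what would keep the false-positive rate near zero.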
>> I’d value your thoughts on the "Shadow Mode" data collection as a way to provide the proof of accuracy you’re looking for.
>>
>> Please read the details in the section of the SIP doc tagged with your name.
>>
>> Best regards,
>> Viquar Khan
>>
>> On Wed, 18 Mar 2026 at 03:17, vaquar khan <[email protected]> wrote:
>>
>>> Hi Holden,
>>>
>>> I appreciate the perspective on keeping a human in the loop. However, relying on "massive examples" as a lagging indicator means we only act once maintainers are already overwhelmed. Data across the ecosystem shows that the transition from a manageable queue to an unmanageable flood happens rapidly; if Spark is not heavily impacted today, the trajectory of sibling projects suggests we will be within 6 months.
>>>
>>> The "human in the loop" approach is already costing us time. We are seeing drive-by AI contributions that bypass our soft controls and require manual intervention to close. For example:
>>>
>>> - *Large-Scale Noise:* PR #52218 <https://github.com/apache/spark/pull/52218> introduced 1,151 lines of a RabbitMQ connector explicitly marked as "Generated-by: ChatGPT-5," lacking tests and ignoring architectural standards.
>>>
>>> - *Duplicate Overhead:* PR #54810 <https://github.com/apache/spark/pull/54810> and PR #54717 <https://github.com/apache/spark/pull/54717> are concrete instances of AI-driven duplicate PRs for the same JIRA ticket, showing a lack of context awareness.
>>>
>>> - *Template Evasion:* PR #54150 <https://github.com/apache/spark/pull/54150> and PR #50400 <https://github.com/apache/spark/pull/50400> completely ignored JIRA IDs and PR templates without disclosing AI usage. This shows the voluntary checkbox is an unreliable measure of the true volume of AI code entering the repo.
>>>
>>> It is important to distinguish this "AI slop" from high-quality, productive AI use.
>>> As I mentioned, PR #54300 <https://github.com/apache/spark/pull/54300> from *Dongjoon Hyun* (using Gemini 3 Pro on Antigravity) is a perfect example of how AI should be used: with PMC-level oversight and intent.
>>>
>>> I have documented these emerging patterns in the SIP. If we look at the data, it is clear we are moving toward the same crisis seen in other projects. This proposal is a *proactive approach* that protects our committers’ bandwidth before the flood arrives, rather than a *reactive* one that forces us to scramble once the review queue is already broken.
>>>
>>> If a full "auto-close" feels too aggressive right now, could we at least implement *automated labeling* based on these SIP patterns to reduce "discovery time" for the PMC?
>>>
>>> Regards,
>>> Viquar Khan
>>>
>>> On Wed, 18 Mar 2026 at 03:08, vaquar khan <[email protected]> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> Thank you all for taking the time to review and respond to my email, especially on what I know is a busy Monday.
>>>>
>>>> Before diving into the specifics, I want to share a bit of my background. I am an AI developer building various AI products, which gives me a clear perspective on both the pros and cons. I am a strong advocate for using AI and rely on it heavily in my day-to-day work.
>>>>
>>>> On that note, I was happy to see that our PMC member Dongjoon Hyun, who requested evidence, is also actively using AI. Specifically, PR #54300 uses "Gemini 3 Pro (High) on Antigravity" (GitHub Link <https://github.com/apache/spark/pull/54300>). I want to emphasize that this is perfectly acceptable; it is a great example of productive AI use rather than "AI slop."
>>>>
>>>> *Because there are many questions to cover, I won't overwhelm you by answering them all in a single thread. Instead, I will send multiple follow-up emails to ensure I address each point thoroughly.
>>>> For a few of the more complex questions, the answers were quite long, so I have documented them directly in the SIP.*
>>>>
>>>> Thanks again for your time and feedback.
>>>>
>>>> Regards,
>>>> Viquar Khan
>>>>
>>>> On Tue, 17 Mar 2026 at 18:10, Jungtaek Lim <[email protected]> wrote:
>>>>
>>>>> Personally, I would love to ask Vaquar to run the idea against OSS projects and figure out the value, rather than trying to integrate first and validate later. I don't see any limitation on running the idea without actual integration - the only issue is the cost, but I hope he can get some help from his employer if this proves useful. While it would take multiple months to collect useful information from Apache Spark alone, it shouldn't take that long if the experiment is expanded to many OSS projects, and that would be much more useful than trying to frame this as something the Apache Spark project needs.
>>>>>
>>>>> On Wed, Mar 18, 2026 at 7:32 AM Holden Karau <[email protected]> wrote:
>>>>>
>>>>>> I think for now we should probably avoid adding automated closing of possible AI PRs. I think we are not as badly impacted (knock on wood) as some projects, and having a human in the loop for closing is reasonable. If we start getting a bunch of seemingly openclaw-generated PRs then we can revisit this.
>>>>>>
>>>>>> On Tue, Mar 17, 2026 at 3:07 PM Jungtaek Lim <[email protected]> wrote:
>>>>>>
>>>>>>> Maybe my biggest worry with this kind of attempt is the accuracy. If it gives false positives, it will just add overhead to the review phase, pushing the reviewer to check the validation manually, which is "additional" overhead. I wouldn't be happy to get another phase on top of the current review process.
>>>>>>>
>>>>>>> We get AI slop exactly because of the accuracy problem. How is this battle tested?
>>>>>>> Do you have proof of the accuracy? Linter failures are nearly unambiguous and false positives are really rare (at least I haven't seen one), so I don't mind linter checks. I would mind an additional process if it does not guarantee (or at least give a sense of) accuracy.
>>>>>>>
>>>>>>> On Wed, Mar 18, 2026 at 6:23 AM vaquar khan <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Team,
>>>>>>>>
>>>>>>>> AI is a really hot topic across all Apache projects right now, and I wanted to kick off a discussion around a new SPIP I've been putting together. With the sheer volume of contributions we handle, relying entirely on PR templates and manual review to filter out AI-generated slop is just burning out maintainers. We've seen other projects like curl and Airflow get completely hammered by this stuff lately, and I think we need a hard technical defense.
>>>>>>>>
>>>>>>>> I'm proposing the Automated Integrity Validation (AIV) Gate. Basically, it's a local CI job that parses the AST of a PR (using Python, jAST, and tree-sitter-scala) to catch submissions that are mostly empty scaffolding or that violate our specific design rules (like missing .stop() calls or using Await.result).
>>>>>>>>
>>>>>>>> To keep our pipeline completely secure from CI supply-chain attacks, this runs 100% locally in our dev/ directory; zero external API calls. If the tooling ever messes up or a committer needs to force a hotfix, you can bypass it instantly with a GPG-signed commit containing '/aiv skip'.
>>>>>>>>
>>>>>>>> I think the safest way to roll this out without disrupting anyone's workflow is to start it in a non-blocking "Shadow Mode", just to gather data and tune the thresholds.
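To illustrate the kind of local AST rule described above, here is a minimal sketch using Python's stdlib ast module (the Scala/Java side would go through tree-sitter-scala and jAST per the proposal). The heuristic and the function name are illustrative, not taken from the SPIP draft.

```python
import ast

# Sketch of a local AST design rule: flag names assigned from a
# *.getOrCreate() call (e.g. a SparkSession) that never have .stop()
# called on them. Heuristic and names are illustrative only.

def missing_stop_calls(source: str) -> list[str]:
    """Return variable names created via .getOrCreate() but never .stop()ed."""
    tree = ast.parse(source)
    created, stopped = set(), set()
    for node in ast.walk(tree):
        # x = SparkSession.builder...getOrCreate()
        if (isinstance(node, ast.Assign)
                and isinstance(node.value, ast.Call)
                and isinstance(node.value.func, ast.Attribute)
                and node.value.func.attr == "getOrCreate"):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    created.add(target.id)
        # x.stop()
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "stop"
                and isinstance(node.func.value, ast.Name)):
            stopped.add(node.func.value.id)
    return sorted(created - stopped)
```

In Shadow Mode such a rule would only log its findings rather than block the PR.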
>>>>>>>> I've attached the full SPIP draft below, which dives into all the technical weeds, the rollout plan, and a FAQ. Would love to hear your thoughts!
>>>>>>>>
>>>>>>>> https://docs.google.com/document/d/1-PCSq0PT_B45MbXVxkJ_E3GUHvK-8VV6WxQjKSGEh9o/edit?tab=t.0#heading=h.e8ahm4jtqclh
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Viquar Khan
>>>>>>>> *Linkedin* - https://www.linkedin.com/in/vaquar-khan-b695577/
>>>>>>>> *Book* - https://us.amazon.com/stores/Vaquar-Khan/author/B0DMJCG9W6?ref=ap_rdr&shoppingPortalEnabled=true
>>>>>>>> *GitBook* - https://vaquarkhan.github.io/microservices-recipes-a-free-gitbook/
>>>>>>>> *Stack* - https://stackoverflow.com/users/4812170/vaquar-khan
>>>>>>>> *github* - https://github.com/vaquarkhan/aiv-integrity-gate
>>>>>>
>>>>>> --
>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>>>>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>> Pronouns: she/her
