Re: SPIP: Automated Integrity Validation (AIV) Gate for Apache Spark

Dongjoon Hyun Tue, 17 Mar 2026 14:44:35 -0700

Hi Viquar,

Thank you for sharing this.


While reviewing the SPIP, I noticed that we might need more concrete data to 
support the claims regarding the recent surge in the Apache Spark community, 
specifically this section:

> Why Now: The Open Source Automated Contribution Crisis: The open-source 
> ecosystem is experiencing an unprecedented surge in automated, low-quality 
> pull requests. This is not a theoretical concern—it is an active, documented 
> crisis affecting Apache projects and the broader community:
> Apache Spark's Own Data (Verified from Commit History): Spark added a 
> generative tooling disclosure checkbox to its PR template on August 19, 2023. 
> Analysis of commit history shows machine-assisted commits accelerating: 9 in 
> 2024, 23 in 2025, and 35 in just the first 45 days of 2026. Only ~1-2% of 
> commits currently disclose automated tooling usage, but disclosure is 
> voluntary and unverifiable; the actual percentage is likely much higher.

Just FYI, please note that most of the recent `Generated-By: ` commits came 
from active Apache Spark PMC members (such as me, Kent, and Yang). This is due 
to recent vendor promotions (such as the Claude Code OSS program, the Google 
Antigravity Ultra Plan discount, and Copilot). These commits reflect genuine 
productivity gains, not an attack of AI slop.
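
As a rough illustration of how the authorship of those commits could be 
checked, here is a minimal sketch that counts commits carrying the 
`Generated-By:` trailer per author. The function names, the (author, message) 
input shape, and the sample data are assumptions made for this example; how 
the commits are collected (for example from `git log`) is left to the caller.

```python
from collections import Counter

TRAILER = "Generated-By:"  # trailer name as quoted in this thread


def has_trailer(message: str) -> bool:
    """Return True if any line of the commit message starts with the trailer."""
    return any(line.strip().startswith(TRAILER) for line in message.splitlines())


def disclosed_by_author(commits):
    """Count commits carrying the trailer, grouped by author name.

    `commits` is an iterable of (author, message) pairs. This input shape
    is a hypothetical choice for the sketch, not part of any Spark tooling.
    """
    counts = Counter()
    for author, message in commits:
        if has_trailer(message):
            counts[author] += 1
    return counts


# Made-up sample data, for illustration only.
sample = [
    ("alice", "[SPARK-XXXXX] Fix a typo\n\nGenerated-By: some tool"),
    ("bob", "[SPARK-XXXXX] Refactor a test"),
    ("alice", "[SPARK-XXXXX] Update docs\n\nGenerated-By: some tool"),
]
print(disclosed_by_author(sample).most_common())
```

Grouping by author rather than only counting per year is what would show 
whether the disclosed commits come from a few regular committers or from 
many one-off contributors.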

Additionally, as a point of context, our community has already taken proactive 
measures to safeguard against low-quality AI-generated contributions. We 
currently maintain a human-in-the-loop system—such as requiring an ASF JIRA 
ticket to be created before submitting a PR—to help mitigate this issue.

So, we may want to revisit this topic later with concrete and numerous 
examples of AI slop in the Spark pull request list.

Sincerely,
Dongjoon Hyun


On 2026/03/17 21:22:55 vaquar khan wrote:
> Hi Team,
> 
> AI is a really hot topic across all Apache projects these days, and I wanted to
> kick off a discussion around a new SPIP I've been putting together. With
> the sheer volume of contributions we handle, relying entirely on PR
> templates and manual review to filter out AI-generated slop is just burning
> out maintainers. We've seen other projects like curl and Airflow get
> completely hammered by this stuff lately, and I think we need a hard
> technical defense.
> 
> I'm proposing the Automated Integrity Validation (AIV) Gate. Basically,
> it's a local CI job that parses the AST of a PR (using Python, jAST, and
> tree-sitter-scala) to catch submissions that are mostly empty scaffolding
> or violate our specific design rules (like missing .stop() calls or using
> Await.result).
> 
> To keep our pipeline completely secure from CI supply chain attacks, this
> runs 100% locally in our dev/ directory, with zero external API calls. If the
> tooling ever messes up or a committer needs to force a hotfix, you can just
> bypass it instantly with a GPG-signed commit containing '/aiv skip'.
> 
> I think the safest way to roll this out without disrupting anyone's
> workflow is starting it in a non-blocking "Shadow Mode" just to gather data
> and tune the thresholds.
> 
> I've attached the full SPIP draft below which dives into all the technical
> weeds, the rollout plan, and a FAQ. Would love to hear your thoughts!
> 
> https://docs.google.com/document/d/1-PCSq0PT_B45MbXVxkJ_E3GUHvK-8VV6WxQjKSGEh9o/edit?tab=t.0#heading=h.e8ahm4jtqclh
> 
> -- 
> Regards,
> Viquar Khan
> *Linkedin *-https://www.linkedin.com/in/vaquar-khan-b695577/
> *Book *-
> https://us.amazon.com/stores/Vaquar-Khan/author/B0DMJCG9W6?ref=ap_rdr&shoppingPortalEnabled=true
> *GitBook*-https://vaquarkhan.github.io/microservices-recipes-a-free-gitbook/
> *Stack *-https://stackoverflow.com/users/4812170/vaquar-khan
> *github*-https://github.com/vaquarkhan/aiv-integrity-gate
> 
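
For readers following the quoted proposal, the "mostly empty scaffolding" 
check it describes could be sketched roughly as follows. This is not the 
SPIP's implementation, which is not shown in this thread: the sketch uses 
only Python's standard `ast` module, covers Python sources only (not Scala or 
Java), and the function names and the threshold value are hypothetical.

```python
import ast

HYPOTHETICAL_THRESHOLD = 0.5  # assumed cut-off, not taken from the SPIP


def is_stub(func) -> bool:
    """A function counts as a stub if its body holds only a docstring, `pass`, or `...`."""
    for stmt in func.body:
        if isinstance(stmt, ast.Pass):
            continue
        if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant):
            # A bare constant expression: a docstring or a literal `...`.
            continue
        return False
    return True


def stub_ratio(source: str) -> float:
    """Fraction of function definitions in `source` that are stubs (0.0 if none)."""
    funcs = [
        node
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    if not funcs:
        return 0.0
    return sum(is_stub(f) for f in funcs) / len(funcs)


# Made-up input: two stubs and one function with a real body.
example = '''
def load(path):
    """Load data."""
    pass

def transform(df):
    ...

def save(df, path):
    df.write.parquet(path)
'''

ratio = stub_ratio(example)
flagged = ratio > HYPOTHETICAL_THRESHOLD
print(f"stub ratio: {ratio:.2f}, flagged: {flagged}")
```

A ratio like this is only a heuristic, which is one reason a non-blocking 
trial period for tuning thresholds, as the proposal suggests, would matter.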
