Re: SPIP: Automated Integrity Validation (AIV) Gate for Apache Spark

Holden Karau Tue, 17 Mar 2026 15:32:25 -0700

I think for now we should probably avoid adding automated closing of
possible AI PRs. I think we are not as badly impacted (knock on wood) as
some projects, and having a human in the loop for closing is reasonable. If
we start getting a bunch of seemingly OpenClaw-generated PRs, then we can
revisit this.


On Tue, Mar 17, 2026 at 3:07 PM Jungtaek Lim <[email protected]>
wrote:

> Maybe my biggest worry with this kind of attempt is accuracy. If this
> gives false positives, it will just add overhead to the review phase,
> pushing the reviewer to check the validation manually, which is
> "additional" overhead. I wouldn't be happy if I got another phase
> in addition to the current review process.
>
> We get AI slop exactly because of accuracy problems. How is this battle
> tested? Do you have proof of its accuracy? Linter failures are almost
> always obvious and false positives are really rare (at least I haven't
> seen one), so the linter check doesn't bother me. An additional process
> would bother me if it does not guarantee (or at least give a sense of)
> its accuracy.
>
> On Wed, Mar 18, 2026 at 6:23 AM vaquar khan <[email protected]> wrote:
>
>> Hi Team,
>>
>> Nowadays a really hot topic in all Apache projects is AI, and I wanted to
>> kick off a discussion around a new SPIP I've been putting together. With
>> the sheer volume of contributions we handle, relying entirely on PR
>> templates and manual review to filter out AI-generated slop is just burning
>> out maintainers. We've seen other projects like curl and Airflow get
>> completely hammered by this stuff lately, and I think we need a hard
>> technical defense.
>>
>> I'm proposing the Automated Integrity Validation (AIV) Gate. Basically,
>> it's a local CI job that parses the AST of a PR (using Python, jAST, and
>> tree-sitter-scala) to catch submissions that are mostly empty scaffolding
>> or violate our specific design rules (like missing .stop() calls or using
>> Await.result).
>>
>> To keep our pipeline completely secure from CI supply chain attacks, this
>> runs 100% locally in our dev/ directory; zero external API calls. If the
>> tooling ever messes up or a committer needs to force a hotfix, you can just
>> bypass it instantly with a GPG-signed commit containing '/aiv skip'.
>>
>> I think the safest way to roll this out without disrupting anyone's
>> workflow is starting it in a non-blocking "Shadow Mode" just to gather data
>> and tune the thresholds.
>>
>> I've attached the full SPIP draft below, which dives into all the
>> technical weeds, the rollout plan, and a FAQ. Would love to hear your
>> thoughts!
>>
>>
>> https://docs.google.com/document/d/1-PCSq0PT_B45MbXVxkJ_E3GUHvK-8VV6WxQjKSGEh9o/edit?tab=t.0#heading=h.e8ahm4jtqclh
>>
>> --
>> Regards,
>> Viquar Khan
>> *Linkedin *-https://www.linkedin.com/in/vaquar-khan-b695577/
>> *Book *-
>> https://us.amazon.com/stores/Vaquar-Khan/author/B0DMJCG9W6?ref=ap_rdr&shoppingPortalEnabled=true
>> *GitBook*-
>> https://vaquarkhan.github.io/microservices-recipes-a-free-gitbook/
>> *Stack *-https://stackoverflow.com/users/4812170/vaquar-khan
>> *github*-https://github.com/vaquarkhan/aiv-integrity-gate
>>
>

-- 
Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/
<https://www.fighthealthinsurance.com/?q=hk_email>
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her
