Hi everyone,

First, I want to give credit where it is due: I am very glad that our ongoing discussions about automated code quality on this mailing list directly led to the community taking action to formalize agent instructions. Thank you to Wenchen for opening PR 54899 (SPARK-56074) to introduce AGENTS.md and CLAUDE.md to the repository.
I intentionally paused my replies to this thread over the last few weeks. I knew that arguing theory wouldn't get us anywhere, so I decided to wait and use this live PR as an experiment to definitively answer the questions raised by Dongjoon, Jungtaek, and Holden.

Dongjoon and Jungtaek, you both argued that our manual, human-in-the-loop review process is enough to catch bad code, and that active PMC members using these productivity tools aren't making mistakes. Let's look at the actual data from PR 54899, which was recently merged and cherry-picked. This PR was incredibly small: exactly 83 lines of changes, with 76 additions and 7 deletions. It was highly visible and manually reviewed by 17 of our most senior core and PMC members, including Dongjoon, steveloughran, zhengruifeng, szehon-ho, HeartSaVioR, and others. Despite 17 senior reviewers closely analyzing a tiny 83-line text file, it slipped right through and shipped with critical structural bugs that actively break the very automated tools the file was designed to guide:

1. The Dead-End Loop: While the file does contain some inline SBT commands higher up, the reference links section at the bottom explicitly tells automated tools to read docs/building-spark.md and look for the "Running Individual Tests" section to figure out how to test the code. However, if you actually look at that section of the documentation, it does not contain the execution commands; it just redirects you to the developer-tools.html web page. We just sent every automated tool into an endless reading loop.

2. Missing Inline Scripts & Delegation: The PR's stated goal was to provide "inline build/test commands" rather than just linking to docs. Yet the configuration completely omits the critical dev/connect-gen-protos.sh script required for Spark Connect testing. Instead, it delegates instructions to a subdirectory README (sql/connect/common/src/main/protobuf/).
This directly contradicts the PR's own architectural goal, forcing tools to hunt through the directory tree for execution paths rather than giving them the actionable command upfront.

If 17 of our most experienced PMC members missed these structural bugs in an 83-line plain-text file, how are we going to catch them when contributors start submitting 1,500-5,000 line PRs touching the core Catalyst optimizer? Human reviewers read code like humans; we simply do not catch the structural issues that trip up automated systems. A recent ETH Zurich study (arXiv 2602.11988 <https://arxiv.org/abs/2602.11988>), published in February, demonstrated exactly this: feeding automated tools bad or unnecessary context files increases inference costs by over 20% and reduces task success rates by 3%.

Holden, you mentioned wanting to wait until we are actually impacted by a flood of automated slop before implementing checks. Unfortunately, we no longer have that time. On March 31, 2026, Anthropic accidentally leaked over 512,000 lines of their Claude Code TypeScript source via an npm source map error. The whole world now has the blueprint to build highly autonomous tools. Furthermore, Claude just ran a massive usage promotion doubling token limits that ended on March 28, and committers are utilizing generous quotas on the Google Antigravity Ultra plan. In the next few months, we are going to see an absolute flood of machine-generated code hitting our queues. If we do not add the AIV Gate now, I guarantee that within the next year our codebase will be riddled with these invisible, machine-breaking bugs.

Jungtaek, to answer your concerns about accuracy and false positives: the AIV Gate uses deterministic AST parsing rather than subjective guessing. It acts as an objective linter, catching structural errors like missing inline bash blocks or dead-end references that humans miss.
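To make the "deterministic linter" idea concrete, here is a minimal sketch of how a dead-end-reference check could work. This is illustrative only, not the actual AIV Gate implementation: it uses plain regexes instead of a full markdown AST, and all names (lint_agents_file, check_reference, max_hops) are hypothetical. The idea is simply to follow each doc link from AGENTS.md until it reaches a fenced command block, and to flag chains that loop or terminate without one:

```python
import re

# A fenced code block (e.g. ```bash ... ```) counts as an inline, runnable command.
FENCE = re.compile(r"```\w*\n.*?```", re.DOTALL)
# Markdown links: capture the target path, ignoring any #section anchor.
LINK = re.compile(r"\[[^\]]*\]\(([^)#\s]+)(?:#[^)]*)?\)")

def check_reference(docs, target, max_hops=5):
    """Follow a reference chain until a fenced command block is found.

    docs: dict mapping file name -> file text (a stand-in for the repo tree).
    Returns None if the chain reaches runnable commands, or a string
    describing why it is a dead end. Purely textual and deterministic.
    """
    seen = set()
    for _ in range(max_hops):
        if target in seen:
            return f"reference loop via {target}"
        seen.add(target)
        text = docs.get(target)
        if text is None:
            return f"missing file {target}"
        if FENCE.search(text):
            return None  # chain ends in inline commands: actionable
        links = LINK.findall(text)
        if not links:
            return f"dead end at {target}: no commands, no further links"
        target = links[0]  # follow the redirect, as an automated tool would
    return f"gave up after {max_hops} hops"

def lint_agents_file(docs, root):
    """Flag every reference in the root file that never reaches a command."""
    return [(t, why) for t in LINK.findall(docs[root])
            if (why := check_reference(docs, t)) is not None]
```

Run against a toy reproduction of the PR 54899 bug (an AGENTS.md whose reference section points at a doc that only redirects onward), it flags the building-spark.md chain as a dead end, while a file whose links resolve to fenced commands passes clean. A real gate would use a proper CommonMark parser and resolve paths against the checkout, but the blocking logic is this simple.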
To compromise and ensure zero disruption to the current workflow, I propose we implement this as a phased approach:

Phase 1 (Shadow Mode): We deploy the AIV Gate as a non-blocking CI job. It simply flags PRs and appends a JIRA label such as "Automated Slop" or "Structural Error" based on its AST parsing. This gives us hard data on accuracy and volume without blocking a single merge.

Phase 2 (Active Enforcement): Once the PMC reviews the Phase 1 data and we agree the false-positive rate is near zero, we graduate the gate to an active check that blocks violating code.

Let's turn on Phase 1 and let the data speak for itself.

Regards,
Viquar Khan
