> “For each finding, fork a subagent generating unit or fuzz tests to establish
> the veracity of each finding. Merge results, and eliminate findings that are
> not demonstrable in practice explaining why.”
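
For anyone who wants to script that step instead of typing the prompt each time, here is a rough Python sketch of the fan-out / merge / prune flow the quoted prompt describes. It is only an illustration, not how /deep-review works internally: `Finding`, `Verdict`, `triage`, and the `verify` callback are hypothetical names, and `verify` stands in for whatever launches a subagent and runs the unit or fuzz test it writes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Finding:
    """One candidate issue reported by the review pass (hypothetical shape)."""
    ident: str
    claim: str


@dataclass(frozen=True)
class Verdict:
    """Outcome of trying to demonstrate a finding with a test."""
    finding: Finding
    reproduced: bool
    reason: str  # why it was kept, or why it could not be demonstrated


def triage(
    findings: Iterable[Finding],
    verify: Callable[[Finding], Verdict],
    max_workers: int = 4,
) -> Tuple[List[Verdict], List[Verdict]]:
    """Verify each finding independently, then merge and split the results.

    `verify` is an assumption: a callable that tries to reproduce one finding
    (for example by having a subagent write and run a test) and returns a
    Verdict. Findings are checked in parallel; input order is preserved.
    """
    findings = list(findings)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        verdicts = list(pool.map(verify, findings))

    confirmed = [v for v in verdicts if v.reproduced]
    # Rejected findings keep their reason so the final report can explain
    # why each one was eliminated.
    rejected = [v for v in verdicts if not v.reproduced]
    return confirmed, rejected
```

The point of returning the rejected list rather than dropping it is the "explaining why" part of the prompt: the reasons are what let a human spot a finding that was pruned by mistake.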

My review skill started verifying findings this way after I got too many false positives on concurrency. I scoped it to only look at concurrency issues and the false-positive rate dropped drastically; things still get flagged, but only after the agent double-checks them.

Just before I went on paternity leave, I was asked to review a patch on the Friday before I left. I didn’t have time, so I had the agent review it and then did what you just described: I asked it to create tests to prove its points. I ended up submitting the failing test as my review before I left.

> On Jun 9, 2026, at 11:52 PM, C. Scott Andreas <[email protected]> wrote:
>
> This is an indirect answer, but for my own use of /deep-review I typically
> follow the invocation with: “For each finding, fork a subagent generating
> unit or fuzz tests to establish the veracity of each finding. Merge results,
> and eliminate findings that are not demonstrable in practice explaining why.”
>
> This has been very helpful toward pruning false positives, though it’s
> possible it may have a minor adverse effect on recall.
>
> Found a bug in high-scale-lib’s ConcurrentAutoTable#size() with it the other
> day. :)
>
> – Scott
>
>> On Jun 9, 2026, at 11:13 PM, Alex Petrov <[email protected]> wrote:
>>
>>
>> I would really appreciate if folks would give feedback:
>> False positives (just post verbatim, if you can post an SHA, I can try
>> figuring out why they got flagged)
>> False negatives (harder to spot, but if you have a human cycle after llm,
>> you know what it didn’t find)
>> True positives (mostly to confirm this is actually useful)
>> This way I can further fine tune and improve.
>>
>> On Tue, Jun 9, 2026, at 5:41 PM, Štefan Miklošovič wrote:
>>> Thanks Alex for the latest update on your branch! I have merged it
>>> (CASSANDRA-21373).
>>>
>>> David, would you mind driving that PR of yours to the merge? That
>>> looks very handy as well.
>>> >>> On Fri, Jun 5, 2026 at 12:22 PM Štefan Miklošovič >>> <[email protected] <mailto:[email protected]>> wrote: >>> > >>> > I am going to merge the PR of Alex next week. Whole week no progress / >>> > changes so lets merge what we have. >>> > >>> > On Thu, May 28, 2026 at 10:38 AM Štefan Miklošovič >>> > <[email protected] <mailto:[email protected]>> wrote: >>> > > >>> > > Hi Alex, >>> > > >>> > > Has the situation around your skills improved in relation to what you >>> > > have described or can we move forward with it already? >>> > > >>> > > I think it is better to have something in rather than trying to >>> > > perfect it on the first merge. The skills are useful as they are >>> > > already and they can be calibrated in the future. >>> > > >>> > > Regards >>> > > >>> > > On Fri, May 15, 2026 at 6:58 PM Alex Petrov <[email protected] >>> > > <mailto:[email protected]>> wrote: >>> > > > >>> > > > It performs poorly on larger patches, so I was trying to chunk it. I >>> > > > was also experimenting with reverse checklists: you generate a review >>> > > > checklist per patch and take skill as an input inspiration. Kind of >>> > > > semgrep rules but you encode them verbally. >>> > > > >>> > > > On Fri, May 15, 2026, at 4:37 PM, Maxim Muzafarov wrote: >>> > > > >>> > > > As for large patches used to test new skills, I think the “CEP-38: CQL >>> > > > Management API” PR ( https://github.com/apache/cassandra/pull/4582 ) >>> > > > could be a good playground to validate the relevance and accuracy of >>> > > > the suggestions provided by the deep-review and patch-explainer >>> > > > skills. >>> > > > >>> > > > (By the way, we still need a reviewer to move this patch forward.) >>> > > > >>> > > > I used patch-explainer to generate a description. 
This is what it >>> > > > looks like: >>> > > > https://github.com/Mmuzaf/cassandra/blob/cassandra-19476-bug-hunting/CASSANDRA-19476-PR-DESCRIPTION.md >>> > > > >>> > > > Thoughts, >>> > > > >>> > > > I think it would be useful to explicitly mention a strategy to split >>> > > > large patches into some reviewable parts, for example by logically >>> > > > separating them by component. There is already a “Skip or minimize” >>> > > > section, but it does not mention breaking large patches into blocks >>> > > > (if it's possible). The skill currently does not mention trade-offs, >>> > > > although during implementation I constantly kept them in mind and even >>> > > > tracked them separately in my notes for each critical section. For >>> > > > example, what is actually preferable: issuing a direct command QUERY >>> > > > request or invoking pre-registered prepared statements? >>> > > > >>> > > > I also experimented with Mermaid diagrams (1) instead of ASCII >>> > > > diagrams. This is how they could look (2) and looks better than the >>> > > > text, although I noticed they tend to be less accurate. >>> > > > >>> > > > >>> > > > I also tested deep-review, and although I had already used Claude to >>> > > > review my changes, it still highlighted several issues that need to be >>> > > > fixed: >>> > > > https://github.com/Mmuzaf/cassandra/blob/cassandra-19476-bug-hunting/CEP-38_DEEP_REVIEW.md >>> > > > >>> > > > Overall, I think it’s good. >>> > > > Could you share any deficiencies you’ve spotted, Alex? >>> > > > >>> > > > >>> > > > [1] https://de.wikipedia.org/wiki/Mermaid_(Software) >>> > > > [2] >>> > > > https://github.com/Mmuzaf/cassandra/blob/cassandra-19476-bug-hunting/CASSANDRA-19476-PR-DESCRIPTION-MERMAID.md >>> > > > >>> > > > >>> > > > On Fri, 15 May 2026 at 09:18, Alex Petrov <[email protected] >>> > > > <mailto:[email protected]>> wrote: >>> > > > > >>> > > > > I have spotted some deficiencies, particularly when reviewing large >>> > > > > patches. 
I have an experiment running that might improve the >>> > > > > situation. I’ll report as soon I have a result. >>> > > > > >>> > > > > On Thu, May 14, 2026, at 12:31 PM, Štefan Miklošovič wrote: >>> > > > > >>> > > > > I just merged (1) and created (2) for tracking the patch of Alex. >>> > > > > (1) and (2) don't collide. >>> > > > > >>> > > > > It would be cool to include this (2) in upcoming weeks, let's just >>> > > > > live with what Alex provided for a while to evaluate that set of >>> > > > > skills. If the general vibe is OK I would approach the merge. Let's >>> > > > > give it what ... few weeks? Until the end of the month at least. >>> > > > > >>> > > > > (1) https://issues.apache.org/jira/browse/CASSANDRA-21301 >>> > > > > (2) https://issues.apache.org/jira/browse/CASSANDRA-21373 >>> > > > > >>> > > > > On Mon, May 11, 2026 at 3:21 PM Štefan Miklošovič >>> > > > > <[email protected] <mailto:[email protected]>> wrote: >>> > > > > >>> > > > > BTW I really appreciate TLA+ machinery in that patch, I let it scan >>> > > > > compression dictionaries code and how we disperse notifications >>> > > > > around the cluster when a dict is trained etc. and it spit out >>> > > > > stuff like this. There is an IDEA plugin for TLA+ I ran it in and >>> > > > > it just worked and verified :) I can imagine these specs might be >>> > > > > theoretically something we commit into the repo as well when >>> > > > > applicable. That way we would at least conceptually codify the >>> > > > > protocols and could elaborate on them on a high level and run some >>> > > > > formal verifications etc ... Really appreciate this aspect of it. >>> > > > > >>> > > > > (1) >>> > > > > https://gist.github.com/smiklosovic/24b4db51f9ee2b64d76cb0bbb104e29a >>> > > > > >>> > > > > On Mon, May 11, 2026 at 11:31 AM C. Scott Andreas >>> > > > > <[email protected] <mailto:[email protected]>> wrote: >>> > > > > >>> > > > > Alex - thanks so much for putting this together and sharing. 
>>> > > > > >>> > > > > Here are three additional data loss / corruption bugs identified by >>> > > > > Arjun Ashok using this set of skills last week: >>> > > > > >>> > > > > – https://issues.apache.org/jira/browse/CASSANDRA-21356: >>> > > > > CursorBasedCompaction: ReusableLivenessInfo.isExpiring >>> > > > > <http://reusablelivenessinfo.isexpiring/>() incorrectly returns >>> > > > > true for tombstone cells, corrupting cursor-compacted SSTable >>> > > > > format and cell reconciliation >>> > > > > – https://issues.apache.org/jira/browse/CASSANDRA-21357: >>> > > > > CursorBasedCompaction: prevUnfilteredSize always written as 0 in >>> > > > > SSTableCursorWriter >>> > > > > – https://issues.apache.org/jira/browse/CASSANDRA-21358: >>> > > > > CursorBasedCompaction: Final index block width off by one byte in >>> > > > > SSTableCursorWriter#appendBIGIndex() >>> > > > > >>> > > > > Stepping back a bit -- >>> > > > > >>> > > > > This set of skills combined with the Opus model have enabled folks >>> > > > > to find 14 data loss, corruption, and correctness bugs in the >>> > > > > project in the past ~two weeks. These are bugs that likely would >>> > > > > have gone undetected - and if encountered in the wild, would have >>> > > > > required extensive manual fuzz testing to reproduce and identify. >>> > > > > >>> > > > > In the case of the the issue that I'd found and reported: >>> > > > > https://issues.apache.org/jira/browse/CASSANDRA-21340: GROUP BY >>> > > > > queries silently return incomplete results due to premature SRP >>> > > > > abort >>> > > > > >>> > > > > I found this by invoking the skill with the prompt "Review >>> > > > > Cassandra's implementation of GROUP BY for correctness. Identify >>> > > > > edge cases that might result in incorrect responses. After >>> > > > > identifying candidate bugs, fan out subagents to write unit tests >>> > > > > and fuzz tests attempting to reproduce them. 
Assess their veracity, >>> > > > > and present them in order of concern." >>> > > > > >>> > > > > In less than 30 minutes while sitting on the sofa, the model and >>> > > > > skill identified CASSANDRA-21340. In another hour, I was able to >>> > > > > establish its veracity, then leave the model and prompt behind to >>> > > > > work through the issue and write up the Jira ticket by hand. >>> > > > > >>> > > > > I'm *really* impressed by what this set of skills enable, and I >>> > > > > think they may be transformative for quality in Apache Cassandra – >>> > > > > especially when combined with the ability to write in-JVM dtests; >>> > > > > Harry tests; and to use the Simulator. These also make it a lot >>> > > > > easier to use each of these tools. >>> > > > > >>> > > > > Here's how I'm thinking about this work so far: >>> > > > > >>> > > > > – The ensemble review skills are a great first-pass review that can >>> > > > > be used by anyone preparing a patch to identify potential issues. >>> > > > > – They're incredible for pointing at existing and/or new + >>> > > > > experimental components in Cassandra to find serious correctness >>> > > > > issues. >>> > > > > – I'm sure we'd find latent issues if we directed the skills at >>> > > > > interaction between multiple components, like "range tombstones x >>> > > > > short read protection x reverse reads x compact storage" (etc). >>> > > > > – I think these skills could be generalized to support bug-finding >>> > > > > and validation in other Apache projects. >>> > > > > – I also think there is a generalization of these skills that could >>> > > > > be applied to CPU + allocation profiling and optimization. >>> > > > > >>> > > > > For those who have access to a suitable model, I'd love to hear >>> > > > > your experience attempting to find a latent bug in the database. >>> > > > > >>> > > > > I was shocked how easy it was, and am hopeful for what this might >>> > > > > do for quality and data integrity in the project. 
>>> > > > > >>> > > > > – Scott >>> > > > > >>> > > > > On May 8, 2026, at 5:22 PM, Alex Petrov <[email protected] >>> > > > > <mailto:[email protected]>> wrote: >>> > > > > >>> > > > > >>> > > > > I would recommend Opus 4.6+ for /deep-review, but /shallow-review >>> > > > > is probably fine with sonnet. >>> > > > > >>> > > > > Maybe time permitting, I can do evals for different models at some >>> > > > > point. >>> > > > > >>> > > > > Review process is always a bottleneck and introducing such skills >>> > > > > should help to make it faster and more reliable. >>> > > > > >>> > > > > This is hope here, but this is also just a start: we need to reduce >>> > > > > false-positives, and do more with specifications (P, TLA+) for >>> > > > > critical parts of code. >>> > > > > >>> > > > > On Fri, May 8, 2026, at 5:56 PM, Dmitry Konstantinov wrote: >>> > > > > >>> > > > > Hi, Alex, thank you a lot for sharing it. I have been using Claude >>> > > > > code for review of my changes but in a very basic ad-hoc way, it >>> > > > > works for simple issues. The skills look much much more powerful. I >>> > > > > am going to read and try them in the upcoming weeks. >>> > > > > Review process is always a bottleneck and introducing such skills >>> > > > > should help to make it faster and more reliable. >>> > > > > >>> > > > > A question: what model(s) do you use to run them? Is Sonet 4.6 >>> > > > > enough? >>> > > > > >>> > > > > Thanks, >>> > > > > Dmitry >>> > > > > >>> > > > > On Fri, 8 May 2026 at 14:03, Alex Petrov <[email protected] >>> > > > > <mailto:[email protected]>> wrote: >>> > > > > >>> > > > > >>> > > > > Hello folks, >>> > > > > >>> > > > > We have been working on some tooling [1] around Apache Cassandra >>> > > > > correctness, and wanted to share it with Cassandra community. 
>>> > > > > >>> > > > > We have approached this by "indexing" ~3k Cassandra issues and >>> > > > > extracting common patterns from them, generalizing them, then >>> > > > > running evals, tweaking, and extending them until we were had a >>> > > > > strong signal that it performs better than the run-of-the mill code >>> > > > > review skill. We have benchmarked it against some popular OSS >>> > > > > skills (by presenting bugs we knew existed from "indexing" Apache >>> > > > > Kafka, inferring commit bug source from the fix, and making sure >>> > > > > benchmarked skills actually find it). >>> > > > > >>> > > > > In addition, I did my best to codify some things I knew about >>> > > > > correctness, researching code, and writing repros, and what I could >>> > > > > find in research papers and public blog posts. >>> > > > > >>> > > > > So far we were able to find (at very least) following issues (in >>> > > > > reality the number is higher but I have a backlog of potential >>> > > > > leads to investigate and reproduce longer than the time I have >>> > > > > available for these pursuits). 
>>> > > > > >>> > > > > deep review + fuzzer: >>> > > > > >>> > > > > CASSANDRA-21307: Lower bound [SSTABLE_UPPER_BOUND(row000063)] is >>> > > > > bigger than first returned value >>> > > > > CASSANDRA-21292: Row re-inserted at the exact start of a range >>> > > > > tombstone disappears after major compaction >>> > > > > CASSANDRA-21255: Differentiate between legitimate cases where the >>> > > > > first entry is the same as the last entry and empty bounds in >>> > > > > SSTableCursorWriter#addIndexBlock() >>> > > > > >>> > > > > shallow + deep review: >>> > > > > >>> > > > > (latent) issue of unused keepFrom in linearSubtract >>> > > > > https://github.com/apache/cassandra-accord/pull/272 >>> > > > > CASSANDRA-21336: CursorBasedCompaction: trailing present columns >>> > > > > are silently dropped in encodeLargeColumnsSubset() >>> > > > > CASSANDRA-21340: GROUP BY queries silently return incomplete >>> > > > > results due to premature SRP abort >>> > > > > CASSANDRA-21352 TCM: AtomicLongBackedProcessor sort inversion >>> > > > > CASSANDRA-21353 putShortVolatile is not volatile in InMemoryTrie >>> > > > > >>> > > > > Via specifications: >>> > > > > >>> > > > > CASSANDRA-21337: Difference in behavior between Cursor-Based >>> > > > > compaction and "Regular" compaction >>> > > > > CASSANDRA-21336: CursorBasedCompaction: trailing present columns >>> > > > > are silently dropped in encodeLargeColumnsSubset() >>> > > > > CASSANDRA-21339: CursorBasedCompaction: expiring cells, same >>> > > > > timestamp, same ldt, different ttl >>> > > > > CASSANDRA-21338: value comparison direction reversed in >>> > > > > CursorCompactor >>> > > > > >>> > > > > A few folks were using this skill to test some of subsystems, and >>> > > > > might report more issues that I am not directly attributing here. I >>> > > > > have also used these skills for self-review and have caught a >>> > > > > couple of issues before they made it into the codebase. 
>>> > > > > >>> > > > > Despite some early success, I still consider this a very raw set of >>> > > > > prompts, but I think this has utility, and based on the success we >>> > > > > have seen so far, can be helpful and is (according to my >>> > > > > measurement methodology) fairing better than one-shot code review >>> > > > > prompts that an LLM would generate by user request. >>> > > > > >>> > > > > Since I was focusing on finding issues, running evals, and trying >>> > > > > several other methodologies that did not make into this >>> > > > > version/cut, I did not have a chance to sit and re-read the entire >>> > > > > final result just yet, which is why I am not suggesting merging >>> > > > > this into Cassandra codebase until we better vet it, but with your >>> > > > > help and feedback maybe we can do this quicker. >>> > > > > >>> > > > > Hope you find this useful, please share your opinion, experience, >>> > > > > and criticism. >>> > > > > >>> > > > > Happy bug hunting! >>> > > > > --Alex >>> > > > > >>> > > > > [1] https://github.com/apache/cassandra/pull/4794 >>> > > > > >>> > > > > >>> > > > > On Mon, Apr 13, 2026, at 1:12 PM, Štefan Miklošovič wrote: >>> > > > > >>> > > > > I noticed this PR just landed. >>> > > > > >>> > > > > Volunteers reviewing / improving greatly appreciated! >>> > > > > >>> > > > > (1) https://github.com/apache/cassandra/pull/4734 >>> > > > > >>> > > > > On Thu, Feb 26, 2026 at 5:43 PM Jon Haddad >>> > > > > <[email protected] <mailto:[email protected]>> wrote: >>> > > > > >>> > > > > I wanted to share a couple of other things I thought of. I wrote >>> > > > > this: >>> > > > > >>> > > > > > C*'s technical debt will make using an agent in the codebase much >>> > > > > > harder than using one in my own >>> > > > > >>> > > > > I want to clarify my intent with this statement. 
I was trying to >>> > > > > convey that I've had the luxury of refactoring my code several >>> > > > > times, because I don't have to worry about messing with other >>> > > > > people's branches. I usually write something, use it briefly, find >>> > > > > its faults, redo it, and iterate several times. I never consider >>> > > > > anything done and am always looking to improve. This is very >>> > > > > difficult with a project involving many people who have in-flight >>> > > > > branches spanning several months. Changes I consider no-brainers >>> > > > > might be a headache for C*. For example, I can just add a code >>> > > > > formatter and rewrite every file in the codebase. I make major >>> > > > > changes regularly without any consequences. Here, it impacts dozens >>> > > > > of people. I proactively improve my code's architecture because >>> > > > > there are few, if any, negative reasons not to. It's enabled me to >>> > > > > pay off a ton of technical debt that accumulated over the eight >>> > > > > years I handwrote everything. >>> > > > > >>> > > > > Another example: I've been working on an orchestration tool around >>> > > > > easy-db-lab to automate running my tests across several clusters in >>> > > > > parallel. I recently refactored it to split the REST server code >>> > > > > from the execution into Gradle submodules. Now I can create >>> > > > > different agents specializing in each module's content, which slims >>> > > > > down the context for each agent. Since I have a very clear >>> > > > > boundary on each agent's responsibility, I avoid the overhead of >>> > > > > having one agent manage one huge codebase. I can specifically tell >>> > > > > that one agent is responsible for this directory, and its expertise >>> > > > > is in Ktor. Another agent is a Gradle expert. Another is >>> > > > > Kubernetes. When I work on tasks they can be decomposed into task >>> > > > > lists for each specialized agent. 
>>> > > > > >>> > > > > I've always thought this would be a great architectural improvement >>> > > > > for the C* codebase regardless of LLMs. For example, putting the >>> > > > > CQL parser in a standalone module would allow us to publish it so >>> > > > > people could consume it in their own ecosystem without pulling in >>> > > > > C*-all. Isolating a few of these subsystems could reduce cognitive >>> > > > > overhead and simplify test design. I'm sure making the commit log >>> > > > > reader standalone would make it much easier to use in the sidecar. >>> > > > > Easily using the SSTable readers and writers without all the other >>> > > > > dependencies would reduce workarounds in bulk analytics and make >>> > > > > these types of projects more feasible, benefiting the wider >>> > > > > ecosystem. >>> > > > > >>> > > > > Regardless of this approach, creating a devcontainer environment >>> > > > > for the project and pushing the image to GHCR would also be >>> > > > > beneficial. I am now using one with each of my tools. I don't >>> > > > > trust Claude not to wipe my system, so I sandbox it in a container. >>> > > > > It only has access to the local project and cannot push code or >>> > > > > reach GitHub. Devcontainers are supported directly in IDEA, Zed, >>> > > > > and VSCode. You can also launch them directly from GitHub or use >>> > > > > the Claude mobile app. I haven't spent much time on this yet >>> > > > > though, I still prefer two big 5k screens and a deafening >>> > > > > mechanical keyboard. 
>>> > > > > >>> > > > > Jon >>> > > > > >>> > > > > [1] >>> > > > > https://github.com/rustyrazorblade/easy-db-lab/blob/main/.devcontainer/devcontainer.json >>> > > > > [2] >>> > > > > https://github.com/rustyrazorblade/easy-db-lab/blob/main/.devcontainer/Dockerfile >>> > > > > >>> > > > > >>> > > > > >>> > > > > On Thu, Feb 26, 2026 at 12:58 AM Štefan Miklošovič >>> > > > > <[email protected] <mailto:[email protected]>> wrote: >>> > > > > >>> > > > > Thank you Jon for sharing,that was very helpful. All these insights >>> > > > > are invaluable. >>> > > > > >>> > > > > On Wed, Feb 25, 2026 at 11:50 PM Jon Haddad >>> > > > > <[email protected] <mailto:[email protected]>> wrote: >>> > > > > >>> > > > > Regarding ant, we'd probably want a wrapper shell script that is >>> > > > > more LLM-friendly, hiding the excessive text and providing more >>> > > > > actionable output. You can also delegate any task to a subagent so >>> > > > > you don't waste your context on the `ant` output, and use Claude's >>> > > > > new Agent Teams [1] feature to have a "builder" agent run in its >>> > > > > own process. >>> > > > > Docs help Claude find code, big time. You can give it your >>> > > > > organizational structure and that institutional knowledge so it >>> > > > > doesn't have to pull in many tokens from dozens of files. It >>> > > > > *definitely* works. I've pushed over a quarter million LOC this >>> > > > > month alone [1], and many of you may already know I'm obsessed with >>> > > > > efficiency. I constantly test new ideas and approaches to refine >>> > > > > my process; I've found good documentation is *critical*. >>> > > > > >>> > > > > I've recently started working with both Spec-Kit (Microsoft, but it >>> > > > > looks abandoned) and OpenSpec, as both are designed to maintain >>> > > > > long-term memory for a project's product requirements and technical >>> > > > > decisions. OpenSpec is supposed to work better for brownfield and >>> > > > > iterative projects. 
I haven't tried BMAD yet. It seemed a bit >>> > > > > more heavyweight, but it may be better for this project than my >>> > > > > personal ones, where I don't collaborate with anyone. >>> > > > > >>> > > > > I have found that the best results come from loosely coupled >>> > > > > systems. C*'s technical debt will make using an agent in the >>> > > > > codebase much harder than using one in my own. I haven't tried to >>> > > > > work on a patch in C* yet with an agent, but when I do I'll be sure >>> > > > > to share what I've learned. >>> > > > > >>> > > > > Today I introduced OpenSpec to easy-db-lab, you can see what it >>> > > > > looks like [3] if you're curious. A number of markdown commands >>> > > > > were added to the repo, and Spec-Kit was removed. I haven't >>> > > > > reviewed it yet. By the time you read this I will have likely made >>> > > > > some changes in a review. If you want to see the before and after, >>> > > > > the pre-review commit is c6a94e1. >>> > > > > >>> > > > > Jon >>> > > > > >>> > > > > [1] https://code.claude.com/docs/en/agent-teams >>> > > > > [2] my 2 main projects, not including client work: >>> > > > > git log --since="$(date +%Y-%m-01)" --numstat --pretty=tformat: | >>> > > > > awk 'NF==3 {added+=$1; removed+=$2} END {print "Added:", added, >>> > > > > "Removed:", removed}' >>> > > > > Added: 90339 Removed: 45222 >>> > > > > >>> > > > > git log --since="$(date +%Y-%m-01)" --numstat --pretty=tformat: | >>> > > > > awk 'NF==3 {added+=$1; removed+=$2} END {print "Added:", added, >>> > > > > "Removed:", removed}' >>> > > > > Added: 124863 Removed: 52923 >>> > > > > >>> > > > > >>> > > > > [3] https://github.com/rustyrazorblade/easy-db-lab/pull/530/changes >>> > > > > >>> > > > > On Wed, Feb 25, 2026 at 6:18 AM David Capwell <[email protected] >>> > > > > <mailto:[email protected]>> wrote: >>> > > > > >>> > > > > I’m not against memory / skills being added, but do want to request >>> > > > > we think / test to make sure we can 
quantify the gains >>> > > > > >>> > > > > <arxiv-logo-fb.png> >>> > > > > Evaluating AGENTS.md <http://agents.md/>: Are Repository-Level >>> > > > > Context Files Helpful for Coding Agents? >>> > > > > arxiv.org <http://arxiv.org/> >>> > > > > >>> > > > > <arxiv-logo-fb.png> >>> > > > > SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse >>> > > > > Tasks >>> > > > > arxiv.org <http://arxiv.org/> >>> > > > > >>> > > > > >>> > > > > These papers actually match my lived experience with this projects >>> > > > > and others. >>> > > > > >>> > > > > 1) using /init to create CLAUDE.md <http://claude.md/> / AGENTS.md >>> > > > > <http://agents.md/> yields negative results. This is how I started >>> > > > > and have moved away. What is the context you need 100% of the >>> > > > > thing? It’s things that Claude can’t discover easy such as tribal >>> > > > > knowledge (such as link to our style guide). >>> > > > > 2) Ant is horrible for agents, not to figure out what to do (Claude >>> > > > > is good at that) but at context bloat… do “ant jar” and you add >>> > > > > like 10-20k tokens… you MUST have tooling to fix this (I ban Claude >>> > > > > from touching ant command, it’s only allowed to run “ai-build”, and >>> > > > > “ai-ci-test” as these fix the context problems; rtk “might” work >>> > > > > here, not tested as in on leave) >>> > > > > 3) Claude doesn’t need docs to find code, that actually confuses it >>> > > > > more. When it needs to modify code it’s going to have to explore >>> > > > > and will most likely find what it needs. I agree docs for humans >>> > > > > would help, but let’s keep it out of AI memory files. >>> > > > > 4) I only really use sonnet/opus 4.5+, these claims might not be >>> > > > > true for older models or the open weight models. 
>>> > > > > >>> > > > > As for skills, the following makes sense to me but I really hope a >>> > > > > human writes as AI doesn’t do well at understanding the WHY well >>> > > > > and makes bad assumptions: property testing, stateful property >>> > > > > testing, harry, The Simulator. I left out cqltester because I >>> > > > > found Claude doesn’t suck at it, so not sure what a skill would >>> > > > > add. The others I found it struggles with and produces bad quality >>> > > > > tests. >>> > > > > >>> > > > > Last comment: Stefan, your link about ai code in the project didn’t >>> > > > > take into account what happened in the PR. Our global static state >>> > > > > world caused a single test to fail which required a complete >>> > > > > rewrite of the patch that I ended up doing by hand. So that patch >>> > > > > ended up being 100% human. >>> > > > > >>> > > > > Sent from my iPhone >>> > > > > >>> > > > > On Feb 18, 2026, at 6:29 PM, Štefan Miklošovič >>> > > > > <[email protected] <mailto:[email protected]>> wrote: >>> > > > > >>> > > > > These are great points. I like how granular the approach of having >>> > > > > multiple files is. That means we do not need to craft one >>> > > > > "uber-claude.md" but we can do this iteratively and per specific >>> > > > > domain which is easier to handle. >>> > > > > >>> > > > > One consequence of having these "context files" is that a >>> > > > > contributor >>> > > > > does not even need to use any AI whatsoever in order to be more >>> > > > > productive and organized. There is a lot of time lost when a new >>> > > > > contributor wants to understand how the project "thinks", what are >>> > > > > do-s and dont-s etc. All stuff which appears once a patch is >>> > > > > submitted. If we explained to everybody in plain English how this >>> > > > > all >>> > > > > works on a detailed level, per domain, that would be tremendously >>> > > > > helpful even without AI. 
>>> > > > > >>> > > > > It will be interesting to watch how these files are written. To >>> > > > > formalize and write it down is quite a task on its own. >>> > > > > >>> > > > > >>> > > > > On Wed, Feb 18, 2026 at 6:47 PM Patrick McFadin <[email protected] >>> > > > > <mailto:[email protected]>> wrote: >>> > > > > >>> > > > > >>> > > > > Context size is the hardest thing to manage right now in agentic >>> > > > > coding. I’ve stopped using MCP and switched to skills as a result. >>> > > > > >>> > > > > >>> > > > > A couple of things worth noting. You can use many multiple >>> > > > > CLAUDE.md/AGENT.md <http://claude.md/AGENT.md> files in a large >>> > > > > code base. I’m started doing this and it is remarkable. For >>> > > > > example, in the pylib directory a CLAUDE.md <http://claude.md/> >>> > > > > file would provide the Python specific info if making changes. The >>> > > > > standard layout for each should be >>> > > > > >>> > > > > - What is this >>> > > > > >>> > > > > - Where do I get more information >>> > > > > >>> > > > > - How do I run or test >>> > > > > >>> > > > > - What are the non-nogetialble rules >>> > > > > >>> > > > > - What does done look like >>> > > > > >>> > > > > >>> > > > > Imagine one in all sorts of places. fqtool, sstableloader, o.a.c.io >>> > > > > <http://o.a.c.io/>.*, o.a.c.repair <http://o.a.c.repair/>.* etc >>> > > > > etc. And they can evolve over time as people use them. >>> > > > > >>> > > > > >>> > > > > The other thing to bring up is Brokk built by Jonathan Ellis. He >>> > > > > specifically built it for large code bases and specifically tests >>> > > > > on the Cassandra code base. 
(I’ll let him jump in here) >>> > > > > >>> > > > > >>> > > > > Patrick >>> > > > > >>> > > > > >>> > > > > On Feb 18, 2026, at 8:51 AM, Josh McKenzie <[email protected] >>> > > > > <mailto:[email protected]>> wrote: >>> > > > > >>> > > > > >>> > > > > I’ve had trouble using Claude effectively on C*’s large codebase >>> > > > > without a lot of repeated “repo discovery” prompting. >>> > > > > >>> > > > > >>> > > > > Just to keep beating the drum: I've had trouble working in our >>> > > > > codebase effectively without a lot of repeated "repo discovery" >>> > > > > time. In fact, a huge portion of the time I spend working on the >>> > > > > codebase consists of reading into adjacent coupled classes and >>> > > > > modules since things are a) not consistently or thoroughly >>> > > > > documented, and b) generally not that decoupled. >>> > > > > >>> > > > > >>> > > > > This is also / primarily a "human <-> information interfacing >>> > > > > efficiency problem" and it just so happens LLM's and agents being >>> > > > > blocked from working on our codebase is giving us an immediate >>> > > > > short-term pain-proxy for something I strongly believe has been a >>> > > > > long-term tax on us. >>> > > > > >>> > > > > >>> > > > > On Wed, Feb 18, 2026, at 10:04 AM, Isaac Reath wrote: >>> > > > > >>> > > > > >>> > > > > I'm a +1 for the same reason that Josh lays out. Markdown files >>> > > > > that detail the structure of the repo, how to build & run tests, >>> > > > > how to get checkstyle to pass, etc. are all very valuable to new >>> > > > > contributors even if LLMs went away today. >>> > > > > >>> > > > > >>> > > > > On Tue, Feb 17, 2026 at 7:33 PM Jon Haddad >>> > > > > <[email protected] <mailto:[email protected]>> wrote: >>> > > > > >>> > > > > >>> > > > > It's all part of the same topic, Yifan. You're making a >>> > > > > distinction without a difference. 
We could just as easily be >>> > > > > discussing supporting certain MCP servers like serena, or baking >>> > > > > claude into a devcontainer. It's all relevant. There's no need to >>> > > > > police the discussion. >>> > > > > >>> > > > > >>> > > > > On Tue, Feb 17, 2026 at 4:25 PM Yifan Cai <[email protected] >>> > > > > <mailto:[email protected]>> wrote: >>> > > > > >>> > > > > >>> > > > > The original post was about adding AI tooling, prompt, command, or >>> > > > > skill. The thread has shifted to AI memory files. >>> > > > > >>> > > > > >>> > > > > I do not have an objection to any of these, but want to make sure >>> > > > > that we are still on the original topic. >>> > > > > >>> > > > > >>> > > > > IMO, AI tooling has a clear scope / definition and is easier to >>> > > > > reach consensus on. Meanwhile, AI memory files are hard to define >>> > > > > clearly. Different developers in different domains could have quite >>> > > > > different preferences. >>> > > > > >>> > > > > >>> > > > > - Yifan >>> > > > > >>> > > > > >>> > > > > On Tue, Feb 17, 2026 at 3:37 PM Dmitry Konstantinov >>> > > > > <[email protected] <mailto:[email protected]>> wrote: >>> > > > > >>> > > > > >>> > > > > I do not have my own, but here are a few examples from other >>> > > > > Apache projects: >>> > > > > >>> > > > > https://github.com/apache/camel/blob/main/AGENTS.md >>> > > > > >>> > > > > https://github.com/apache/ignite-3/blob/main/CLAUDE.md >>> > > > > >>> > > > > https://github.com/apache/superset/blob/master/superset/mcp_service/CLAUDE.md >>> > > > > >>> > > > > >>> > > > > On Tue, 17 Feb 2026 at 23:22, Jon Haddad <[email protected] >>> > > > > <mailto:[email protected]>> wrote: >>> > > > > >>> > > > > >>> > > > > I think a few folks are already using CLAUDE.md <http://claude.md/> >>> > > > > files in their repo and they're just not committing them. >>> > > > > >>> > > > > Anyone want to share what's already done? 
I'm happy to help share >>> > > > > what I know about the agentic side of things, but since I don't do >>> > > > > much in the way of patching C* it would be a lot of guessing. >>> > > > > >>> > > > > >>> > > > > If I'm wrong and nobody shares one, I'll take a stab at it. >>> > > > > >>> > > > > >>> > > > > >>> > > > > >>> > > > > On Tue, Feb 17, 2026 at 3:08 PM Štefan Miklošovič >>> > > > > <[email protected] <mailto:[email protected]>> wrote: >>> > > > > >>> > > > > >>> > > > > Great feedback everybody! Really appreciate it! >>> > > > > >>> > > > > >>> > > > > Reading what Jon posted ... Jon, I think you are the most >>> > > > > experienced >>> > > > > >>> > > > > in this based on what you wrote. Would you mind doing some POC here >>> > > > > >>> > > > > for Cassandra repo? For the trunk it is enough ... Something we >>> > > > > might >>> > > > > >>> > > > > build further on. I think we need to build the foundations of that >>> > > > > and >>> > > > > >>> > > > > put some structure into it and all things considered I think you are >>> > > > > >>> > > > > best for the job here. >>> > > > > >>> > > > > >>> > > > > If the basics are there we can play with it more before merging, >>> > > > > this >>> > > > > >>> > > > > is not something which needs to be done "tomorrow", we can >>> > > > > collaborate >>> > > > > >>> > > > > on something together for some time and add things into it as >>> > > > > patches >>> > > > > >>> > > > > come. I think it takes some time to "tune" it. >>> > > > > >>> > > > > >>> > > > > Everybody else feel free to help! My experience in this space is >>> > > > > >>> > > > > limited, I think there are people who are using it more often than >>> > > > > me >>> > > > > >>> > > > > for sure. 
>>> > > > > >>> > > > > >>> > > > > Regards >>> > > > > >>> > > > > >>> > > > > On Wed, Feb 18, 2026 at 12:59 AM Joel Shepherd <[email protected] >>> > > > > <mailto:[email protected]>> wrote: >>> > > > > >>> > > > > >>> > > > > There's been some momentum building for AGENTS.md >>> > > > > <http://agents.md/> files, both on the >>> > > > > >>> > > > > project and on the agent side: >>> > > > > >>> > > > > >>> > > > > https://agents.md <https://agents.md/>/ >>> > > > > >>> > > > > >>> > > > > Same idea and benefits, but it might help to align folks on a >>> > > > > "standard" >>> > > > > >>> > > > > that will work well across agents. >>> > > > > >>> > > > > >>> > > > > I also think that more and better code documentation can be very >>> > > > > >>> > > > > beneficial when using agents to help with working out implementation >>> > > > > >>> > > > > details. I spent a bunch of time in January writing an introduction >>> > > > > to >>> > > > > >>> > > > > Apache Ratis (Raft as a library: >>> > > > > >>> > > > > https://github.com/apache/ratis/blob/master/ratis-docs/src/site/markdown/index.md). >>> > > > > >>> > > > > The code itself is pretty well-documented but it was hard for me to >>> > > > > >>> > > > > build a mental model of how to integrate with it. AI was very >>> > > > > effective in >>> > > > > >>> > > > > taking the granular in-code documentation and synthesizing an >>> > > > > overview >>> > > > > >>> > > > > from it. Going the other way, the in-code documentation has made it >>> > > > > >>> > > > > possible for me to deep dive the Ratis code to root cause bugs, etc. >>> > > > > >>> > > > > Agents can get a lot out of good class- and method-level >>> > > > > documentation. >>> > > > > >>> > > > > >>> > > > > -- Joel. >>> > > > > >>> > > > > >>> > > > > On 2/16/2026 8:03 PM, Bernardo Botella wrote: >>> > > > > >>> > > > > 
>>> > > > > >>> > > > > >>> > > > > Thanks for bringing this up Stefan!! >>> > > > > >>> > > > > >>> > > > > A really interesting topic indeed. >>> > > > > >>> > > > > >>> > > > > >>> > > > > I’ve also heard ideas around even having Claude.md >>> > > > > <http://claude.md/> type of files that help LLMs understand the >>> > > > > code base without having to do a full scan every time. >>> > > > > >>> > > > > >>> > > > > So, all in all, putting together something that we as a community >>> > > > > think describes good practices + repository information not >>> > > > > only for the main Cassandra repository, but also for its >>> > > > > subprojects, will definitely help contributors adhere to standards >>> > > > > and us reviewers to ensure that some steps at least will have been >>> > > > > considered. >>> > > > > >>> > > > > >>> > > > > Things like: >>> > > > > >>> > > > > - Repository structure. What every folder is >>> > > > > >>> > > > > - Test suites and how they work and run >>> > > > > >>> > > > > - Git commit standards >>> > > > > >>> > > > > - Specific project lint rules (like braces on new lines!) >>> > > > > >>> > > > > - Preferred wording style for patches/documentation >>> > > > > >>> > > > > >>> > > > > Committed to the projects, and accessible to LLMs, these sound like really >>> > > > > useful context for those types of contributions (that are going to >>> > > > > keep happening regardless). >>> > > > > >>> > > > > >>> > > > > So curious to read what others think. >>> > > > > >>> > > > > Bernardo >>> > > > > >>> > > > > >>> > > > > PS. 
Totally agree that this should change nothing about the quality >>> > > > > bar for code reviews and merged code >>> > > > > >>> > > > > >>> > > > > On Feb 16, 2026, at 6:27 PM, Štefan Miklošovič >>> > > > > <[email protected] <mailto:[email protected]>> wrote: >>> > > > > >>> > > > > >>> > > > > Hey, >>> > > > > >>> > > > > >>> > > > > This happened recently in kernel space. (1), (2). >>> > > > > >>> > > > > >>> > > > > What that is doing, as I understand it, is that you can point an LLM to >>> > > > > >>> > > > > these resources and then it would be more capable when reviewing >>> > > > > >>> > > > > patches or even writing them. It is kind of a guide / context >>> > > > > provided >>> > > > > >>> > > > > to the AI prompt. >>> > > > > >>> > > > > >>> > > > > I can imagine we would just compile something similar, merge it to >>> > > > > the >>> > > > > >>> > > > > repo, then if somebody is prompting it then they would have an >>> > > > > easier >>> > > > > >>> > > > > job etc etc, less error prone ... adhered to code style etc ... >>> > > > > >>> > > > > >>> > > > > This might look like a controversial topic but I think we need to >>> > > > > >>> > > > > discuss this. The usage of AI is just more and more frequent. From >>> > > > > >>> > > > > Cassandra's perspective there is just this (3) but I do not think we >>> > > > > >>> > > > > reached any conclusions there (please correct me if I am wrong where >>> > > > > >>> > > > > we are at with AI generated patches). >>> > > > > >>> > > > > >>> > > > > This is becoming an elephant in the room, I am noticing that some >>> > > > > >>> > > > > patches for Cassandra were prompted by AI completely. I think it >>> > > > > would >>> > > > > >>> > > > > be way better if we make it easy for everybody contributing like >>> > > > > that. >>> > > > > >>> > > > > >>> > > > > This does not mean that we, as committers, would believe what AI >>> > > > > >>> > > > > generated blindly. Not at all. 
It would still need to go over >>> > > > > the >>> > > > > >>> > > > > formal review as anything else. But acting like this is not >>> > > > > happening >>> > > > > >>> > > > > and people are just not going to use AI when trying to contribute is >>> > > > > >>> > > > > not right. We should embrace it in some form ... >>> > > > > >>> > > > > >>> > > > > 1) https://github.com/masoncl/review-prompts >>> > > > > >>> > > > > 2) >>> > > > > https://lore.kernel.org/lkml/[email protected]/ >>> > > > > >>> > > > > 3) https://lists.apache.org/thread/j90jn83oz9gy88g08yzv3rgyy0vdqrv7 >>> > > > > >>> > > > > >>> > > > > >>> > > > > >>> > > > > -- >>> > > > > >>> > > > > Dmitry Konstantinov >>> > > > > >>> > > > > >>> > > > > >>> > > > > >>> > > > > >>> > > > > >>> > > > > >>> > > > > -- >>> > > > > Dmitry Konstantinov >>> > > > > >>> > > > > >>> > > > > >>> > > > > >>> > > > >>> > > > >>> >>
