andrewmusselman opened a new issue, #26:
URL: https://github.com/apache/tooling-agents/issues/26

   ## Summary
   
   The audit phase's analysis cache key in `asvs_audit.py` is `f"batch-{i}"` 
where `i` is the batch index within a section. The cache namespace incorporates 
ASVS section ID and file namespaces (so different sections / file scopes don't 
collide), but within that scope the key is invariant under prompt changes. This 
means edits to the Opus system prompt — including severity rubric, 
instructions, examples — do not invalidate the cache. Cached Opus output 
written under the old prompt is replayed on subsequent runs with 
`clearCache=false`.
   
   ## Symptom
   
   After editing the audit system prompt (e.g., to fix a calibration issue), 
re-running the orchestrator with `clearCache=false` returns identical findings 
to the previous run because every section hits cache. The fix appears to have 
done nothing.
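
   The symptom can be reproduced in miniature. The sketch below is not taken 
from `asvs_audit.py`: the dict `cache`, `fake_model`, and `audit_batch` are 
hypothetical stand-ins for the real cache, the Opus call, and the audit loop. 
Only the `f"batch-{i}"` key shape comes from the description above.

```python
# Hypothetical stand-ins: a dict for the analysis cache, a fake model call.
cache = {}

def fake_model(system_prompt, batch):
    # The output depends on the prompt, as a real model's output would.
    return f"{len(system_prompt)} chars of prompt applied to {batch}"

def audit_batch(i, batch, system_prompt):
    key = f"batch-{i}"  # key ignores the prompt entirely
    if key not in cache:
        cache[key] = fake_model(system_prompt, batch)
    return cache[key]

first = audit_batch(0, "auth.py", "Rate severity strictly.")
second = audit_batch(0, "auth.py", "Rate severity leniently, with examples.")
print(first == second)  # True: the edited prompt never reached the model
```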
   
   The current workaround is `clearCache=true`, but this is a sledgehammer — it 
also wipes the source-code download from `code_namespace`, forcing a fresh 
tarball pull from GitHub for every iteration. Slow and wasteful when all you 
want is to re-audit with the new prompt.
   
   ## Proposed Fix
   
   Include a hash of the analysis system prompt in the cache key:
   
   ```python
   import hashlib
   
   # In asvs_audit.py, near the existing cache key construction (~line 959):
   prompt_fingerprint = hashlib.sha256(analysis_system_prompt.encode()).hexdigest()[:8]
   cache_key = f"batch-{i}-{prompt_fingerprint}"
   ```
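
   A self-contained version of the same idea, runnable outside the audit 
script, might look like the following. The helper names `prompt_fingerprint` 
and `analysis_cache_key` are illustrative assumptions, not existing functions 
in `asvs_audit.py`.

```python
import hashlib

def prompt_fingerprint(system_prompt: str, length: int = 8) -> str:
    # Short, stable digest of the prompt text; 8 hex chars = 32 bits.
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()[:length]

def analysis_cache_key(batch_index: int, system_prompt: str) -> str:
    # Same shape as the proposed key: batch index plus prompt fingerprint.
    return f"batch-{batch_index}-{prompt_fingerprint(system_prompt)}"

old_key = analysis_cache_key(0, "Rate severity strictly.")
new_key = analysis_cache_key(0, "Rate severity strictly. Cite the line.")
same_key = analysis_cache_key(0, "Rate severity strictly.")

print(old_key != new_key)   # edited prompt yields a different key
print(old_key == same_key)  # unchanged prompt still hits the cache
```

   An 8-character prefix is short enough to keep keys readable; a collision 
between two prompt versions would need two of them to share the same 32 bits, 
which is unlikely for the handful of revisions a prompt typically goes through.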
   
   When the prompt changes, every cache key changes and every section is 
re-audited. The source download stays cached. `clearCache` reverts to its 
narrower meaning of "I want a fresh tarball."
   
   ## Decision: hash scope
   
   Two reasonable choices:
   
   1. **Full prompt hash** (simpler): any wording change anywhere invalidates 
every cached analysis. Slightly over-eager, but it eliminates ambiguity.
   2. **Rubric-only hash** (more targeted): extract the static rubric portion 
and hash just that. Section-specific context and dynamic content don't trigger 
invalidation.
   
   Recommendation: start with full prompt hash. If over-invalidation gets 
annoying in practice, refactor to rubric-only later. The cost of an unnecessary 
cache miss is one extra Opus call per affected section — slow but bounded.
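
   For comparison, a rubric-only fingerprint could be sketched as below. This 
assumes the rubric is wrapped in `<rubric>...</rubric>` markers, which is a 
hypothetical convention; the issue does not say how the rubric is delimited in 
the real prompt. When the markers are missing, the sketch falls back to hashing 
the full prompt, so a malformed prompt over-invalidates instead of silently 
reusing cache.

```python
import hashlib
import re

# Hypothetical delimiters for the static rubric portion of the prompt.
RUBRIC_RE = re.compile(r"<rubric>(.*?)</rubric>", re.DOTALL)

def rubric_fingerprint(system_prompt: str) -> str:
    match = RUBRIC_RE.search(system_prompt)
    # Hash only the rubric when found, otherwise the whole prompt.
    text = match.group(1).strip() if match else system_prompt
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]

a = rubric_fingerprint("Section V2.\n<rubric>High: auth bypass</rubric>")
b = rubric_fingerprint("Section V5.\n<rubric>High: auth bypass</rubric>")
c = rubric_fingerprint("Section V2.\n<rubric>High: RCE only</rubric>")

print(a == b)  # section-specific text outside the rubric does not matter
print(a != c)  # a rubric edit changes the fingerprint
```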
   
   ## Side Effect: Orphan Cache Keys
   
   Old cache keys with stale prompt fingerprints become orphans in the 
`audit-cache:analysis:*` namespace. Each one is cheap to store in CouchDB, but 
they accumulate over time. Two options:
   
   1. Ignore — storage cost is negligible.
   2. Add a periodic sweep that deletes keys with old prompt fingerprints. 
Could piggyback on the existing `cleanStaleReports` flow.
   
   Defer the sweep until orphans actually cause problems.
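
   If the sweep is ever needed, the selection step could be as small as the 
sketch below. It only decides which keys are orphans; listing keys from CouchDB 
and deleting them is left out, since the issue does not show how the cache is 
accessed. `find_orphans` and the assumption that keys arrive as bare 
`batch-{i}-{fingerprint}` strings are illustrative. Pre-fingerprint keys of the 
form `batch-{i}` are treated as orphans too.

```python
import re

# Matches "batch-3" (legacy) and "batch-3-1a2b3c4d" (fingerprinted).
KEY_RE = re.compile(r"^batch-(\d+)(?:-([0-9a-f]{8}))?$")

def find_orphans(keys, current_fingerprint):
    orphans = []
    for key in keys:
        match = KEY_RE.match(key)
        if match is None:
            continue  # not an analysis batch key; leave it alone
        if match.group(2) != current_fingerprint:
            orphans.append(key)  # legacy key or stale fingerprint
    return orphans

keys = [
    "batch-0",
    "batch-0-1a2b3c4d",
    "batch-0-deadbeef",
    "batch-1-deadbeef",
    "other-key",
]
stale = find_orphans(keys, "deadbeef")
print(stale)  # ['batch-0', 'batch-0-1a2b3c4d']
```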
   
   ## Validation
   
   1. Pick any single-line wording change to the audit system prompt.
   2. Re-run the orchestrator on a previously-audited repo with 
`clearCache=false`.
   3. Verify in logs that Opus calls happen for every section (cache misses 
everywhere).
   4. Verify the source download was reused (no re-download).
   5. Inspect CouchDB: `audit-cache:analysis:*` keys should have new 
fingerprints; old ones still present as orphans.
   
   ## Related
   
   - The severity calibration work that uncovered this — prompt edits silently 
no-op'd until `clearCache=true` was used.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

