andrewmusselman opened a new issue, #27:
URL: https://github.com/apache/tooling-agents/issues/27

   ## Summary
   
   The `services/database_service` layer now provides bulk operations 
(`set_many`, `delete_many`, `get_many`) backed by CouchDB's `_bulk_docs` 
endpoint. Two places in the ASVS orchestrator currently loop single-key 
operations and would benefit from bulk equivalents.
   
   ## Opportunity 1: `cleanStaleReports` orphan deletion
   
   **Current code** in `asvs_orchestrate.py`, in the cleanup branch (around line 1130):
   
   ```python
   for k in orphan_keys:
       try:
           reports_ns.delete(k)
           deleted += 1
       except Exception as de:
           print(f"    {k}: delete failed: {de}", flush=True)
   ```
   
   For a typical L3 run where discovery reshuffles pass names between executions, `orphan_keys` can contain 70–345 entries. Each iteration costs one HTTP round trip to CouchDB, so a cleanup issues 70–345 sequential requests. Replacing the loop with `delete_many` reduces that to one round trip in total.
   
   **Fix:**
   
   ```python
   try:
       reports_ns.delete_many(orphan_keys)
       deleted = len(orphan_keys)
   except Exception as e:
       print(f"  bulk delete failed: {e}", flush=True)
       deleted = 0
   ```
   
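   Note that this snippet counts every key as deleted whenever the bulk call does not raise. If the bulk API exposes per-key results, surface per-key failures from the bulk response instead. If it does not, fall back to the per-key loop when the bulk call fails.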
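   
   A minimal sketch of the fallback path described above. It assumes `delete_many` raises when the bulk request as a whole fails and that `reports_ns` still offers the per-key `delete`. The helper name `delete_orphans` and its return shape are hypothetical, not part of the existing code.

```python
from typing import Dict, Iterable, Tuple


def delete_orphans(reports_ns, orphan_keys: Iterable[str]) -> Tuple[int, Dict[str, str]]:
    """Try one bulk delete; on bulk failure, retry key by key.

    Returns (deleted_count, {key: error_message}). Hypothetical helper.
    """
    keys = list(orphan_keys)
    if not keys:
        return 0, {}

    try:
        # Assumption: delete_many raises if the bulk request fails outright.
        reports_ns.delete_many(keys)
        return len(keys), {}
    except Exception as bulk_err:
        print(f"  bulk delete failed, retrying per key: {bulk_err}", flush=True)

    # Fallback: the original per-key loop, recording each failure.
    deleted = 0
    failures: Dict[str, str] = {}
    for k in keys:
        try:
            reports_ns.delete(k)
            deleted += 1
        except Exception as de:
            failures[k] = str(de) or type(de).__name__
            print(f"    {k}: delete failed: {de}", flush=True)
    return deleted, failures
```

   If the bulk call can delete some keys before raising, the per-key retry may report those keys as failures (for example, not found), so the counts are a lower bound in that case.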
   
   ## Opportunity 2: Per-bundle report storage
   
   **Current code** in `run_bundle()`:
   
   ```python
   async def store_one(section_id, report_text):
       try:
           key = f"{pass_name}/{section_id}.md"
           reports_ns.set(key, report_text)
           return section_id, None
       except Exception as e:
           ...
   
   push_results = await asyncio.gather(*[
       store_one(sid, txt) for sid, txt in per_section_reports.items()
   ])
   ```
   
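   A bundle typically holds 6 sections, so each bundle makes six `set` round trips. As written, `store_one` never awaits anything, so `asyncio.gather` runs those calls one after another rather than concurrently. Replace them with a single `set_many` call after the bundle's audit completes.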
   
   **Fix:**
   
   ```python
   items = {
       f"{pass_name}/{sid}.md": txt
       for sid, txt in per_section_reports.items()
   }
   try:
       reports_ns.set_many(items)
       for sid in per_section_reports:
           local_successes.append(sid)
           print(f"    [{pass_name}] {sid}: stored", flush=True)
   except Exception as e:
       err_str = str(e) or f"{type(e).__name__} (no detail)"
       for sid in per_section_reports:
           local_failures.append(f"{sid} (store): {err_str}")
           print(f"    [{pass_name}] {sid}: store failed: {err_str}", flush=True)
   ```
   
   This is a smaller per-call win than Opportunity 1, but it is multiplied across hundreds of bundles in a large run.
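   
   Because `run_bundle()` is async, a blocking `set_many` call would hold the event loop for the duration of the request. One possible way around that, sketched here, is to hand the call to a worker thread with `asyncio.to_thread` (Python 3.9+). This assumes `set_many` is a blocking call that is safe to run off the main thread, which the issue does not confirm. The helper name `store_bundle` is hypothetical.

```python
import asyncio
from typing import Dict, List, Tuple


async def store_bundle(
    reports_ns, pass_name: str, per_section_reports: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Store one bundle's reports with a single set_many call.

    Returns (successes, failures) in the same shape the per-key path
    builds up in local_successes / local_failures.
    """
    items = {
        f"{pass_name}/{sid}.md": txt
        for sid, txt in per_section_reports.items()
    }
    try:
        # Assumption: set_many blocks on network I/O, so run it in a thread
        # to keep other bundles' coroutines moving.
        await asyncio.to_thread(reports_ns.set_many, items)
    except Exception as e:
        err_str = str(e) or f"{type(e).__name__} (no detail)"
        # All-or-nothing: treat every section in the bundle as failed.
        return [], [f"{sid} (store): {err_str}" for sid in per_section_reports]
    return list(per_section_reports), []
```

   Inside `run_bundle()` this could be awaited once per bundle, with the two returned lists appended to `local_successes` and `local_failures`.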
   
   ## Opportunity 3 (optional): Consolidate Phase 1 reads
   
   `asvs_consolidate.py` Phase 1 reads every per-section report from the reports namespace, currently with one `get` per key. Since the database service layer now provides `get_many`, these reads can be batched. This is lower priority because consolidate is already fast and parallelized, but it is worth an audit pass.
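   
   A rough sketch of batched reads, assuming `get_many` accepts a list of keys and returns a dict of the keys it found mapped to their values. That return shape is an assumption and should be checked against `services/database_service`. The helpers `chunked` and `read_reports` and the batch size of 100 are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def chunked(keys: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` keys."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[str] = []
    for k in keys:
        batch.append(k)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def read_reports(reports_ns, keys: Iterable[str], batch_size: int = 100) -> Dict[str, str]:
    """Read reports in batches: one get_many call per batch."""
    reports: Dict[str, str] = {}
    for batch in chunked(keys, batch_size):
        # Assumption: get_many returns {key: value} and omits missing keys.
        reports.update(reports_ns.get_many(batch))
    return reports
```

   Batching keeps any single request bounded in size while still cutting the number of round trips from one per key to one per batch.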
   
   ## Validation
   
   1. Time a `cleanStaleReports=true` run with ~70 orphan keys before and after the change. The cleanup step should drop from ~70 sequential round trips (seconds of wall time) to roughly one.
   2. Compare audit-phase wall time on a large run (200+ sections) before and after the storage refactor. The win is smaller but should be measurable.
   3. Verify behavior on partial bulk failures: if `set_many` half-succeeds, the failure handling should match the per-key path's behavior (mark affected sections as failed, don't claim success).
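   
   For the partial-failure check, it helps to remember that CouchDB's `_bulk_docs` is not all-or-nothing: its response is a list with one row per document, carrying either `ok` or an `error`/`reason` pair. The sketch below splits such rows into successes and failures. Whether `set_many` exposes these rows to callers, and whether document IDs match namespace keys one-to-one, are assumptions. `split_bulk_results` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def split_bulk_results(rows: List[dict]) -> Tuple[List[str], Dict[str, str]]:
    """Split _bulk_docs-style response rows into (succeeded_ids, {id: error})."""
    succeeded: List[str] = []
    failed: Dict[str, str] = {}
    for row in rows:
        doc_id = row.get("id", "<unknown>")
        if row.get("ok"):
            succeeded.append(doc_id)
        else:
            error = row.get("error", "unknown_error")
            reason = row.get("reason", "")
            failed[doc_id] = f"{error}: {reason}" if reason else error
    return succeeded, failed


# Example rows in the shape CouchDB returns; the keys and rev are made up.
rows = [
    {"id": "pass_a/1.1.md", "ok": True, "rev": "2-abc"},
    {"id": "pass_a/1.2.md", "error": "conflict", "reason": "Document update conflict."},
]
ok_ids, failed_ids = split_bulk_results(rows)
```

   A test for item 3 could feed a mixed response like this through the storage path and assert that only the failed section lands in `local_failures`.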
   
   ## Related
   
   - CouchDB bulk-ops backend work landed in services/database_service 
(separate effort).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

