andrewmusselman opened a new issue, #27:
URL: https://github.com/apache/tooling-agents/issues/27
## Summary
The `services/database_service` layer now provides bulk operations
(`set_many`, `delete_many`, `get_many`) backed by CouchDB's `_bulk_docs`
endpoint. Two places in the ASVS orchestrator currently loop single-key
operations and would benefit from bulk equivalents.
## Opportunity 1: `cleanStaleReports` orphan deletion
**Current code** in `asvs_orchestrate.py`, in the cleanup branch (around
line 1130):
```python
for k in orphan_keys:
    try:
        reports_ns.delete(k)
        deleted += 1
    except Exception as de:
        print(f" {k}: delete failed: {de}", flush=True)
```
For a typical L3 run where discovery reshuffles pass names between
executions, `orphan_keys` can contain 70–345 entries. Each iteration is one
HTTP round trip to CouchDB, so cleanup costs one round trip per orphan.
Replacing the loop with `delete_many` reduces this to a single round trip.
**Fix:**
```python
try:
    reports_ns.delete_many(orphan_keys)
    deleted = len(orphan_keys)
except Exception as e:
    print(f" bulk delete failed: {e}", flush=True)
    deleted = 0
```
If the bulk API exposes per-key results, surface per-key failures from the
bulk response instead of counting every key as deleted; otherwise, fall back
to the per-key loop when the bulk call fails.
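
The fallback path could look roughly like the sketch below. It assumes `delete_many` raises when the bulk call as a whole fails and returns nothing useful on success; the issue does not confirm either. `delete_orphans` and `InMemoryNamespace` are hypothetical names introduced only for illustration, not part of `services/database_service`.

```python
def delete_orphans(reports_ns, orphan_keys):
    """Delete orphan keys in bulk, falling back to per-key deletes.

    Returns (deleted_count, failures), where failures maps key -> error text.
    Assumes delete_many(keys) raises if the bulk call as a whole fails.
    """
    keys = list(orphan_keys)
    failures = {}
    if not keys:
        return 0, failures
    try:
        reports_ns.delete_many(keys)
        return len(keys), failures
    except Exception as bulk_err:
        print(f" bulk delete failed, retrying per key: {bulk_err}", flush=True)

    # Fallback: the original per-key loop, so one bad key cannot hide the rest.
    deleted = 0
    for k in keys:
        try:
            reports_ns.delete(k)
            deleted += 1
        except Exception as de:
            failures[k] = str(de) or type(de).__name__
            print(f" {k}: delete failed: {de}", flush=True)
    return deleted, failures


class InMemoryNamespace:
    """Hypothetical stand-in for the reports namespace, for local testing only."""

    def __init__(self, data, bulk_works=True):
        self.data = dict(data)
        self.bulk_works = bulk_works

    def delete(self, key):
        del self.data[key]  # raises KeyError for unknown keys

    def delete_many(self, keys):
        if not self.bulk_works:
            raise RuntimeError("bulk endpoint unavailable")
        for k in keys:
            self.data.pop(k, None)
```

Returning the failures alongside the count keeps the existing `deleted` reporting intact while letting the caller log which keys still need attention.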
## Opportunity 2: Per-bundle report storage
**Current code** in `run_bundle()`:
```python
async def store_one(section_id, report_text):
    try:
        key = f"{pass_name}/{section_id}.md"
        reports_ns.set(key, report_text)
        return section_id, None
    except Exception as e:
        ...

push_results = await asyncio.gather(*[
    store_one(sid, txt) for sid, txt in per_section_reports.items()
])
```
A bundle typically holds 6 sections, so each bundle costs six `set` round
trips. Replace them with a single `set_many` after the bundle's audit completes.
**Fix:**
```python
items = {f"{pass_name}/{sid}.md": txt for sid, txt in
         per_section_reports.items()}
try:
    reports_ns.set_many(items)
    for sid in per_section_reports:
        local_successes.append(sid)
        print(f" [{pass_name}] {sid}: stored", flush=True)
except Exception as e:
    err_str = str(e) or f"{type(e).__name__} (no detail)"
    for sid in per_section_reports:
        local_failures.append(f"{sid} (store): {err_str}")
        print(f" [{pass_name}] {sid}: store failed: {err_str}",
              flush=True)
This is a smaller win per call than (1), but it is multiplied across hundreds
of bundles in a large run.
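
Because `run_bundle()` is a coroutine, a blocking `set_many` call would stall the event loop for the length of the HTTP request. If `set_many` is synchronous (an assumption; the un-awaited `set` in the current code suggests it, but the issue does not say), one option is to hand the call to a worker thread with `asyncio.to_thread` (Python 3.9+). `store_bundle` is a hypothetical helper name used only for this sketch.

```python
import asyncio


async def store_bundle(reports_ns, pass_name, per_section_reports):
    """Store all reports of one bundle with a single set_many call.

    Returns (successes, failures) using the same failure-string format as
    the per-key path. Assumes set_many(items) is blocking and raises on error.
    """
    items = {f"{pass_name}/{sid}.md": txt
             for sid, txt in per_section_reports.items()}
    try:
        # Run the blocking call in a worker thread so other bundles keep going.
        await asyncio.to_thread(reports_ns.set_many, items)
    except Exception as e:
        err_str = str(e) or f"{type(e).__name__} (no detail)"
        return [], [f"{sid} (store): {err_str}" for sid in per_section_reports]
    return list(per_section_reports), []
```

The caller would extend `local_successes` and `local_failures` from the returned lists, which keeps the logging in one place.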
## Opportunity 3 (optional): Consolidate Phase 1 reads
`asvs_consolidate.py` Phase 1 reads every per-section report from the
reports namespace, currently with one `get` per key. If the namespace exposes
`get_many`, batch the reads. This is lower priority, since consolidate is
already fast and parallelized, but it is worth an audit pass.
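
A batched read might be sketched as follows. The return shape of `get_many` is an assumption here (a dict mapping each found key to its value, with missing keys omitted), and `read_reports`, `chunked`, and the `batch_size` default are hypothetical. Chunking is optional; it only bounds the size of each request when a run has many sections.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def read_reports(reports_ns, keys, batch_size=100):
    """Fetch reports in batches instead of one get per key.

    Assumes get_many(keys) returns {key: value} and omits missing keys.
    Returns (reports, missing_keys).
    """
    keys = list(keys)
    reports = {}
    for batch in chunked(keys, batch_size):
        reports.update(reports_ns.get_many(batch))
    missing = [k for k in keys if k not in reports]
    return reports, missing
```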
## Validation
1. Time a `cleanStaleReports=true` run with ~70 orphan keys before and after
the change. Orphan deletion should drop from ~70 sequential HTTP round trips
to roughly one.
2. Compare audit-phase wall time on a large run (200+ sections) before and
after the storage refactor. The gain is smaller but should be measurable.
3. Verify behavior on partial bulk failures — if `set_many` half-succeeds,
the failure handling should match the per-key path's behavior (mark affected
sections as failed, don't claim success).
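
For item 3, it helps to know that CouchDB's `_bulk_docs` is not all-or-nothing: the response is a list with one entry per document, carrying `ok` on success or `error` and `reason` on failure. Whether `set_many` passes those rows through is not stated in this issue, so the sketch below is hypothetical: it assumes the rows are available and that each document `id` equals the report key. `classify_bulk_results` is an illustrative name.

```python
def classify_bulk_results(pass_name, section_ids, rows):
    """Split sections into successes and failures from _bulk_docs-style rows.

    rows: list of dicts such as {"id": ..., "ok": True, "rev": ...} or
    {"id": ..., "error": "conflict", "reason": "..."}.
    A section with no matching row is treated as failed, never as stored.
    """
    by_id = {row.get("id"): row for row in rows}
    successes, failures = [], []
    for sid in section_ids:
        row = by_id.get(f"{pass_name}/{sid}.md")
        if row is not None and row.get("ok"):
            successes.append(sid)
            continue
        if row is None:
            detail = "no result row returned"
        else:
            detail = f"{row.get('error', 'unknown')}: {row.get('reason', 'no detail')}"
        failures.append(f"{sid} (store): {detail}")
    return successes, failures
```

Treating a missing row as a failure errs on the side of not claiming success, which matches the per-key path's behavior.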
## Related
- CouchDB bulk-ops backend work landed in services/database_service
(separate effort).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]