[I] Context compaction agent — cheap-LLM summarize, expensive-LLM consume (tooling-agents)

via GitHub Fri, 22 May 2026 09:59:52 -0700


andrewmusselman opened a new issue, #44:
URL: https://github.com/apache/tooling-agents/issues/44


   Carry-forward from #14 ("Refactor prompts"). That thread split into three
   readings of "telescoping" and converged on the right one being context
   compaction with a hierarchical reference tree. The ASVS-specific budget work
   landed in `48a15e8` (see [`ASVS/audit-budget.md`][budget] and
   [`ASVS/optimization-plan.md`][plan]), but the *general* compaction primitive
   sbp asked for never shipped. Filing this issue to revisit it now.
   
   [budget]: 
https://github.com/apache/tooling-agents/blob/main/ASVS/audit-budget.md
   [plan]: 
https://github.com/apache/tooling-agents/blob/main/ASVS/optimization-plan.md
   
   ## Problem
   
   The expensive part of any LLM-heavy pipeline is feeding the expensive model.
   Today, an audit pass for a single ASVS section might hand Opus 200–300KB of
   raw source code, most of which is irrelevant to the requirement being
   audited. The same pattern shows up anywhere that a high-context, high-cost
   reasoning call follows a data-gathering step: the gathering step assembled
   the world; the reasoning step pays for the world even though it only needs
   a fraction of it.
   
   A cheaper model, asked to *first* summarize the gathered context with
   respect to the question being asked, produces a fraction of the tokens at a
   tiny fraction of the cost. The expensive model then reasons over the
   summary, optionally drilling into specific parts of the raw via tool calls.
   
   ## What sbp described in #14
   
   > "Use cheaper LLMs to gather the context, medium cost LLMs to collate and
   > do a small amount of reasoning about that gathered context (also
   > compactifying), and then high cost LLMs to work on the output of that."
   
   > "The idea was to telescope the compressed outputs, so you have a kind of
   > compressed tree of references."
   
   So the shape isn't just summarization — it's a *hierarchical* index the
   expensive model can navigate:
   
   - **Level 0:** raw source (full files, full data).
   - **Level 1:** per-unit abstracts (function signatures + key comments for
     code; per-file summaries for docs; per-row digests for data).
   - **Level 2:** group abstracts (per-directory or per-domain summaries).
   - **Level N:** the entry-point index Opus actually receives.
   
   The expensive model reads the top of the tree, picks which subtree it
   needs, and a tool call fetches the next level down. Most calls never need
   to descend to Level 0.
   
   Lossless compaction (claw-compactor style, ~25% reduction) doesn't get
   anywhere near the cost reduction needed. The win is lossy: discarding
   everything not relevant to the question at hand, with the raw still
   available behind a tool call if the expensive model asks for it.
   
   ## Why this matters now
   
   - **ASVS specifically.** The pipeline today sends ~150–300KB of source per
     Opus call. Half of any file is irrelevant to whichever ASVS section is
     being audited. A 70–80% reduction is realistic; that's a several-X cost
     reduction on the hottest path.
   - **Demo budget control.** A user wants to demo an agent on a real
     codebase but doesn't want to pay full Opus rates for every iteration.
     Compaction-with-cache makes the first run expensive and every subsequent
     iteration cheap.
   - **Quality-curve experiments.** Comparing "Opus on raw" vs "Opus on
     compacted" is the kind of experiment that's hard to run today because
     the compaction step doesn't exist as a reusable thing.
   
   ## Recommendation: build it as an agent first
   
   The compaction *strategy* is domain-specific: code wants signature
   extraction, audit reports want finding-extraction, narrative wants
   narrative summarization. Different users will want different strategies.
   Different agents in the same pipeline will want different strategies.
   Putting strategy in user agent code keeps the platform simple and the
   iteration fast — same argument the original #14 thread reached for
   telescoping itself.
   
   What the platform should provide is the *substrate*: caching with
   content-addressed keys, so the same source → strategy → compact mapping is
   computed once and reused. That substrate already exists, just not formally
   — `data_store` with a disciplined naming convention does it today. A
   platform-level promotion is small enough that it can wait until the agent
   pattern has shaken out.
   
   ## MVP — entirely as an agent
   
   A `compactor` agent with this contract:
   
   ```python
   async def run(input_dict, tools):
       source_namespace = input_dict["source_namespace"]
       keys = input_dict.get("keys") or "*"           # specific keys or all
       strategy = input_dict["strategy"]              # named strategy
       level = input_dict.get("level", 1)             # which level to produce
       purpose_hint = input_dict.get("purpose", "")   # what's being 
audited/asked
   ```
   
   Returns:
   
   ```python
   {
     "compact_namespace": 
"compact:files:apache/airflow:v1:security_skeleton:abc12345",
     "level": 1,
     "key_map": {                                   # original key → compact key
       "airflow/dags/example.py": "summary_of_example_py",
       # ...
     },
     "stats": {"input_tokens": 89432, "output_tokens": 18211, "saved_pct": 79.6}
   }
   ```
   
   The agent computes the content+strategy hash, checks `data_store` for
   existing compacts under `compact:{source}:{version}:{strategy}:{hash}`,
   returns immediately if cached. Otherwise calls the cheap LLM with the
   strategy's prompt template against each input chunk, stores the results,
   and returns the namespace pointer.
   
   Other agents in the pipeline call this once at the start of a pass, get
   back a namespace pointer, and feed *that namespace* to Opus instead of the
   raw. No platform change needed.
   
   ## Naming convention for cache keys
   
   ```
   compact:{source_namespace}:{strategy_version}:{strategy_name}:{content_hash}
                                                                 └─ first 8 
chars
                                                                    of sha256 of
                                                                    input 
content
                                                                    + strategy
                                                                    + level
   ```
   
   Hash includes the source content, strategy version, strategy name, and
   level so that any of those changing produces a fresh cache key without
   invalidating siblings. Same pattern the relevance-filter cache uses today
   with its policy hash.
   
   ## Strategy registry — also as an agent, not platform
   
   Each strategy is a named (prompt template, target model, target token
   budget) tuple. Ship a few canned ones:
   
   | Strategy name | What it does | Cheap LLM | Notes |
   |---|---|---|---|
   | `code_signature` | Functions + docstrings + critical patterns | Haiku | 
Skips bodies of pure-utility functions |
   | `code_security` | Security-relevant code only (auth, validation, crypto, 
IO) | Haiku | Hot path for ASVS |
   | `audit_findings` | Extracts finding objects from audit reports | Sonnet | 
Generalize from what `asvs_consolidate` does today |
   | `narrative_summary` | Standard text summarization | Haiku | Generic 
fallback |
   
   Users can register their own strategies by adding entries to a small YAML
   config, or by passing the strategy inline as a prompt template. The agent
   reads from `data_store:strategies` or similar.
   
   ## When to promote to platform
   
   After the agent has been in production use for one or two pipelines and
   the pattern is clear, promote *just the cache primitive* to a platform
   feature:
   
   - A `compact:` namespace prefix with first-class semantics
   - Content-addressed keys enforced by the platform (key must equal hash of
     `(content, strategy_version, strategy_name, level)`)
   - Optional TTL so old compactions expire automatically
   - Optional metrics: per-strategy hit rate, average compression ratio,
     total tokens saved
   
   *Then* maybe a `tools.compact()` helper as sugar. *Then* maybe
   auto-compaction in `tools.call_llm()` ("if context exceeds budget, swap in
   the cached compact"). All optional polish on top of the substrate.
   
   The premature version is the one where the platform owns the strategy.
   Don't do that.
   
   ## Three concrete uses driving this
   
   1. **ASVS Opus calls.** `asvs_audit` and `asvs_bundle` would compact-first
      via `code_security`, then call Opus on the compact namespace. Estimated
      win: 70–80% token reduction on the hot path, with the option for Opus
      to drill into raw via tool call when something looks suspicious.
   2. **Consolidation.** `asvs_consolidate` already does finding-extraction;
      that's structurally the same compaction pattern. Generalizing it gives
      consolidation the same tree-of-references behavior other agents would
      have.
   3. **General reasoning over large corpora.** Any agent that wants to
      "read this 5MB documentation and answer questions about it" becomes
      feasible. Today that's prohibitively expensive.
   
   ## Phasing
   
   **Phase 1 — Agent + a few canned strategies.** Build `compactor` agent.
   Ship `code_security` and `narrative_summary` strategies. Wire ASVS audit
   to call it. Measure cost reduction. ~400 LOC agent + strategy configs.
   
   **Phase 2 — Strategy registry as an agent.** A `compactor_admin` agent
   for adding/editing strategies. UI lives in data store configuration
   screens. ~150 LOC.
   
   **Phase 3 — Tree-of-references navigation tool.** A `compact_drilldown`
   tool exposed to LLMs so they can ask "show me the raw for
   `airflow/dags/example.py`" mid-reasoning. Requires LLM tool-calling support
   in the call_llm path. ~250 LOC.
   
   **Phase 4 — Platform promotion (conditional).** Only if Phases 1–3 have
   shown the pattern is right. Promote cache + content-addressed keys to
   first-class platform primitives. Roughly ~300 LOC on the gofannon side.
   
   ## Effort overall
   
   Phase 1 alone is probably one focused two-week sprint with measurable cost
   reduction on the ASVS pipeline. Phases 2–3 add another month. Phase 4 is
   platform-team work after pattern validation.
   
   ## Open questions
   
   1. **Cache invalidation when source changes.** Content hash handles it
      implicitly — different source = different hash = different cache key —
      but old compacts accumulate. TTL? Reference counting? Background GC?
      Recommend TTL (30 days?) as the simplest answer.
   2. **Cross-agent cache sharing.** A compact computed for agent A's
      strategy is reusable by agent B if they use the same strategy. The
      naming scheme already supports this; just make sure strategies have
      human-meaningful names so reuse is visible.
   3. **Quality degradation.** Compaction is lossy by design. Measuring
      whether the expensive model's final output got worse when fed compact
      vs raw is essential before relying on this in anything important. Each
      strategy should ship with an eval set that compares Opus-on-raw vs
      Opus-on-compact for representative inputs.
   4. **Tool-call drill-down vs flat compact.** Phase 1 ships flat (Opus
      gets the compact, never sees raw). Phase 3 adds the drill-down tool.
      Whether the drill-down is worth the complexity depends on Phase 1
      results — if 80% reduction with no quality loss, drill-down may be
      overkill.
   5. **Drill-down or flat for the MVP?** Worth deciding before any code:
      does sbp want compaction-with-drilldown specifically, or is flat
      compaction acceptable for Phase 1? If drilldown is essential, Phase 1
      scope grows. If flat is fine for now, Phase 1 ships faster and we
      learn from real use whether drilldown matters.
   
   ## Why this is high-value
   
   Cost reduction on the hot path is the flywheel: every dollar saved on
   existing pipelines funds more pipelines, more iteration, longer runs.
   ASVS at full ASF coverage is constrained by Bedrock budget right now. A
   70–80% reduction makes "audit every Apache project quarterly" tractable
   instead of aspirational. Other LLM-heavy pipelines that can't get off the
   ground today become feasible with the same change.
   
   ## Proposed next steps
   
   1. Confirm the MVP shape (flat compact vs. drill-down) with sbp.
   2. Spec the `code_security` strategy prompt and the eval set for measuring
      quality preservation on representative ASVS inputs.
   3. Build the `compactor` agent and the two MVP strategies.
   4. Wire ASVS audit/bundle to compact-first, measure cost reduction and
      finding parity vs the current uncompacted runs.
   5. If the numbers hold up, ship strategy 2 and 3 as natural follow-ons.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[I] Context compaction agent — cheap-LLM summarize, expensive-LLM consume (tooling-agents)

Reply via email to