This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git


The following commit(s) were added to refs/heads/main by this push:
     new d294802  docs(mode-economics): fix internal inconsistency and amortisation basis (#326)
d294802 is described below

commit d2948021b6796135bff11cee5f74fa7ca55a69f4
Author: André Ahlert <[email protected]>
AuthorDate: Tue May 26 18:55:34 2026 -0300

    docs(mode-economics): fix internal inconsistency and amortisation basis (#326)
    
    Three small corrections to docs/mode-economics.md, all confined to the
    existing "Reducing costs" and "Local and self-hosted inference"
    sections — no new claims, only tightening claims already on the page.
    
    1. Cache section referenced "the skill file (3 000–6 000 tokens)" as
       the ideal cache candidate. That figure is the pre-correction anchor
       the PR description of #253 explicitly called out as wrong; the
       "What 'tokens' means here" table on the same page now reports
       measured ranges (small ~1k–3k, typical ~3.5k–9k median ~5.3k,
       large security skills ~11k–36k). Replace the inline figure with a
       pointer to the corrected anchor so the doc no longer contradicts
       itself.
    
    2. While that paragraph is open: add a one-sentence TTL caveat to the
       cache recommendation. Anthropic's prompt cache TTL is 5 min default
       (1 h extended at higher write cost), so the "first invocation pays;
       subsequent invocations cheap" pattern is real for bursty same-session
       workloads but typically misses for periodic triage / mentor replies
       spaced through a day — exactly the workloads the same page lists
       above. Flagging the constraint avoids the footgun.
    
    3. Local-inference table listed "~$0.10–0.50/hr amortised" with no
       amortisation basis. "Amortised" needs a denominator to be
       interpretable. Inline the assumption (capex over ~3 yr lifespan ×
       moderate utilisation) so a reader can sanity-check the range against
       their own hardware and utilisation profile.
    
    No table structure changes, no methodology changes, no new rows. Page
    still does not carry a measurement-date / coverage / tokenizer banner —
    those are separate, broader concerns better raised as an issue.
---
 docs/mode-economics.md | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/docs/mode-economics.md b/docs/mode-economics.md
index b5ee098..3bc837c 100644
--- a/docs/mode-economics.md
+++ b/docs/mode-economics.md
@@ -201,7 +201,7 @@ per-token billing to hardware:
 
 | Inference path | Per-token cost | Typical hardware cost | Notes |
 |---|---|---|---|
-| Consumer GPU, Small-class quantised model | $0 | ~$0.10–0.50/hr amortised | Viable for Triage and short Mentoring/Drafting |
+| Consumer GPU, Small-class quantised model | $0 | ~$0.10–0.50/hr (capex amortised over ~3 yr lifespan × moderate utilisation) | Viable for Triage and short Mentoring/Drafting |
 | Cloud spot GPU, Mid-tier model | $0 | ~$1–4/hr depending on GPU class | Viable for all modes; latency is higher than hosted APIs |
 | CPU-only, quantised Small model | $0 | Near-zero | Very slow; not recommended for interactive Pairing |
 
@@ -223,9 +223,14 @@ paths use identical skill code to hosted paths.
    skill read only what is relevant.
 
 3. **Cache skill context.** Most agent CLIs support prompt-level
-   caching. The skill file (3 000–6 000 tokens) and stable project
-   configuration files are ideal cache candidates — the first invocation
-   pays; subsequent invocations are cheap on the cached portion.
+   caching. The skill file (size varies by skill class; see
+   [What "tokens" means here](#what-tokens-means-here)) and stable
+   project configuration files are ideal cache candidates — the first
+   invocation pays; subsequent invocations are cheap on the cached
+   portion. Note: most provider caches have a short TTL (Anthropic
+   prompt cache: 5 min default, 1 h extended at higher write cost),
+   so bursty same-session workloads benefit most; periodic triage runs
+   spaced hours apart will typically miss the cache.
 
 4. **Batch triage.** `issue-reassess` and `pr-management-stats`
    amortise context load across a pool. Running them weekly rather than

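For readers who want to sanity-check the two numeric claims in this commit against their own setup, here is a minimal sketch. It is not part of the commit or of docs/mode-economics.md. The function names, the $1,500 purchase price and the 25% utilisation figure are hypothetical, chosen only for illustration. Electricity is ignored, and the cache model assumes that every invocation (hit or miss) restarts the TTL window, which is a simplification.

```python
def amortised_hourly_cost(capex_usd: float, lifespan_years: float, utilisation: float) -> float:
    """Hardware cost per hour of actual use: capex / (lifespan hours x utilisation).

    Assumption: capex only; power, cooling and resale value are ignored.
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    hours_in_use = lifespan_years * 365 * 24 * utilisation
    return capex_usd / hours_in_use


def count_cache_hits(invocation_times_s: list[float], ttl_s: float = 300.0) -> int:
    """Count invocations that arrive within ttl_s of the previous invocation.

    Assumption: each invocation writes or refreshes the cache entry, so only
    the gap to the immediately preceding invocation matters.
    """
    times = sorted(invocation_times_s)
    return sum(1 for prev, cur in zip(times, times[1:]) if cur - prev <= ttl_s)


# Hypothetical consumer GPU: $1,500, 3-year lifespan, in use 25% of the time.
gpu_cost = amortised_hourly_cost(capex_usd=1500, lifespan_years=3, utilisation=0.25)

# Bursty session: four invocations one minute apart, 5 min default TTL.
bursty_hits = count_cache_hits([0, 60, 120, 180], ttl_s=300)

# Periodic triage: four runs three hours apart, even with the 1 h extended TTL.
periodic_hits = count_cache_hits([0, 10800, 21600, 32400], ttl_s=3600)

print(f"amortised cost: ${gpu_cost:.2f}/hr")
print(f"bursty hits: {bursty_hits}, periodic hits: {periodic_hits}")
```

With these illustrative inputs the hourly figure falls inside the ~$0.10–0.50/hr range from the table, and the periodic schedule never hits the cache. Changing the utilisation or the spacing between runs shows how sensitive both conclusions are to the stated assumptions.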