(airflow-steward) branch main updated: docs(gmail): default get_thread to MINIMAL; escalate to FULL_CONTENT only on body-parse (#409)

potiuk Sun, 31 May 2026 03:15:31 -0700

This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git



The following commit(s) were added to refs/heads/main by this push:
     new 8ba93cc  docs(gmail): default get_thread to MINIMAL; escalate to FULL_CONTENT only on body-parse (#409)
8ba93cc is described below

commit 8ba93ccfe3102b38795d7d837186961bb27f9356
Author: Jarek Potiuk <[email protected]>
AuthorDate: Sun May 31 12:15:16 2026 +0200

    docs(gmail): default get_thread to MINIMAL; escalate to FULL_CONTENT only on body-parse (#409)
    
    Profiling a 12-tracker bulk sync surfaced that Gmail get_thread calls
    were responsible for ~170 KB of in-context spillover (top single
    spillover file was 440 KB) — a FULL_CONTENT call on a long reporter
    conversation routinely lands 60-100K tokens. Most call sites only
    needed the chronologically-last message ID for anchoring, the SENT/
    DRAFT label state, or the thread-exists check — all of which MINIMAL
    satisfies at ~5K tokens.
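    
    The MINIMAL-satisfiable needs above can be read straight off the
    message envelope. A minimal Python sketch, assuming a hypothetical
    response shape (a `messages` list whose entries carry `id`,
    `labelIds` and an epoch-millisecond `internalDate` string, borrowed
    from the Gmail REST API message resource rather than from the MCP
    tool's documentation). All helper names are illustrative, and
    skipping drafts when picking the reply anchor is an assumption:

```python
from typing import Any, Optional

# Hypothetical MINIMAL envelope: {"messages": [{"id", "labelIds",
# "internalDate"}, ...]}. Field names follow the Gmail REST API message
# resource; the MCP tool's real output shape may differ.
Thread = dict[str, Any]


def thread_exists(thread: Optional[Thread]) -> bool:
    """Thread-exists check: found and holding at least one message."""
    return bool(thread and thread.get("messages"))


def _non_drafts(thread: Thread) -> list[dict[str, Any]]:
    """Non-draft messages, oldest first (internalDate is epoch ms as a string)."""
    messages = [m for m in thread.get("messages", [])
                if "DRAFT" not in m.get("labelIds", [])]
    return sorted(messages, key=lambda m: int(m.get("internalDate", "0")))


def last_message_id(thread: Thread) -> Optional[str]:
    """Chronologically-last non-draft message ID, e.g. for replyToMessageId."""
    messages = _non_drafts(thread)
    return messages[-1]["id"] if messages else None


def draft_ids(thread: Thread) -> list[str]:
    """IDs of messages whose labelIds carry DRAFT."""
    return [m["id"] for m in thread.get("messages", [])
            if "DRAFT" in m.get("labelIds", [])]


def awaiting_reporter(thread: Thread) -> bool:
    """True when the last non-draft message carries SENT (no reporter reply yet)."""
    messages = _non_drafts(thread)
    return bool(messages) and "SENT" in messages[-1].get("labelIds", [])
```

    None of these helpers touches a body part, which is why MINIMAL is
    sufficient for them.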
    
    Changes:
    
    1. tools/gmail/operations.md — flip the default example from
       FULL_CONTENT to MINIMAL. Document a clear escalation rule:
       default to MINIMAL; escalate to FULL_CONTENT only when the call
       site actually processes the message body (Step 3 classification in
       security-issue-import, Step 1e reviewer-comment parsing in
       security-issue-sync, draft-composition that quotes the reporter's
       prior message, credit-form extraction). Also fix a longstanding
       terminology error: the doc said 'METADATA' but the MCP enum is
       'MINIMAL'.
    
    2. security-issue-sync Step 1e — make the FULL_CONTENT requirement
       explicit (reviewer comment bodies ARE the actionable signal).
       Reinforce the contrast with the rest of the skill, which defaults
       to MINIMAL.
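    
    The escalation rule in item 1 amounts to an explicit allow-list of
    body-parsing call sites, with everything else falling back to
    MINIMAL. A minimal Python sketch; the enum members and the function
    name are hypothetical and are not part of the skills or the MCP
    tool:

```python
from enum import Enum, auto


class CallSitePurpose(Enum):
    """Hypothetical labels for why a skill calls get_thread."""
    STATE_PROBE = auto()              # SENT/DRAFT label state
    ANCHOR_LOOKUP = auto()            # last message ID for replyToMessageId
    EXISTENCE_CHECK = auto()          # does the thread exist / have messages
    CLASSIFY_THREAD = auto()          # security-issue-import Step 3
    PARSE_REVIEWER_COMMENTS = auto()  # security-issue-sync Step 1e
    QUOTE_REPORTER = auto()           # draft quoting the reporter's prior message
    EXTRACT_CREDIT = auto()           # reporter-supplied credit form


# The only purposes that actually process the message body.
BODY_PARSING = frozenset({
    CallSitePurpose.CLASSIFY_THREAD,
    CallSitePurpose.PARSE_REVIEWER_COMMENTS,
    CallSitePurpose.QUOTE_REPORTER,
    CallSitePurpose.EXTRACT_CREDIT,
})


def message_format_for(purpose: CallSitePurpose) -> str:
    """Default to MINIMAL; escalate to FULL_CONTENT only for body parsing."""
    return "FULL_CONTENT" if purpose in BODY_PARSING else "MINIMAL"
```

    Keeping the allow-list explicit means a new call site gets MINIMAL
    unless someone deliberately adds it to the body-parsing set.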
    
    The privacy-LLM contract block was updated accordingly — the
    redact-after-fetch protocol applies to FULL_CONTENT calls; MINIMAL
    calls are exempt (they carry headers only, no free-form body
    content).
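    
    The updated privacy rule reduces to one predicate on (format, list),
    plus a wrapper that keeps the fetch-to-redact window a single step
    wide. A minimal Python sketch: `fetch` and `redact` are hypothetical
    stand-ins for the get_thread call and the pii-redact step, the
    thread is simplified to a string, and treating any non-MINIMAL
    format as a body read is a conservative assumption:

```python
from typing import Callable

# Placeholder list names, written as in the docs.
GATED_LISTS = frozenset({"<security-list>", "<private-list>"})


def requires_redaction(message_format: str, list_name: str) -> bool:
    """Redact-after-fetch applies to body reads on gated lists only.

    Any format other than MINIMAL is treated as a body read (conservative).
    """
    return message_format != "MINIMAL" and list_name in GATED_LISTS


def fetch_thread_for_processing(
    thread_id: str,
    message_format: str,
    list_name: str,
    fetch: Callable[[str, str], str],
    redact: Callable[[str], str],
) -> str:
    """Fetch a thread and, when required, redact it before returning.

    `fetch` and `redact` are hypothetical injected callables.
    """
    raw = fetch(thread_id, message_format)
    if requires_redaction(message_format, list_name):
        # Redaction is the very next step; no other code sees `raw`.
        return redact(raw)
    return raw
```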
    
    Estimated savings: ~80% of Gmail-related tokens on typical bulk runs
    (60-100K → ~5K per call). Compounds on every bulk-sync invocation.
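    
    A back-of-envelope check of the per-call figures above, as a Python
    sketch that uses only the numbers quoted in this message. Per
    converted call the reduction comes out at roughly 92% to 95%; the
    ~80% figure is the commit's own run-level estimate and is not
    derived here (it is plausibly lower because some call sites still
    escalate to FULL_CONTENT, which is an inference, not a stated fact):

```python
# Approximate per-call token figures quoted in this commit message.
MINIMAL_TOKENS = 5_000
FULL_CONTENT_RANGE = (60_000, 100_000)
TRACKERS = 12  # size of the profiled bulk sync


def per_call_saving(full_tokens: int, minimal_tokens: int = MINIMAL_TOKENS) -> float:
    """Fraction of tokens saved when one MINIMAL call replaces one FULL_CONTENT call."""
    return 1 - minimal_tokens / full_tokens


def run_tokens(calls: int, tokens_per_call: int) -> int:
    """Total Gmail tokens for a run that makes `calls` get_thread calls."""
    return calls * tokens_per_call


if __name__ == "__main__":
    for full in FULL_CONTENT_RANGE:
        print(f"{full:,} -> {MINIMAL_TOKENS:,} tokens per call: "
              f"{per_call_saving(full):.0%} saved; "
              f"{TRACKERS} calls: {run_tokens(TRACKERS, full):,} -> "
              f"{run_tokens(TRACKERS, MINIMAL_TOKENS):,} tokens")
```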
    
    Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
---
 .claude/skills/security-issue-sync/SKILL.md | 11 +++++-
 tools/gmail/operations.md                   | 58 ++++++++++++++++++++++++-----
 2 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/.claude/skills/security-issue-sync/SKILL.md b/.claude/skills/security-issue-sync/SKILL.md
index 9f5b869..fa7d13f 100644
--- a/.claude/skills/security-issue-sync/SKILL.md
+++ b/.claude/skills/security-issue-sync/SKILL.md
@@ -959,8 +959,15 @@ PMC member). The body usually contains explicit proposals — *"Please
 update the CWE to CWE-NNN"*, *"The affected range should be `< X.Y.Z`"*,
 *"Credits are missing a remediation-developer entry"*, etc.
 
-Read each matching thread **once** with `mcp__claude_ai_Gmail__get_thread`
-to extract the comment bodies verbatim.
+Read each matching thread **once** with
+`mcp__claude_ai_Gmail__get_thread(threadId, messageFormat='FULL_CONTENT')`
+to extract the comment bodies verbatim. This is one of the few
+sync-skill paths that genuinely needs `FULL_CONTENT` — the
+reviewer's body text IS the actionable signal. Per the
+[get-thread default rule](../../../tools/gmail/operations.md#get-thread),
+every other `get_thread` call in this skill defaults to
+`MINIMAL` (state probes, anchor-point lookups, draft-presence
+checks) and only escalates when body parsing is required.
 
 **Fallback when no CVE-review emails are found.** Absence of signal is
 the common case — most CVEs go through REVIEW and PUBLISHED with no
diff --git a/tools/gmail/operations.md b/tools/gmail/operations.md
index 5f188cf..8ba394e 100644
--- a/tools/gmail/operations.md
+++ b/tools/gmail/operations.md
@@ -86,20 +86,58 @@ the skills use, see [`search-queries.md`](search-queries.md).
 ```text
 mcp__claude_ai_Gmail__get_thread(
   threadId='<threadId>',
-  messageFormat='FULL_CONTENT',   # or 'METADATA' when bodies are not needed
+  messageFormat='MINIMAL',        # default — see escalation rule below
 )
 ```
 
-Returns the full message history of a thread. Body reads are
-expensive — most skills filter candidates down on metadata first and
-only fetch bodies for the narrow set that actually warrants it
-(`security-issue-import` does this explicitly at Step 3).
+**Default to `MINIMAL`.** The `MINIMAL` format returns message
+snippets, key headers (`Subject`, `From`, `To`, `Cc`, `Date`), and
+message IDs — enough to:
+
+- pick the chronologically-last message for `replyToMessageId`
+  attachment;
+- detect SENT-by-us vs reporter-replied state on a thread;
+- list draft IDs on the thread (`labelIds` carries `DRAFT`);
+- check whether the thread exists / has any messages at all;
+- read `Subject` for fallback threading when the message ID is lost.
+
+This covers the vast majority of `get_thread` call sites the
+skills make. `FULL_CONTENT` returns the entire conversation
+including HTML body parts — typically 5-20× the byte size of
+`MINIMAL` for a non-trivial thread (a long reporter conversation
+can exceed the in-context token limit and spill to disk).
+
+**Escalate to `FULL_CONTENT` only when the call site actually
+processes the message body.** Concrete cases that need
+`FULL_CONTENT`:
+
+- `security-issue-import` Step 3 — classifies a thread into
+  `Report` / `ASF-security relay` / `automated-scanner` / etc.
+  by scanning the body for forwarding preambles, credit lines,
+  scanner-product tokens.
+- `security-issue-sync` Step 1e — extracts CVE-reviewer asks
+  from review-comment emails on `<security-list>`.
+- Any draft-composition step that quotes the reporter's prior
+  message back to them (e.g. when the operator wants to
+  reference a specific paragraph the reporter wrote).
+- Reading an inbound report's body to extract a credit form
+  the reporter explicitly provided (*"please credit me as X"*).
+
+For everything else — thread-state probes, anchor-point
+lookups, draft-already-exists checks — default to `MINIMAL`
+and avoid the body fetch.
+
+**Cost note.** A 12-tracker bulk sync that calls `get_thread`
+once per tracker for state-anchoring lands around
+~5K tokens of Gmail context on `MINIMAL`; the same call on
+`FULL_CONTENT` typically lands 60-100K tokens. The savings
+compound on every bulk run.
 
 **Privacy-LLM contract — apply to every body read.** Every
-`get_thread` call against a `<security-list>` thread (or any
-`<private-list>` thread, where the approved-LLM gate also
-applies) MUST be followed by the redact-after-fetch protocol
-documented in
+`get_thread(messageFormat='FULL_CONTENT')` call against a
+`<security-list>` thread (or any `<private-list>` thread, where
+the approved-LLM gate also applies) MUST be followed by the
+redact-after-fetch protocol documented in
 [`../privacy-llm/wiring.md`](../privacy-llm/wiring.md#redact-after-fetch-protocol)
 before the body is used for any further processing. The window
 between `get_thread` returning and `pii-redact` running should
@@ -107,7 +145,7 @@ be a single tool invocation wide; the redacted body is what
 flows through the rest of the skill. Skills that consume bodies
 without running the protocol are framework bugs.
 
-Skip the protocol on `messageFormat='METADATA'` calls — the
+Skip the protocol on `messageFormat='MINIMAL'` calls — the
 returned envelope carries the reporter's `From:` header (which
 is not redacted under the contract) and routing fields, no
 free-form body content. The protocol applies once an actual
