This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git
The following commit(s) were added to refs/heads/main by this push:
new 8ba93cc docs(gmail): default get_thread to MINIMAL; escalate to
FULL_CONTENT only on body-parse (#409)
8ba93cc is described below
commit 8ba93ccfe3102b38795d7d837186961bb27f9356
Author: Jarek Potiuk <[email protected]>
AuthorDate: Sun May 31 12:15:16 2026 +0200
docs(gmail): default get_thread to MINIMAL; escalate to FULL_CONTENT only
on body-parse (#409)
Profiling a 12-tracker bulk sync surfaced that Gmail get_thread calls
were responsible for ~170 KB of in-context spillover (top single
spillover file was 440 KB) — a FULL_CONTENT call on a long reporter
conversation routinely lands 60-100K tokens. Most call sites only
needed the chronologically-last message ID for anchoring, the SENT/
DRAFT label state, or the thread-exists check — all of which MINIMAL
satisfies at ~5K tokens.
Changes:
1. tools/gmail/operations.md — flip the default example from
FULL_CONTENT to MINIMAL. Document a clear escalation rule:
default to MINIMAL; escalate to FULL_CONTENT only when the call
site actually processes the message body (Step 3 classification in
security-issue-import, Step 1e reviewer-comment parsing in
security-issue-sync, draft-composition that quotes the reporter's
prior message, credit-form extraction). Also fix a longstanding
terminology error: the doc said 'METADATA' but the MCP enum is
'MINIMAL'.
2. security-issue-sync Step 1e — make the FULL_CONTENT requirement
explicit (reviewer comment bodies ARE the actionable signal).
Reinforce the contrast with the rest of the skill, which defaults
to MINIMAL.
The privacy-LLM contract block was updated accordingly — the
redact-after-fetch protocol applies to FULL_CONTENT calls; MINIMAL
calls are exempt (they carry headers only, no free-form body
content).
Estimated savings: ~80% of Gmail-related tokens on typical bulk runs
(60-100K → ~5K per call). Compounds on every bulk-sync invocation.
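
A minimal sketch of the escalation rule described above, for illustration
only. The `Purpose` enum and the `message_format` and `needs_redaction`
helpers are hypothetical names that do not exist in the repository; only
the `MINIMAL` and `FULL_CONTENT` values and the list of body-parsing call
sites come from this commit.

```python
from enum import Enum, auto


class Purpose(Enum):
    """Hypothetical labels for why a call site reads a Gmail thread."""

    STATE_PROBE = auto()        # SENT / DRAFT label state, thread-exists check
    ANCHOR_LOOKUP = auto()      # chronologically-last message ID
    DRAFT_CHECK = auto()        # draft-already-exists check
    CLASSIFY_REPORT = auto()    # security-issue-import Step 3
    REVIEWER_COMMENTS = auto()  # security-issue-sync Step 1e
    QUOTE_REPORTER = auto()     # draft composition quoting a prior message
    CREDIT_FORM = auto()        # credit-form extraction


# Call sites that actually process the message body (assumed mapping).
BODY_PURPOSES = frozenset(
    {
        Purpose.CLASSIFY_REPORT,
        Purpose.REVIEWER_COMMENTS,
        Purpose.QUOTE_REPORTER,
        Purpose.CREDIT_FORM,
    }
)


def message_format(purpose: Purpose) -> str:
    """Default to MINIMAL; escalate to FULL_CONTENT only for body parsing."""
    return "FULL_CONTENT" if purpose in BODY_PURPOSES else "MINIMAL"


def needs_redaction(fmt: str) -> bool:
    """Redact-after-fetch applies to FULL_CONTENT calls; MINIMAL is exempt."""
    return fmt == "FULL_CONTENT"
```

Under these assumptions, an anchor lookup resolves to `MINIMAL` and skips
redaction, while Step 1e reviewer-comment parsing resolves to
`FULL_CONTENT` and must be followed by the redact-after-fetch protocol.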
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
---
.claude/skills/security-issue-sync/SKILL.md | 11 +++++-
tools/gmail/operations.md | 58 ++++++++++++++++++++++++-----
2 files changed, 57 insertions(+), 12 deletions(-)
diff --git a/.claude/skills/security-issue-sync/SKILL.md b/.claude/skills/security-issue-sync/SKILL.md
index 9f5b869..fa7d13f 100644
--- a/.claude/skills/security-issue-sync/SKILL.md
+++ b/.claude/skills/security-issue-sync/SKILL.md
@@ -959,8 +959,15 @@ PMC member). The body usually contains explicit proposals — *"Please
update the CWE to CWE-NNN"*, *"The affected range should be `< X.Y.Z`"*,
*"Credits are missing a remediation-developer entry"*, etc.
-Read each matching thread **once** with `mcp__claude_ai_Gmail__get_thread`
-to extract the comment bodies verbatim.
+Read each matching thread **once** with
+`mcp__claude_ai_Gmail__get_thread(threadId, messageFormat='FULL_CONTENT')`
+to extract the comment bodies verbatim. This is one of the few
+sync-skill paths that genuinely needs `FULL_CONTENT` — the
+reviewer's body text IS the actionable signal. Per the
+[get-thread default rule](../../../tools/gmail/operations.md#get-thread),
+every other `get_thread` call in this skill defaults to
+`MINIMAL` (state probes, anchor-point lookups, draft-presence
+checks) and only escalates when body parsing is required.
**Fallback when no CVE-review emails are found.** Absence of signal is
the common case — most CVEs go through REVIEW and PUBLISHED with no
diff --git a/tools/gmail/operations.md b/tools/gmail/operations.md
index 5f188cf..8ba394e 100644
--- a/tools/gmail/operations.md
+++ b/tools/gmail/operations.md
@@ -86,20 +86,58 @@ the skills use, see [`search-queries.md`](search-queries.md).
```text
mcp__claude_ai_Gmail__get_thread(
threadId='<threadId>',
- messageFormat='FULL_CONTENT', # or 'METADATA' when bodies are not needed
+ messageFormat='MINIMAL', # default — see escalation rule below
)
```
-Returns the full message history of a thread. Body reads are
-expensive — most skills filter candidates down on metadata first and
-only fetch bodies for the narrow set that actually warrants it
-(`security-issue-import` does this explicitly at Step 3).
+**Default to `MINIMAL`.** The `MINIMAL` format returns message
+snippets, key headers (`Subject`, `From`, `To`, `Cc`, `Date`), and
+message IDs — enough to:
+
+- pick the chronologically-last message for `replyToMessageId`
+ attachment;
+- detect SENT-by-us vs reporter-replied state on a thread;
+- list draft IDs on the thread (`labelIds` carries `DRAFT`);
+- check whether the thread exists / has any messages at all;
+- read `Subject` for fallback threading when the message ID is lost.
+
+This covers the vast majority of `get_thread` call sites the
+skills make. `FULL_CONTENT` returns the entire conversation
+including HTML body parts — typically 5-20× the byte size of
+`MINIMAL` for a non-trivial thread (a long reporter conversation
+can exceed the in-context token limit and spill to disk).
+
+**Escalate to `FULL_CONTENT` only when the call site actually
+processes the message body.** Concrete cases that need
+`FULL_CONTENT`:
+
+- `security-issue-import` Step 3 — classifies a thread into
+ `Report` / `ASF-security relay` / `automated-scanner` / etc.
+ by scanning the body for forwarding preambles, credit lines,
+ scanner-product tokens.
+- `security-issue-sync` Step 1e — extracts CVE-reviewer asks
+ from review-comment emails on `<security-list>`.
+- Any draft-composition step that quotes the reporter's prior
+ message back to them (e.g. when the operator wants to
+ reference a specific paragraph the reporter wrote).
+- Reading an inbound report's body to extract a credit form
+ the reporter explicitly provided (*"please credit me as X"*).
+
+For everything else — thread-state probes, anchor-point
+lookups, draft-already-exists checks — default to `MINIMAL`
+and avoid the body fetch.
+
+**Cost note.** In a 12-tracker bulk sync that calls `get_thread`
+once per tracker for state-anchoring, each call lands around
+5K tokens of Gmail context on `MINIMAL`; the same call on
+`FULL_CONTENT` typically lands 60-100K tokens. The savings
+compound on every bulk run.
**Privacy-LLM contract — apply to every body read.** Every
-`get_thread` call against a `<security-list>` thread (or any
-`<private-list>` thread, where the approved-LLM gate also
-applies) MUST be followed by the redact-after-fetch protocol
-documented in
+`get_thread(messageFormat='FULL_CONTENT')` call against a
+`<security-list>` thread (or any `<private-list>` thread, where
+the approved-LLM gate also applies) MUST be followed by the
+redact-after-fetch protocol documented in
[`../privacy-llm/wiring.md`](../privacy-llm/wiring.md#redact-after-fetch-protocol)
before the body is used for any further processing. The window
between `get_thread` returning and `pii-redact` running should
@@ -107,7 +145,7 @@ be a single tool invocation wide; the redacted body is what
flows through the rest of the skill. Skills that consume bodies
without running the protocol are framework bugs.
-Skip the protocol on `messageFormat='METADATA'` calls — the
+Skip the protocol on `messageFormat='MINIMAL'` calls — the
returned envelope carries the reporter's `From:` header (which
is not redacted under the contract) and routing fields, no
free-form body content. The protocol applies once an actual