This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git
The following commit(s) were added to refs/heads/main by this push:
new 1a30c916 feat(gmail): record root Message-ID in the mailing-thread tracker field (#519)
1a30c916 is described below
commit 1a30c916a8612465719fe57c80533717ea0d499c
Author: Jarek Potiuk <[email protected]>
AuthorDate: Mon Jun 15 05:04:14 2026 +0200
feat(gmail): record root Message-ID in the mailing-thread tracker field (#519)
The *Security mailing list thread* field recorded only the Gmail
`threadId` — an identifier that resolves only inside the one mailbox
that holds the thread. Add the inbound report's root RFC-5322
`Message-ID` alongside it: the archive-independent handle the
reporter's MUA stamped and the value PonyMail hashes its permalinks
on, so the message stays locatable from any account / archive.
The claude.ai Gmail MCP `get_thread` does not expose the `Message-ID:`
header (only Subject/From/To/Cc/Date and Gmail's opaque per-message
IDs), so add an `oauth-draft-message-id` console script that fetches
`threads.get?format=metadata&metadataHeaders=Message-ID` via the
existing OAuth flow. On the PonyMail backend the header is already in
the archive result, so no new fetch is needed there.
- tools/gmail/oauth-draft: new `oauth-draft-message-id` script + tests
- tools/gmail/operations.md: document the per-backend resolution path
- security-issue-import: record root Message-ID in the field + body template
- tools/github/issue-template.md: note Message-ID in the security-thread role
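
The TSV output of `oauth-draft-message-id` (one `<threadId>\t<message-id>` line per thread, with an empty value when no header resolves) still has to become the `Root Message-ID:` line of the body template. A minimal sketch of that step follows. The helper names `parse_message_id_tsv` and `root_message_id_line` and the `_not resolved_` placeholder are assumptions made for illustration and are not part of this commit:

```python
from __future__ import annotations


def parse_message_id_tsv(output: str) -> dict[str, str | None]:
    """Map threadId -> Message-ID from the script's TSV output.

    An empty value column (thread without a resolvable header) becomes None.
    Hypothetical helper, not shipped with oauth-draft.
    """
    resolved: dict[str, str | None] = {}
    for line in output.splitlines():
        if not line:
            continue
        thread_id, _, message_id = line.partition("\t")
        resolved[thread_id] = message_id.strip() or None
    return resolved


def root_message_id_line(message_id: str | None) -> str:
    """Render the tracker-field line, backtick-wrapping the identifier.

    The `_not resolved_` placeholder is an assumption for this sketch.
    """
    if message_id is None:
        return "Root Message-ID: _not resolved_"
    return f"Root Message-ID: `{message_id}`"


if __name__ == "__main__":
    sample = "t1\t<a@x>\nt2\t\n"
    for tid, mid in parse_message_id_tsv(sample).items():
        print(tid, root_message_id_line(mid))
```

The backticks matter because a bare `<...@...>` renders as an HTML tag on GitHub, so the identifier would vanish from the rendered issue.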
---
skills/security-issue-import/SKILL.md | 3 +-
tools/github/issue-template.md | 2 +-
tools/gmail/oauth-draft/README.md | 10 +-
tools/gmail/oauth-draft/pyproject.toml | 10 +-
.../oauth-draft/src/oauth_draft/message_id.py | 133 ++++++++++++++++
tools/gmail/oauth-draft/tests/test_message_id.py | 170 +++++++++++++++++++++
tools/gmail/operations.md | 44 ++++++
7 files changed, 364 insertions(+), 8 deletions(-)
diff --git a/skills/security-issue-import/SKILL.md b/skills/security-issue-import/SKILL.md
index 8a329329..c5b19927 100644
--- a/skills/security-issue-import/SKILL.md
+++ b/skills/security-issue-import/SKILL.md
@@ -1169,7 +1169,7 @@ here.
| **The issue description** | The root email body, **verbatim** (preserve paragraphs, PoC code blocks, and any quoted sections). The body is private — the triager will copy it into a public CVE description only after Step 13. |
| **Short public summary for publish** | Leave `_No response_`. Filled by the release manager at Step 13 in sanitised form. |
| **Affected versions** | Extract `<product> <version>` / `>= X, < Y` / `<Y` phrases from the body (substitute the adopter's product name — e.g. `Airflow <version>` for the airflow-s adopter). If the reporter gave only a single version they tested on (e.g. `3.1.5`), record that verbatim; the triager can widen the range later. Leave `_No response_` if no version is mentioned. |
-| **Security mailing list thread** | **Keep the private thread handle, and —
if possible — also link the PonyMail archive entry.** The full URL-construction
recipe (search URL template, month-token format, user-pastes-back flow,
Gmail-threadId fallback) lives in
[`tools/gmail/ponymail-archive.md`](../../tools/gmail/ponymail-archive.md#use-case--security-issue-import);
the adopting project's private-search URL template is declared in
[`<project-config>/project.md`](../../<project-config>/ [...]
+| **Security mailing list thread** | **Keep the private thread handle, and —
if possible — also link the PonyMail archive entry.** The full URL-construction
recipe (search URL template, month-token format, user-pastes-back flow,
Gmail-threadId fallback) lives in
[`tools/gmail/ponymail-archive.md`](../../tools/gmail/ponymail-archive.md#use-case--security-issue-import);
the adopting project's private-search URL template is declared in
[`<project-config>/project.md`](../../<project-config>/ [...]
| **Public advisory URL** | `_No response_`. Populated at Step 14 by `security-issue-sync` once the advisory is archived. |
| **Reporter credited as** | The reporter's full display name from the email
`From:` header (e.g. `Alice Example` from `"Alice Example"
<[email protected]>`). This is a **placeholder** — in direct-reporter mode, the
receipt-of-confirmation reply in Step 7 asks the reporter to confirm their
preferred credit form. **Apply the [bot/AI credit
policy](../../tools/cve-tool-vulnogram/bot-credits-policy.md) before
populating** — if the `From:`-header name or address matches the bot detection
rul [...]
| **PR with the fix** | `_No response_`. |
@@ -1620,6 +1620,7 @@ For each confirmed `Report` or forwarder-relayed candidate:
### Security mailing list thread
No public archive URL — tracked privately on Gmail thread `<threadId>`.
+ Root Message-ID: `<root-message-id>`
### Public advisory URL
diff --git a/tools/github/issue-template.md b/tools/github/issue-template.md
index 6961d5ec..43d35442 100644
--- a/tools/github/issue-template.md
+++ b/tools/github/issue-template.md
@@ -56,7 +56,7 @@ The generic lifecycle refers to fields by these roles:
| `issue-description` | dedupe | import | The verbatim inbound report; private to the security team. |
| `public-summary` | CVE JSON generator | release manager (Step 13) | Sanitised one-paragraph public summary for the advisory. |
| `affected-versions` | CVE JSON generator, sync | sync proposes, user confirms | The `>= X, < Y` range that populates CVE 5.x `affected[]`. |
-| `security-thread` | dedupe, sync (reporter-notification lookup) | import | Private pointer to the inbound mail thread. **Never** exported to the public CVE record. |
+| `security-thread` | dedupe, sync (reporter-notification lookup) | import | Private pointer to the inbound mail thread — the Gmail `threadId`, any PonyMail archive URL, and the root `Message-ID` (archive-independent message handle; backtick-wrapped). **Never** exported to the public CVE record. |
| `public-advisory-url` | CVE JSON generator, sync (gates close) | sync (Step 14) | Public archive URL; tagged `vendor-advisory` in `references[]`. |
| `reporter-credit` | CVE JSON generator | import (placeholder), sync (after reporter confirms) | Credit line as the reporter wants to appear in the public advisory. |
| `pr-with-fix` | sync, CVE JSON generator | fix, sync | URL of the merged `<upstream>` PR. |
diff --git a/tools/gmail/oauth-draft/README.md b/tools/gmail/oauth-draft/README.md
index 14e73a2f..612a8424 100644
--- a/tools/gmail/oauth-draft/README.md
+++ b/tools/gmail/oauth-draft/README.md
@@ -33,13 +33,14 @@
# oauth-draft
Small Python project that talks directly to the Gmail REST API on a
-user-provided OAuth refresh token. Three console scripts:
+user-provided OAuth refresh token. Four console scripts:
| Console script | Purpose |
|---|---|
| `oauth-draft-setup` | One-time interactive OAuth consent flow that writes the credentials JSON. |
| `oauth-draft-create` | Create a Gmail draft with `threadId` attachment. (As of the `replyToMessageId` parameter on the claude.ai Gmail MCP `create_draft`, the MCP can also produce thread-attached drafts — see [`../draft-backends.md`](../draft-backends.md). This script remains useful when you have a `threadId` on hand and would rather skip the extra `get_thread` round-trip the MCP path requires, and is the only path that lets the skills delete drafts via the Gmail API afterwards.) |
| `oauth-draft-mark-read` | Bulk-modify Gmail threads matching a search query (default: mark as read by removing the `UNREAD` label). No MCP equivalent today. |
+| `oauth-draft-message-id` | Resolve the root RFC-5322 `Message-ID` header of one or more threads (`threads.get?format=metadata`). No MCP equivalent — the claude.ai Gmail MCP `get_thread` surfaces only Gmail's opaque per-message IDs, never the `Message-ID:` header. `security-issue-import` records the result in the *Security mailing list thread* tracker field. |
The **strongly preferred** drafting backend is this `oauth_curl` tool:
the claude.ai Gmail MCP `create_draft` silently rewrites embedded URLs
@@ -85,10 +86,15 @@ uv run --project <framework>/tools/gmail/oauth-draft oauth-draft-mark-read \
# Add --execute after reviewing the dry-run output
uv run --project <framework>/tools/gmail/oauth-draft oauth-draft-mark-read \
--query 'label:apache-security in:spam is:unread' --execute
+
+# Resolve the root Message-ID of one or more threads (TSV, or --json)
+uv run --project <framework>/tools/gmail/oauth-draft oauth-draft-message-id \
+ 19e9d09a31ff6bdd 19dda947a5d6ca88
```
Per-flag help: `oauth-draft-create --help`,
-`oauth-draft-mark-read --help`, `oauth-draft-setup --help`.
+`oauth-draft-mark-read --help`, `oauth-draft-message-id --help`,
+`oauth-draft-setup --help`.
## Setup — one-time
diff --git a/tools/gmail/oauth-draft/pyproject.toml b/tools/gmail/oauth-draft/pyproject.toml
index 9a355ac5..556084d3 100644
--- a/tools/gmail/oauth-draft/pyproject.toml
+++ b/tools/gmail/oauth-draft/pyproject.toml
@@ -26,10 +26,11 @@ readme = "README.md"
requires-python = ">=3.11"
license = { text = "Apache-2.0" }
# google-auth-oauthlib is needed only by `oauth-draft-setup` (the one-time
-# interactive consent flow). The `oauth-draft-create` and
-# `oauth-draft-mark-read` commands are stdlib-only. We list the dep at
-# the project level rather than as an extra so the three console scripts
-# can be invoked uniformly via `uv run --project ... oauth-draft-*`.
+# interactive consent flow). The `oauth-draft-create`,
+# `oauth-draft-mark-read`, and `oauth-draft-message-id` commands are
+# stdlib-only. We list the dep at the project level rather than as an
+# extra so the console scripts can be invoked uniformly via
+# `uv run --project ... oauth-draft-*`.
dependencies = [
"google-auth-oauthlib>=1.4.0",
]
@@ -38,6 +39,7 @@ dependencies = [
oauth-draft-setup = "oauth_draft.setup_creds:main"
oauth-draft-create = "oauth_draft.create_draft:main"
oauth-draft-mark-read = "oauth_draft.mark_threads_read:main"
+oauth-draft-message-id = "oauth_draft.message_id:main"
[tool.hatch.build.targets.wheel]
packages = ["src/oauth_draft"]
diff --git a/tools/gmail/oauth-draft/src/oauth_draft/message_id.py b/tools/gmail/oauth-draft/src/oauth_draft/message_id.py
new file mode 100644
index 00000000..2eae24cb
--- /dev/null
+++ b/tools/gmail/oauth-draft/src/oauth_draft/message_id.py
@@ -0,0 +1,133 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: Apache-2.0
+# https://www.apache.org/licenses/LICENSE-2.0
+"""Resolve the root RFC-5322 ``Message-ID`` header of Gmail thread(s).
+
+Companion to ``create_draft`` / ``mark_threads_read``: same credentials
+file, same OAuth refresh-token flow, same broad ``https://mail.google.com/``
+scope (read is covered).
+
+**Why this command exists.** The claude.ai Gmail MCP ``get_thread`` tool
+only surfaces the *key* headers (``Subject`` / ``From`` / ``To`` / ``Cc`` /
+``Date``) plus Gmail's opaque per-message IDs — it does **not** expose the
+RFC-5322 ``Message-ID:`` header. The skills want the real ``Message-ID``
+because, unlike a Gmail ``threadId`` (which only resolves inside the one
+mailbox that holds the thread), the ``Message-ID`` is archive-independent:
+it is the stable identifier the reporter's MUA stamped, and it is what the
+ASF PonyMail archive hashes its permalinks on. ``security-issue-import``
+records it in the *Security mailing list thread* tracker field so the
+inbound message stays locatable even from an account that never received
+the Gmail copy.
+
+This command fetches ``threads.get?format=metadata&metadataHeaders=Message-ID``
+and prints the ``Message-ID`` of the thread's **root** (chronologically
+first) message — the inbound report.
+
+Usage::
+
+    uv run --project tools/gmail/oauth-draft \
+        oauth-draft-message-id <threadId> [<threadId> ...]
+
+Output is one TSV line per thread: ``<threadId>\\t<message-id>``. A thread
+with no resolvable root header prints ``<threadId>\\t`` (empty) and the
+command still exits 0 — a missing header is a data fact, not a tool error.
+With ``--json`` the same mapping is emitted as a JSON object instead.
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+import urllib.error
+import urllib.parse
+import urllib.request
+
+from oauth_draft.credentials import (
+    GMAIL_API,
+    Credentials,
+    locate_credentials,
+    refresh_access_token,
+)
+
+
+def root_message_id(access_token: str, thread_id: str) -> str | None:
+    """Return the ``Message-ID`` header of the thread's root message.
+
+    Returns ``None`` when the thread has no messages or the root message
+    carries no ``Message-ID`` header (both are legitimate states, not
+    errors — e.g. a draft-only thread).
+    """
+    params = {
+        "format": "metadata",
+        "metadataHeaders": "Message-ID",
+    }
+    # urlencode with doseq so repeated metadataHeaders survive if extended.
+    url = f"{GMAIL_API}/threads/{thread_id}?" + urllib.parse.urlencode(params, doseq=True)
+    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})
+    try:
+        with urllib.request.urlopen(req, timeout=30) as r:
+            data = json.loads(r.read())
+    except urllib.error.HTTPError as e:
+        raise SystemExit(
+            f"Gmail threads.get failed ({e.code}) for {thread_id}: {e.read().decode(errors='replace')}"
+        ) from e
+    messages = data.get("messages") or []
+    if not messages:
+        return None
+    headers = messages[0].get("payload", {}).get("headers", [])
+    for h in headers:
+        if h.get("name", "").lower() == "message-id":
+            return h.get("value")
+    return None
+
+
+def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
+    p = argparse.ArgumentParser(
+        prog="oauth-draft-message-id",
+        description=(
+            "Resolve the root RFC-5322 Message-ID header of Gmail thread(s) via the OAuth refresh-token flow."
+        ),
+    )
+    p.add_argument(
+        "thread_ids",
+        nargs="+",
+        metavar="THREAD_ID",
+        help="One or more Gmail threadId values to resolve.",
+    )
+    p.add_argument(
+        "--json",
+        action="store_true",
+        help="Emit a {threadId: message-id} JSON object instead of TSV lines.",
+    )
+    p.add_argument(
+        "--credentials",
+        default=None,
+        help=(
+            "Override the credentials file path. "
+            "Default: $GMAIL_OAUTH_CREDENTIALS or the packaged default path."
+        ),
+    )
+    return p.parse_args(argv)
+
+
+def main(argv: list[str] | None = None) -> int:
+    args = parse_args(argv)
+    creds_path = locate_credentials(args.credentials)
+    creds = Credentials.load(creds_path, require_from_address=False)
+    access_token = refresh_access_token(creds)
+
+    resolved: dict[str, str | None] = {}
+    for tid in args.thread_ids:
+        resolved[tid] = root_message_id(access_token, tid)
+
+    if args.json:
+        print(json.dumps(resolved, indent=2))
+    else:
+        for tid, mid in resolved.items():
+            sys.stdout.write(f"{tid}\t{mid or ''}\n")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/tools/gmail/oauth-draft/tests/test_message_id.py b/tools/gmail/oauth-draft/tests/test_message_id.py
new file mode 100644
index 00000000..c9ea5f68
--- /dev/null
+++ b/tools/gmail/oauth-draft/tests/test_message_id.py
@@ -0,0 +1,170 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+import io
+import json
+import urllib.error
+from unittest.mock import patch
+
+import pytest
+
+from oauth_draft.message_id import main, parse_args, root_message_id
+
+
+def test_parse_args_collects_thread_ids():
+    args = parse_args(["t1", "t2", "t3"])
+    assert args.thread_ids == ["t1", "t2", "t3"]
+    assert args.json is False
+
+
+def test_parse_args_requires_at_least_one_thread():
+    with pytest.raises(SystemExit):
+        parse_args([])
+
+
+# --- network-mocked helpers ------------------------------------------------
+
+
+class _FakeResponse:
+    def __init__(self, payload: bytes):
+        self._payload = payload
+
+    def __enter__(self):
+        return self
+
+    def __exit__(self, *exc):
+        return False
+
+    def read(self):
+        return self._payload
+
+
+def _thread_payload(message_id: str | None, *, messages: bool = True) -> bytes:
+    if not messages:
+        return json.dumps({"messages": []}).encode()
+    headers = [{"name": "Subject", "value": "report"}]
+    if message_id is not None:
+        # Mixed-case header name on purpose — lookup is case-insensitive.
+        headers.append({"name": "Message-Id", "value": message_id})
+    return json.dumps({"messages": [{"payload": {"headers": headers}}]}).encode()
+
+
+def test_root_message_id_extracts_header_case_insensitively():
+    payload = _thread_payload("<[email protected]>")
+    with patch(
+        "oauth_draft.message_id.urllib.request.urlopen",
+        return_value=_FakeResponse(payload),
+    ) as mock_open:
+        mid = root_message_id("token", "tid-1")
+    assert mid == "<[email protected]>"
+    request = mock_open.call_args.args[0]
+    assert "/threads/tid-1" in request.full_url
+    assert "format=metadata" in request.full_url
+    assert "metadataHeaders=Message-ID" in request.full_url
+
+
+def test_root_message_id_uses_first_message_as_root():
+    payload = json.dumps(
+        {
+            "messages": [
+                {"payload": {"headers": [{"name": "Message-ID", "value": "<root@x>"}]}},
+                {"payload": {"headers": [{"name": "Message-ID", "value": "<reply@x>"}]}},
+            ]
+        }
+    ).encode()
+    with patch(
+        "oauth_draft.message_id.urllib.request.urlopen",
+        return_value=_FakeResponse(payload),
+    ):
+        assert root_message_id("token", "tid") == "<root@x>"
+
+
+def test_root_message_id_returns_none_when_no_messages():
+    with patch(
+        "oauth_draft.message_id.urllib.request.urlopen",
+        return_value=_FakeResponse(_thread_payload(None, messages=False)),
+    ):
+        assert root_message_id("token", "tid") is None
+
+
+def test_root_message_id_returns_none_when_header_absent():
+    with patch(
+        "oauth_draft.message_id.urllib.request.urlopen",
+        return_value=_FakeResponse(_thread_payload(None)),
+    ):
+        assert root_message_id("token", "tid") is None
+
+
+def test_root_message_id_raises_on_http_error():
+    err = urllib.error.HTTPError(
+        url="https://x",
+        code=404,
+        msg="Not Found",
+        hdrs=None,  # type: ignore[arg-type]
+        fp=io.BytesIO(b'{"error": "missing"}'),
+    )
+    with patch(
+        "oauth_draft.message_id.urllib.request.urlopen",
+        side_effect=err,
+    ):
+        with pytest.raises(SystemExit) as excinfo:
+            root_message_id("token", "tid-x")
+    assert "threads.get failed (404) for tid-x" in str(excinfo.value)
+
+
+# --- main ------------------------------------------------------------------
+
+
+def _make_creds_file(tmp_path):
+    p = tmp_path / "creds.json"
+    p.write_text(
+        json.dumps(
+            {
+                "client_id": "cid",
+                "client_secret": "secret",
+                "refresh_token": "refresh",
+            }
+        )
+    )
+    return p
+
+
+def test_main_tsv_output(tmp_path, capsys):
+    creds = _make_creds_file(tmp_path)
+    with (
+        patch("oauth_draft.message_id.refresh_access_token", return_value="tok"),
+        patch(
+            "oauth_draft.message_id.root_message_id",
+            side_effect=["<a@x>", None],
+        ),
+    ):
+        rc = main(["--credentials", str(creds), "t1", "t2"])
+    assert rc == 0
+    out = capsys.readouterr().out.splitlines()
+    assert out == ["t1\t<a@x>", "t2\t"]
+
+
+def test_main_json_output(tmp_path, capsys):
+    creds = _make_creds_file(tmp_path)
+    with (
+        patch("oauth_draft.message_id.refresh_access_token", return_value="tok"),
+        patch("oauth_draft.message_id.root_message_id", return_value="<a@x>"),
+    ):
+        rc = main(["--credentials", str(creds), "--json", "t1"])
+    assert rc == 0
+    assert json.loads(capsys.readouterr().out) == {"t1": "<a@x>"}
diff --git a/tools/gmail/operations.md b/tools/gmail/operations.md
index 27a278aa..5942990b 100644
--- a/tools/gmail/operations.md
+++ b/tools/gmail/operations.md
@@ -7,6 +7,7 @@
- [Read](#read)
- [Search threads](#search-threads)
- [Get thread](#get-thread)
+ - [Get the root `Message-ID` of a thread](#get-the-root-message-id-of-a-thread)
- [Write — drafts only, never send](#write--drafts-only-never-send)
- [Drafting backends](#drafting-backends)
- [Create draft — `claude_ai_mcp` backend](#create-draft--claude_ai_mcp-backend)
@@ -151,6 +152,49 @@ is not redacted under the contract) and routing fields, no
free-form body content. The protocol applies once an actual
body is fetched.
+### Get the root `Message-ID` of a thread
+
+> [!IMPORTANT]
+> The claude.ai Gmail MCP does **not** expose the RFC-5322
+> `Message-ID:` header. The "message IDs" the `get_thread` envelope
+> returns are Gmail's *opaque per-message IDs* (the value passed to
+> `replyToMessageId`), which only resolve inside the one mailbox that
+> holds the thread. The `Message-ID:` header — the archive-independent
+> identifier the reporter's MUA stamped, and the value the ASF
+> PonyMail archive hashes its permalinks on — is reachable only via
+> the Gmail REST API or the PonyMail archive.
+
+`security-issue-import` records the inbound report's root `Message-ID`
+in the *Security mailing list thread* tracker field (alongside the
+Gmail `threadId` and any PonyMail URL) so the message stays locatable
+even from an account that never received the Gmail copy. Resolve it by
+backend:
+
+- **PonyMail backend (ASF default primary read path).** The archive
+  is keyed on `Message-ID`; `mcp__ponymail__get_email` and
+  `mcp__ponymail__search_list` results carry it directly — no extra
+  fetch needed.
+- **Gmail backend (claude.ai MCP).** Use the `oauth-draft-message-id`
+  console script, which reuses the same OAuth credentials as
+  `oauth-draft-create` and queries
+  `threads.get?format=metadata&metadataHeaders=Message-ID`:
+
+  ```bash
+  uv run --project tools/gmail/oauth-draft \
+    oauth-draft-message-id <threadId> [<threadId> ...]
+  # → one TSV line per thread: <threadId>\t<message-id>
+  # → or --json for a {threadId: message-id} object
+  ```
+
+  See [`oauth-draft/README.md`](oauth-draft/README.md) for setup. The
+  script prints the `Message-ID` of the thread's **root**
+  (chronologically first) message — the inbound report. A thread with
+  no resolvable header prints an empty value and still exits 0.
+
+When recording the value in a tracker field, **backtick-wrap it** —
+a bare `<...@...>` renders as an HTML tag on GitHub and the
+identifier vanishes from the rendered issue.
+
## Write — drafts only, never send
### Drafting backends