This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git
The following commit(s) were added to refs/heads/main by this push:
new e7c77bb feat(egress-gateway): add egress-allowlist proxy; document in RFC-AI-0003 (#429)
e7c77bb is described below
commit e7c77bb2cd29720e7779af6929ca17215230b5bd
Author: Jarek Potiuk <[email protected]>
AuthorDate: Mon Jun 1 18:28:28 2026 +0200
feat(egress-gateway): add egress-allowlist proxy; document in RFC-AI-0003 (#429)
Adds tools/egress-gateway/ — a local host-allowlisting HTTP(S) forward
proxy (a proxy.py plugin) that constrains where framework tools may send
data. It is the network-layer egress-control chokepoint: tools point
HTTPS_PROXY at it, and any CONNECT/request to a host not on the allowlist
is rejected with 403 before a socket is opened.
Why: RFC-AI-0003's two mechanisms (PII redactor + approved-LLM gate) act
at the application layer — they bound what a skill deliberately sends to
an LLM. Neither stops an unintended outbound flow (a buggy tool, or a
prompt-injection payload coaxing the agent into curling private data out)
— the gap docs/setup/secure-agent-setup.md flags for Bash(curl *) egress.
The gateway closes it at the network layer as defence-in-depth, layered
under the two mechanisms, never a replacement.
Design:
- Default-deny allowlist mirrors sandbox.network.allowedDomains (ASF infra,
GitHub, Google APIs, PyPI), suffix-matched; loopback always allowed;
adopters extend via EGRESS_ALLOW_EXTRA without editing code.
- Host-level only (HTTPS via CONNECT, no TLS interception / payload
inspection) — the right model for egress control without MITM.
- host_allowed() is a pure function with 28 unit tests (IPv6, port/dot
normalisation, suffix-spoof rejection, env-extra parsing). proxy.py
integration is not exercised in CI (needs to bind a port).
- Separate tool (not a privacy-llm sub-tool) because it carries a
third-party runtime dep (proxy.py); the privacy-llm sub-tools are
stdlib-only by contract.
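The suffix rule in the Design list is what rejects spoofed names such as `apache.org.evil.com`. Below is a minimal sketch of the idea. It assumes only the four default suffixes from `allowlist.py`. `normalise` and `matches_suffix` are hypothetical names used for illustration; they are not the tool's API, which exposes `host_allowed`.

```python
# Hypothetical sketch of dot-boundary suffix matching; not the tool's own code.
SUFFIXES: tuple[str, ...] = (".apache.org", ".googleapis.com", ".githubusercontent.com", ".pythonhosted.org")


def normalise(host: str) -> str:
    """Lower-case, drop a ':port' (DNS names / IPv4 only), then drop a trailing dot."""
    norm = host.strip().lower()
    if norm.count(":") == 1:  # host:port; a bare IPv6 literal has more than one colon
        norm = norm.split(":", 1)[0]
    return norm.rstrip(".")


def matches_suffix(host: str, suffixes: tuple[str, ...] = SUFFIXES) -> bool:
    # Every suffix starts with ".", so endswith() can only match at a label boundary:
    # "whimsy.apache.org" ends with ".apache.org", while "myapache.org" does not.
    return normalise(host).endswith(suffixes)


for candidate in ("whimsy.apache.org:443", "myapache.org", "apache.org.evil.com", "WHIMSY.Apache.ORG."):
    print(candidate, matches_suffix(candidate))
```

Every suffix begins with a dot, so the bare domain `apache.org` does not match either. Only its sub-hosts do.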
RFC-AI-0003 updated: abstract note + §4.4 (Mechanism 3, defence-in-depth)
+ §6.4 (implementation) + §10.6 (wiring follow-ups) + references. Tool
registered in docs/labels-and-capabilities.md (capability:setup) and the
uv workspace.
The mechanism is optional and provisional: it ships as a documented,
tested tool but is not yet wired into a setup skill or privacy-llm-check;
§10.6 tracks that follow-up.
Generated-by: Claude Code (Claude Opus 4.8)
---
docs/labels-and-capabilities.md | 1 +
docs/rfcs/RFC-AI-0003.md | 31 +++++
pyproject.toml | 1 +
tools/egress-gateway/.gitignore | 7 +
tools/egress-gateway/README.md | 97 +++++++++++++
tools/egress-gateway/pyproject.toml | 73 ++++++++++
.../egress-gateway/src/egress_gateway/__init__.py | 21 +++
.../egress-gateway/src/egress_gateway/allowlist.py | 154 +++++++++++++++++++++
tools/egress-gateway/src/egress_gateway/cli.py | 74 ++++++++++
tools/egress-gateway/tests/__init__.py | 16 +++
tools/egress-gateway/tests/test_allowlist.py | 98 +++++++++++++
tools/egress-gateway/tool.md | 105 ++++++++++++++
uv.lock | 35 +++++
13 files changed, 713 insertions(+)
diff --git a/docs/labels-and-capabilities.md b/docs/labels-and-capabilities.md
index d630203..7b15042 100644
--- a/docs/labels-and-capabilities.md
+++ b/docs/labels-and-capabilities.md
@@ -181,6 +181,7 @@ Tools under [`tools/`](../tools/). Tools with two values (separated by
| [`tools/cve-tool-vulnogram`](../tools/cve-tool-vulnogram/) | `capability:resolve` | ASF Vulnogram CVE-allocation adapter. Implements the `tools/cve-tool/` contract. Previously named `tools/vulnogram/`. |
| [`tools/dashboard-generator`](../tools/dashboard-generator/) | `capability:stats` | Self-contained HTML dashboard generator |
| [`tools/dev`](../tools/dev/) | `capability:setup` | Framework dev-loop helpers |
+| [`tools/egress-gateway`](../tools/egress-gateway/) | `capability:setup` | Egress-allowlist forward proxy (proxy.py plugin); host-level egress chokepoint — defence-in-depth for RFC-AI-0003 §4.4 |
| [`tools/forwarder-relay`](../tools/forwarder-relay/) | `capability:setup` | Adapter contract for inbound-relay backends (ASF Security relay, huntr.com, HackerOne triagers). Pure interface spec; adapters declare detection + credit-extraction + reporter-addressing rules. |
| [`tools/github`](../tools/github/) | `capability:setup` | GitHub REST / GraphQL substrate (called by every lifecycle phase — pure substrate, no single phase) |
| [`tools/github-body-field`](../tools/github-body-field/) | `capability:setup` | Read or rewrite one `### Field` section of a GitHub issue body without bringing the body into agent context — substrate helper for the security-sync skills |
diff --git a/docs/rfcs/RFC-AI-0003.md b/docs/rfcs/RFC-AI-0003.md
index 76edc85..cc20f58 100644
--- a/docs/rfcs/RFC-AI-0003.md
+++ b/docs/rfcs/RFC-AI-0003.md
@@ -13,11 +13,13 @@
- [4.1 The two mechanisms at a glance](#41-the-two-mechanisms-at-a-glance)
- [4.2 Mechanism 1 — PII redactor](#42-mechanism-1--pii-redactor)
- [4.3 Mechanism 2 — approved-LLM gate](#43-mechanism-2--approved-llm-gate)
+ - [4.4 Mechanism 3 (defence-in-depth) — egress-allowlist gateway](#44-mechanism-3-defence-in-depth--egress-allowlist-gateway)
- [5. Data flow](#5-data-flow)
- [6. Implementation](#6-implementation)
- [6.1 The redactor sub-tool — `tools/privacy-llm/redactor/`](#61-the-redactor-sub-tool--toolsprivacy-llmredactor)
- [6.2 The checker sub-tool — `tools/privacy-llm/checker/` (PR #51)](#62-the-checker-sub-tool--toolsprivacy-llmchecker-pr-51)
- [6.3 What never reaches any LLM](#63-what-never-reaches-any-llm)
+ - [6.4 The egress gateway — `tools/egress-gateway/`](#64-the-egress-gateway--toolsegress-gateway)
- [7. Adopter configuration](#7-adopter-configuration)
- [8. Skill wiring summary](#8-skill-wiring-summary)
- [9. Trust boundaries and status](#9-trust-boundaries-and-status)
@@ -27,6 +29,7 @@
- [10.3 MCP-layer hooks](#103-mcp-layer-hooks)
- [10.4 Mapping-file lifecycle tools](#104-mapping-file-lifecycle-tools)
- [10.5 Doc-cleanup follow-up](#105-doc-cleanup-follow-up)
+ - [10.6 Egress-gateway wiring](#106-egress-gateway-wiring)
- [11. References](#11-references)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
@@ -74,6 +77,8 @@ The reporter's own identity flows through the agent's context as-is, by design
Both mechanisms are now landed: the redactor (PR #48 + PR #50) and the
gate-check (PR #51). The full design is shipped, end-to-end, behind explicit
Step 0 pre-flight calls in every `<security-list>`-touching skill.
+**Complementary network-layer control.** The two mechanisms above operate at
the application layer — they decide what a skill deliberately *sends* to an
LLM. They do not, by themselves, stop private data from leaving over an
arbitrary HTTP call (a buggy tool, or a prompt-injection payload that coaxes
the agent into exfiltration). §4.4 adds an optional **egress-allowlist
gateway** (`tools/egress-gateway/`) as defence-in-depth: a default-deny host
allowlist that funnels all tool egress thr [...]
+
## 2. Background and motivation
ASF security work routinely handles two kinds of private content:
@@ -230,6 +235,23 @@ The check is deliberately conservative: any single unapproved entry stops the sk
**Defence-in-depth:** the gate-check is **also required** for
`<security-list>`-only skills, even though their body classification permits
Claude-Code-default LLMs by construction. Running the check at Step 0 ensures
the adopter's config is in a sane state — no half-configured opt-in entries, no
LLMs in the active stack the adopter forgot to approve — before any private
content flows. The `--reads-private-list` flag controls only the printed
banner; the validation logic is the same either way.
+### 4.4 Mechanism 3 (defence-in-depth) — egress-allowlist gateway
+
+The PII redactor and approved-LLM gate both operate at the application layer:
they constrain what a skill deliberately sends to an LLM. Neither stops an
*unintended* outbound flow — a buggy skill, a mis-wired tool, or a
prompt-injection payload hidden in an inbound report that coaxes the agent into
`curl`-ing private data to an attacker-controlled host.
[`docs/setup/secure-agent-setup.md`](https://github.com/apache/airflow-steward/blob/main/docs/setup/secure-agent-setup.md)
flags exactly [...]
+
+The egress-allowlist gateway closes that gap at the network layer. It is a
local `proxy.py` forward proxy (shipped as
[`tools/egress-gateway/`](../../tools/egress-gateway/)) that enforces a
**default-deny host allowlist** in its `before_upstream_connection` hook: any
CONNECT / request to a host not on the allowlist is rejected with `403` before
a socket is opened. Tools point `HTTPS_PROXY` / `HTTP_PROXY` at it; Python
`urllib`-based tools (ponymail, whimsy, jira, …) honour that with no c [...]
+
+| Property | Value |
+|---|---|
+| Layer | Network egress (host-level), below the application-layer LLM controls |
+| Policy | Default-deny; allowlist mirrors `sandbox.network.allowedDomains` (ASF infra, GitHub, Google APIs, PyPI), suffix-matched; loopback always allowed; adopter extends via `EGRESS_ALLOW_EXTRA` |
+| Granularity | Host only — HTTPS is tunnelled via CONNECT, so no URL-path or payload inspection (no TLS interception) |
+| Relationship | Defence-in-depth. Layered *under* mechanisms 1 + 2, never a replacement: the redactor still strips third-party PII, the gate still bounds which LLM may receive a body, and the gateway additionally bounds which host *any* tool may reach. |
+
+The gateway runs **outside the sandbox** — it must bind a listener and make
unrestricted outbound, which is precisely its job as the chokepoint. Sandboxed
tools reach it over loopback, which requires `localhost` / `127.0.0.1` in
`sandbox.network.allowedDomains` (loopback-only; this does not widen the
internet egress surface — that becomes the gateway's responsibility). The
gateway's allowlist and `sandbox.network.allowedDomains` encode the same egress
policy at two layers and should be k [...]
+
+This mechanism is **optional and provisional**: it ships as a tool with a
documented contract and unit-tested allowlist policy, but it is not yet wired
into a setup skill or the `privacy-llm-check` gate. See §10.6.
+
## 5. Data flow
```text
@@ -343,6 +365,10 @@ The framework treats these surfaces as off-limits to LLM context, even when an "
- The `--field <type>:<value>` arguments themselves. Every value passed there
is exactly what the redactor is replacing.
- Any draft text *before* `pii-reveal` runs, when the destination is a
non-internal surface (e.g. a public PR comment) — the body would still carry
identifiers, which leak no PII, but skills should not emit identifier-laden
drafts to non-internal destinations by accident. The destination check in the
approved-LLM gate is a separate safety net for this.
+### 6.4 The egress gateway — `tools/egress-gateway/`
+
+A `proxy.py`-based forward proxy whose only first-party code is the allowlist
plugin (`egress_gateway.allowlist.EgressAllowlistPlugin`). The host-matching
policy (`host_allowed`) is a pure function, unit-tested in isolation; the
proxy.py integration is intentionally not exercised in CI (it needs to bind a
port). Unlike the stdlib-only `privacy-llm` sub-tools, this one carries a
third-party runtime dependency (`proxy.py`) — which is why it is a separate
tool rather than a `privacy-llm` su [...]
+
## 7. Adopter configuration
Adopters declare their privacy-LLM posture in a single markdown file at
`<project-config>/privacy-llm.md` (template at
[`projects/_template/privacy-llm.md`](https://github.com/apache/airflow-steward/blob/main/projects/_template/privacy-llm.md)).
The file has four sections:
@@ -416,6 +442,10 @@ The framework currently does not ship a cleanup tool for the mapping file. Manua
A small handful of references in
[`docs/setup/privacy-llm.md`](https://github.com/apache/airflow-steward/blob/main/docs/setup/privacy-llm.md)
still describe `privacy-llm-check` as "PR-3" pending. Now that PR #51 has
merged, those should be cleaned up to drop the "(PR-3)" phrasing — minor doc
churn, no contract change. Filed as a follow-up for the next cleanup PR.
+### 10.6 Egress-gateway wiring
+
+The egress-allowlist gateway (§4.4,
[`tools/egress-gateway/`](../../tools/egress-gateway/)) ships as a tool with a
documented contract but is not yet wired into the setup flow. Possible
follow-ups: a `setup-isolated-setup-*` step that launches / health-checks the
gateway and persists `HTTPS_PROXY` into the adopter's per-machine settings;
sourcing the gateway allowlist directly from `sandbox.network.allowedDomains`
so the two cannot drift; and a `privacy-llm-check`-style assertion that th [...]
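The drift concern behind the second follow-up can be sketched as a set comparison. The sketch makes two hypothetical assumptions. First, wildcard entries in `sandbox.network.allowedDomains` are spelled `*.example.org`. Second, loopback entries can be ignored, because the gateway always allows loopback anyway. `split_domains` and `drift` are illustrative names, not framework code.

```python
# Hypothetical drift check between the two egress-policy layers; not part of the tool.
# Assumption: wildcard entries in allowedDomains look like "*.example.org";
# any other entry is an exact host.
from collections.abc import Iterable

LOOPBACK = frozenset({"localhost", "127.0.0.1", "::1"})  # always allowed by the gateway


def split_domains(domains: Iterable[str]) -> tuple[frozenset[str], frozenset[str]]:
    """Return (exact hosts, ".suffix" entries), lower-cased, with loopback removed."""
    exact: set[str] = set()
    suffixes: set[str] = set()
    for domain in (d.strip().lower() for d in domains):
        if domain.startswith("*."):
            suffixes.add(domain[1:])  # "*.apache.org" -> ".apache.org"
        elif domain and domain not in LOOPBACK:
            exact.add(domain)
    return frozenset(exact), frozenset(suffixes)


def drift(
    sandbox_domains: Iterable[str],
    gateway_exact: Iterable[str],
    gateway_suffixes: Iterable[str],
) -> dict[str, set[str]]:
    """Entries present in one layer's policy but not the other's (all empty = in sync)."""
    sb_exact, sb_suffixes = split_domains(sandbox_domains)
    gw_exact = frozenset(h.lower() for h in gateway_exact) - LOOPBACK
    gw_suffixes = frozenset(s.lower() for s in gateway_suffixes)
    return {
        "exact_only_in_sandbox": set(sb_exact - gw_exact),
        "exact_only_in_gateway": set(gw_exact - sb_exact),
        "suffix_only_in_sandbox": set(sb_suffixes - gw_suffixes),
        "suffix_only_in_gateway": set(gw_suffixes - sb_suffixes),
    }


report = drift(
    ["github.com", "*.apache.org", "localhost", "127.0.0.1"],
    gateway_exact={"github.com", "pypi.org"},
    gateway_suffixes=(".apache.org", ".pythonhosted.org"),
)
print(report)
```

The comparison is deliberately strict. An exact sandbox entry such as `whimsy.apache.org` would be reported even though the gateway's `.apache.org` suffix already covers it.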
+
## 11. References
- **Source-of-truth contracts**
@@ -428,6 +458,7 @@ A small handful of references in [`docs/setup/privacy-llm.md`](https://github.c
- **Reference implementation**
- [`tools/privacy-llm/redactor/`](https://github.com/apache/airflow-steward/tree/main/tools/privacy-llm/redactor) — PII redactor (stdlib-only Python)
- [`tools/privacy-llm/checker/`](https://github.com/apache/airflow-steward/tree/main/tools/privacy-llm/checker) — approved-LLM gate-check (stdlib-only Python)
+ - [`tools/egress-gateway/`](../../tools/egress-gateway/) — egress-allowlist forward proxy (proxy.py plugin; defence-in-depth, §4.4)
- **Adopter template**
- [`projects/_template/privacy-llm.md`](https://github.com/apache/airflow-steward/blob/main/projects/_template/privacy-llm.md)
- **Related framework rules**
diff --git a/pyproject.toml b/pyproject.toml
index e4ad7d8..b6c7c88 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -95,6 +95,7 @@ package = false
[tool.uv.workspace]
members = [
"tools/agent-isolation",
+ "tools/egress-gateway",
"tools/cve-tool-vulnogram/generate-cve-json",
"tools/cve-tool-vulnogram/oauth-api",
"tools/github-body-field",
diff --git a/tools/egress-gateway/.gitignore b/tools/egress-gateway/.gitignore
new file mode 100644
index 0000000..8cae406
--- /dev/null
+++ b/tools/egress-gateway/.gitignore
@@ -0,0 +1,7 @@
+__pycache__/
+*.py[cod]
+.venv/
+.pytest_cache/
+.ruff_cache/
+.mypy_cache/
+.coverage
diff --git a/tools/egress-gateway/README.md b/tools/egress-gateway/README.md
new file mode 100644
index 0000000..847d1ca
--- /dev/null
+++ b/tools/egress-gateway/README.md
@@ -0,0 +1,97 @@
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [egress-gateway](#egress-gateway)
+ - [Run it](#run-it)
+ - [Point tools at it](#point-tools-at-it)
+ - [The allowlist](#the-allowlist)
+ - [Test](#test)
+ - [Caveat — host-level, not payload-level](#caveat--host-level-not-payload-level)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/legal/release-policy.html -->
+
+# egress-gateway
+
+**Capability:** capability:setup
+
+A local **host-allowlisting HTTP(S) forward proxy** for the
+apache-steward framework. It is the egress-control chokepoint: framework
+tools point `HTTPS_PROXY`/`HTTP_PROXY` at it, and the gateway rejects any
+connection to a host that is not on its allowlist — before a socket is
+opened. This is defence-in-depth for
+[RFC-AI-0003](../../docs/rfcs/RFC-AI-0003.md): even if a skill or a
+prompt-injection tries to send private data to an arbitrary endpoint, the
+destination is blocked.
+
+The contract (what/why) lives in [`tool.md`](tool.md); this file is the
+how-to.
+
+## Run it
+
+```bash
+# From a context that is NOT sandboxed — binding a listen socket and
+# making unrestricted outbound is exactly this process's job.
+uv run --project tools/egress-gateway egress-gateway # 127.0.0.1:8899
+uv run --project tools/egress-gateway egress-gateway --port 9000
+```
+
+proxy.py keeps runtime state under `$HOME/.proxy`; if HOME is not writable
+in your environment, point it somewhere writable for this process:
+
+```bash
+HOME=/tmp/egress-home uv run --project tools/egress-gateway egress-gateway
+```
+
+## Point tools at it
+
+```bash
+export HTTPS_PROXY=http://127.0.0.1:8899
+export HTTP_PROXY=http://127.0.0.1:8899
+export NO_PROXY=localhost,127.0.0.1
+```
+
+Every framework tool that uses Python `urllib` (ponymail, whimsy, jira, …)
+honours these automatically — no code change. Persist them per-machine in
+`.claude/settings.local.json`'s `env` block (never committed — the gateway
+is a local process).
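To see what "honours these automatically" means in practice, the sketch below feeds the three variables to two standard-library helpers: `urllib.request.getproxies_environment()` and `urllib.request.proxy_bypass_environment()`. To our understanding, urllib's default proxy handling consults these helpers when the variables are set. The port `11434` is only an example of a local inference endpoint, and nothing here touches the network.

```python
# Sketch: what Python's urllib derives from the proxy variables shown above.
import os
import urllib.request
from unittest import mock

GATEWAY_ENV = {
    "HTTPS_PROXY": "http://127.0.0.1:8899",
    "HTTP_PROXY": "http://127.0.0.1:8899",
    "NO_PROXY": "localhost,127.0.0.1",
}

# Replace the environment for this demo only, so stray proxy variables on the
# machine cannot change the result; os.environ is restored afterwards.
with mock.patch.dict(os.environ, GATEWAY_ENV, clear=True):
    proxies = urllib.request.getproxies_environment()

# Keys are the lower-cased scheme names, plus "no" for NO_PROXY.
print(proxies)

for host in ("whimsy.apache.org", "127.0.0.1:11434", "localhost"):
    bypass = urllib.request.proxy_bypass_environment(host, proxies)
    print(host, "direct" if bypass else "via gateway")
# whimsy.apache.org goes via the gateway; the two loopback targets go direct.
```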
+
+**Sandbox interaction:** a *sandboxed* process can only reach the loopback
+proxy if `localhost`/`127.0.0.1` are in `sandbox.network.allowedDomains`
+(see [`docs/setup/sandbox-troubleshooting.md`](../../docs/setup/sandbox-troubleshooting.md)
+→ *cannot bind to a localhost port*). Adding them is loopback-only and does
+not widen the internet egress surface — that is now the gateway's job.
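The two settings described in this section can be combined in a small helper. This is a minimal sketch under two assumptions. First, the proxy variables go into the settings file's top-level `env` block. Second, `sandbox.network.allowedDomains` is a JSON list nested under `sandbox` and `network` in the same settings object, which is an assumption about layout rather than something this README states. `with_gateway` is a hypothetical name; the framework does not ship this helper.

```python
# Hypothetical helper: route a Claude Code settings dict through the local gateway.
# Assumed layout: {"env": {...}, "sandbox": {"network": {"allowedDomains": [...]}}}
import copy
import json

PROXY_ENV = {
    "HTTPS_PROXY": "http://127.0.0.1:8899",
    "HTTP_PROXY": "http://127.0.0.1:8899",
    "NO_PROXY": "localhost,127.0.0.1",
}
LOOPBACK_HOSTS = ("localhost", "127.0.0.1")


def with_gateway(settings: dict) -> dict:
    """Return a copy of *settings* with the proxy env and loopback sandbox entries added."""
    merged = copy.deepcopy(settings)
    merged.setdefault("env", {}).update(PROXY_ENV)
    network = merged.setdefault("sandbox", {}).setdefault("network", {})
    allowed = network.setdefault("allowedDomains", [])
    for host in LOOPBACK_HOSTS:  # loopback only: lets sandboxed tools reach the proxy
        if host not in allowed:
            allowed.append(host)
    return merged


before = {"sandbox": {"network": {"allowedDomains": ["github.com"]}}}
after = with_gateway(before)
print(json.dumps(after, indent=2))
```

Calling the helper a second time produces the same result, so it is safe to rerun.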
+
+## The allowlist
+
+Defaults mirror the sandbox's curated `sandbox.network.allowedDomains`
+(ASF infra, GitHub, Google APIs, PyPI), suffix-matched so every
+`*.apache.org` project site is covered. Extend without editing code:
+
+```bash
+EGRESS_ALLOW_EXTRA="bedrock.example.com,.internal.corp" \
+ uv run --project tools/egress-gateway egress-gateway
+```
+
+Entries starting with `.` are treated as suffixes; everything else is an
+exact host. Loopback (`localhost`, `127.0.0.1`, `::1`) is always allowed.
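The suffix rule has one consequence worth spelling out: a leading-dot entry matches sub-hosts only, not the bare domain itself. The sketch below mirrors the parsing rules above. `split_extra` and `extra_allows` are hypothetical stand-ins, not imports from `egress_gateway`.

```python
# Sketch mirroring the EGRESS_ALLOW_EXTRA rules above (hypothetical names).
def split_extra(raw: str) -> tuple[set[str], tuple[str, ...]]:
    """Split a comma-separated list into (exact hosts, '.suffix' entries)."""
    exact: set[str] = set()
    suffixes: list[str] = []
    for entry in raw.split(","):
        token = entry.strip().lower()
        if token.startswith("."):  # leading dot: a suffix
            if token.strip("."):
                suffixes.append("." + token.strip("."))
        elif token.rstrip("."):  # anything else: an exact host
            exact.add(token.rstrip("."))
    return exact, tuple(suffixes)


def extra_allows(host: str, raw: str) -> bool:
    exact, suffixes = split_extra(raw)
    host = host.strip().lower().rstrip(".")
    # str.endswith(()) is False, so an empty suffix list never matches.
    return host in exact or host.endswith(suffixes)


EXTRA = "bedrock.example.com,.internal.corp"
print(split_extra(EXTRA))  # ({'bedrock.example.com'}, ('.internal.corp',))
print(extra_allows("svc.internal.corp", EXTRA))  # True: a sub-host of the suffix
print(extra_allows("internal.corp", EXTRA))  # False: the bare domain is not a sub-host
print(extra_allows("internal.corp", "internal.corp,.internal.corp"))  # True: list both
```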
+
+## Test
+
+```bash
+uv run --project tools/egress-gateway --group dev pytest
+```
+
+The allowlist policy (`host_allowed`) is a pure function and is unit-tested
+directly; the proxy.py integration is intentionally not exercised in CI
+(it needs to bind a port).
+
+## Caveat — host-level, not payload-level
+
+The gateway tunnels HTTPS via `CONNECT`; it allows or denies by **host**, not by
+URL path or body. There is no TLS interception, so it cannot inspect request
+payloads. That is the right model for egress *control* without MITM. If you
+need per-path or content-level filtering, that is a different (heavier) tool.
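When HTTPS passes through a forward proxy, the client first sends a `CONNECT host:port` request. The TLS handshake, and with it the real request including its path, begins only after the proxy has answered. The hypothetical `connect_target` helper below extracts everything a non-intercepting proxy can read from that request line:

```python
# Hypothetical helper: what a non-intercepting proxy can read from an HTTPS request.
def connect_target(request_line: str) -> tuple[str, int]:
    """Parse 'CONNECT host:port HTTP/1.1' into (host, port)."""
    method, authority, _version = request_line.split()
    if method.upper() != "CONNECT":
        raise ValueError(f"not a CONNECT request: {method}")
    # CONNECT uses authority-form (host:port); IPv6 literals are bracketed.
    host, _, port = authority.rpartition(":")
    return host.strip("[]"), int(port)


# A client fetching https://whimsy.apache.org/some/private/path via the gateway
# sends only this line (plus headers) in clear text; the path travels inside TLS.
print(connect_target("CONNECT whimsy.apache.org:443 HTTP/1.1"))  # ('whimsy.apache.org', 443)
```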
diff --git a/tools/egress-gateway/pyproject.toml b/tools/egress-gateway/pyproject.toml
new file mode 100644
index 0000000..3c3bfef
--- /dev/null
+++ b/tools/egress-gateway/pyproject.toml
@@ -0,0 +1,73 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+
+[project]
+name = "egress-gateway"
+version = "0.1.0"
+description = "Local egress-allowlist proxy for the apache-steward framework — a host-allowlisting HTTP(S) forward proxy that constrains where framework tools may send data (defence-in-depth for RFC-AI-0003)."
+readme = "README.md"
+requires-python = ">=3.11"
+license = { text = "Apache-2.0" }
+# proxy.py supplies the forward-proxy runtime; the allowlist policy is
+# the only first-party code (a proxy.py plugin). Constrained to >=2.4,<3
+# (any 2.x release from 2.4 on), starting with the 2.4 line that ships the
+# HttpProxyBasePlugin.before_upstream_connection hook this plugin relies on.
+dependencies = [
+ "proxy.py>=2.4,<3",
+]
+
+[project.scripts]
+egress-gateway = "egress_gateway.cli:main"
+
+[tool.hatch.build.targets.wheel]
+packages = ["src/egress_gateway"]
+
+[tool.ruff]
+line-length = 110
+target-version = "py311"
+src = ["src", "tests"]
+
+[tool.ruff.lint]
+select = [
+ "E", # pycodestyle errors
+ "W", # pycodestyle warnings
+ "F", # pyflakes
+ "I", # isort
+ "B", # flake8-bugbear
+ "UP", # pyupgrade
+ "SIM", # flake8-simplify
+ "C4", # flake8-comprehensions
+ "RUF", # ruff-specific
+]
+ignore = [
+ "E501", # line-too-long — the 110-char limit above is already generous
+]
+
+[tool.pytest.ini_options]
+minversion = "8.0"
+addopts = "-ra -q"
+testpaths = ["tests"]
+
+[dependency-groups]
+dev = [
+ "pytest>=8.0",
+ "ruff>=0.6",
+ "mypy>=1.11",
+]
diff --git a/tools/egress-gateway/src/egress_gateway/__init__.py b/tools/egress-gateway/src/egress_gateway/__init__.py
new file mode 100644
index 0000000..d49af32
--- /dev/null
+++ b/tools/egress-gateway/src/egress_gateway/__init__.py
@@ -0,0 +1,21 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Local egress-allowlist proxy for the apache-steward framework."""
+
+from egress_gateway.allowlist import host_allowed
+
+__all__ = ["host_allowed"]
diff --git a/tools/egress-gateway/src/egress_gateway/allowlist.py b/tools/egress-gateway/src/egress_gateway/allowlist.py
new file mode 100644
index 0000000..c1dbcc7
--- /dev/null
+++ b/tools/egress-gateway/src/egress_gateway/allowlist.py
@@ -0,0 +1,154 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Egress host-allowlist policy + the proxy.py plugin that enforces it.
+
+The host-matching policy (:func:`host_allowed`) is a pure function with no
+third-party imports, so it is cheap to unit-test in isolation. The
+:class:`EgressAllowlistPlugin` wires that policy into proxy.py's
+``before_upstream_connection`` hook: any CONNECT / request whose target host
+is not on the allowlist is rejected with a 403 before any upstream socket is
+opened. The gateway — not the sandbox — becomes the egress-control point.
+
+The default allowlist mirrors the curated host set the framework's secure
+sandbox already trusts (`sandbox.network.allowedDomains`): ASF infra, GitHub,
+Google APIs, PyPI. Adopters extend it without editing code via the
+``EGRESS_ALLOW_EXTRA`` environment variable (comma-separated hosts; a leading
+dot means "any sub-host of this domain"; the bare domain itself does not match).
+"""
+
+from __future__ import annotations
+
+import os
+
+# Exact hostnames that do not fall under an allowed suffix.
+ALLOW_EXACT: frozenset[str] = frozenset(
+    {
+        "github.com",
+        "api.github.com",
+        "pypi.org",
+        "docs.google.com",
+        "nvd.nist.gov",
+        "cve.org",
+        "www.cve.org",
+        "cveawg.mitre.org",
+        "issues.apache.org",
+    }
+)
+
+# Any host ending in one of these suffixes is allowed.
+ALLOW_SUFFIXES: tuple[str, ...] = (
+    ".apache.org",  # whimsy, lists, projects, issues, every project site
+    ".googleapis.com",  # sheets / gmail / oauth2
+    ".githubusercontent.com",  # raw / objects / codeload
+    ".pythonhosted.org",  # uv / pip wheel downloads
+)
+
+# Loopback is always allowed — local inference endpoints (Ollama/vLLM) and
+# local fixtures never leave the host.
+ALLOW_LOOPBACK: frozenset[str] = frozenset({"localhost", "127.0.0.1", "::1"})
+
+_ENV_EXTRA = "EGRESS_ALLOW_EXTRA"
+
+
+def _parse_extra(raw: str | None) -> tuple[frozenset[str], tuple[str, ...]]:
+    """Split EGRESS_ALLOW_EXTRA into (exact-hosts, suffixes).
+
+    Entries starting with '.' are treated as suffixes; everything else is an
+    exact host. Whitespace and empty entries are ignored.
+    """
+    exact: set[str] = set()
+    suffixes: list[str] = []
+    for entry in (raw or "").split(","):
+        token = entry.strip().lower()
+        if token.startswith("."):
+            host = token.strip(".")
+            if host:
+                suffixes.append("." + host)
+        else:
+            host = token.rstrip(".")
+            if host:
+                exact.add(host)
+    return frozenset(exact), tuple(suffixes)
+
+
+def host_allowed(
+    host: str,
+    *,
+    extra_exact: frozenset[str] | None = None,
+    extra_suffixes: tuple[str, ...] | None = None,
+) -> bool:
+    """Return True if *host* is permitted egress.
+
+    *host* may include a ``:port`` suffix and trailing dot; both are
+    normalised away. Bare and bracketed IPv6 literals (``::1``, ``[::1]:443``)
+    are handled without mangling. Matching is case-insensitive.
+    """
+    norm = host.strip().lower()
+    if norm.startswith("["):  # bracketed IPv6, optionally [::1]:port
+        end = norm.find("]")
+        if end != -1:
+            norm = norm[1:end]
+    elif norm.count(":") == 1:  # host:port (a bare IPv6 has >1 colon)
+        norm = norm.split(":", 1)[0]
+    norm = norm.rstrip(".")  # after port removal, so "host.:443" normalises too
+    if not norm:
+        return False
+    if norm in ALLOW_LOOPBACK or norm in ALLOW_EXACT:
+        return True
+    if extra_exact and norm in extra_exact:
+        return True
+    if norm.endswith(ALLOW_SUFFIXES):
+        return True
+    return bool(extra_suffixes) and norm.endswith(extra_suffixes)
+
+
+# --- proxy.py plugin -------------------------------------------------------
+
+from proxy.common.utils import text_  # noqa: E402  (kept below the pure policy)
+from proxy.http.exception import HttpRequestRejected  # noqa: E402
+from proxy.http.parser import HttpParser  # noqa: E402
+from proxy.http.proxy import HttpProxyBasePlugin  # noqa: E402
+
+
+class EgressAllowlistPlugin(HttpProxyBasePlugin):
+    """Reject any upstream host not on the allowlist (default-deny)."""
+
+    def __init__(self, *args: object, **kwargs: object) -> None:
+        super().__init__(*args, **kwargs)
+        self._extra_exact, self._extra_suffixes = _parse_extra(os.environ.get(_ENV_EXTRA))
+
+    def before_upstream_connection(self, request: HttpParser) -> HttpParser | None:
+        host = text_(request.host) if request.host else ""
+        if not host_allowed(
+            host,
+            extra_exact=self._extra_exact,
+            extra_suffixes=self._extra_suffixes,
+        ):
+            raise HttpRequestRejected(
+                status_code=403,
+                reason=b"Forbidden",
+                body=b"egress-gateway: host not on allowlist\n",
+            )
+        return request
+
+    def handle_client_request(self, request: HttpParser) -> HttpParser | None:
+        return request
+
+    def handle_upstream_chunk(self, chunk: memoryview) -> memoryview:
+        return chunk
+
+    def on_upstream_connection_close(self) -> None:
+        pass
diff --git a/tools/egress-gateway/src/egress_gateway/cli.py b/tools/egress-gateway/src/egress_gateway/cli.py
new file mode 100644
index 0000000..216827e
--- /dev/null
+++ b/tools/egress-gateway/src/egress_gateway/cli.py
@@ -0,0 +1,74 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""``egress-gateway`` console script — start the allowlisting forward proxy.
+
+Launches proxy.py bound to loopback with the
+:class:`egress_gateway.allowlist.EgressAllowlistPlugin` loaded, so every
+upstream host is checked against the allowlist before a socket is opened.
+
+Operational notes (see README.md for the full story):
+
+* **Binding a listen socket is blocked inside the framework's sandbox.** Run
+ the gateway from a context that is *not* sandboxed (it is the egress-control
+ point, so it legitimately needs unrestricted outbound).
+* **proxy.py keeps runtime state under ``$HOME/.proxy``.** If HOME is not
+ writable in your environment, point it at a writable dir for this process
+ (``HOME=/tmp/egress-home egress-gateway``).
+* Extra allowed hosts: ``EGRESS_ALLOW_EXTRA=host1,.suffix2 egress-gateway``.
+"""
+
+from __future__ import annotations
+
+import argparse
+from collections.abc import Sequence
+
+DEFAULT_HOST = "127.0.0.1"
+DEFAULT_PORT = 8899
+_PLUGIN = "egress_gateway.allowlist.EgressAllowlistPlugin"
+
+
+def main(argv: Sequence[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(
+        prog="egress-gateway",
+        description="Allowlisting HTTP(S) forward proxy (egress-control chokepoint).",
+    )
+    parser.add_argument("--host", default=DEFAULT_HOST, help="bind address (default: 127.0.0.1)")
+    parser.add_argument("--port", type=int, default=DEFAULT_PORT, help="bind port (default: 8899)")
+    parser.add_argument("--log-level", default="INFO", help="proxy.py log level (default: INFO)")
+    args, passthrough = parser.parse_known_args(argv)
+
+    # Imported lazily so `--help` works even if proxy.py is somehow absent.
+    import proxy
+
+    proxy.main(
+        [
+            "--hostname",
+            args.host,
+            "--port",
+            str(args.port),
+            "--plugins",
+            _PLUGIN,
+            "--log-level",
+            args.log_level,
+            *passthrough,
+        ]
+    )
+    return 0
+
+
+if __name__ == "__main__":  # pragma: no cover
+    raise SystemExit(main())
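`main()` relies on `argparse.ArgumentParser.parse_known_args`, so any option the wrapper does not define is forwarded to proxy.py unchanged. A stripped-down sketch of just that split follows. `split_args` is a hypothetical name, and `--some-proxy-flag` is a placeholder, not a real proxy.py option.

```python
# Sketch of the argument split in cli.main(): known options are parsed here,
# everything else is returned untouched for forwarding to proxy.py.
import argparse


def split_args(argv: list[str]) -> tuple[argparse.Namespace, list[str]]:
    parser = argparse.ArgumentParser(prog="egress-gateway")
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8899)
    parser.add_argument("--log-level", default="INFO")
    return parser.parse_known_args(argv)


args, passthrough = split_args(["--port", "9000", "--some-proxy-flag=5"])
print(args.host, args.port, args.log_level, passthrough)
# 127.0.0.1 9000 INFO ['--some-proxy-flag=5']
```

The options that the wrapper defines (`--host`, `--port`, `--log-level`) are consumed here and re-emitted in proxy.py's own spelling, with `--hostname` for the bind address.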
diff --git a/tools/egress-gateway/tests/__init__.py b/tools/egress-gateway/tests/__init__.py
new file mode 100644
index 0000000..13a8339
--- /dev/null
+++ b/tools/egress-gateway/tests/__init__.py
@@ -0,0 +1,16 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/tools/egress-gateway/tests/test_allowlist.py b/tools/egress-gateway/tests/test_allowlist.py
new file mode 100644
index 0000000..2568a4b
--- /dev/null
+++ b/tools/egress-gateway/tests/test_allowlist.py
@@ -0,0 +1,98 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Unit tests for the egress allowlist host-matching policy."""
+
+from __future__ import annotations
+
+import pytest
+
+from egress_gateway.allowlist import _parse_extra, host_allowed
+
+
+@pytest.mark.parametrize(
+ "host",
+ [
+ "whimsy.apache.org",
+ "lists.apache.org",
+ "issues.apache.org",
+ "projects.apache.org",
+ "github.com",
+ "api.github.com",
+ "raw.githubusercontent.com",
+ "objects.githubusercontent.com",
+ "sheets.googleapis.com",
+ "oauth2.googleapis.com",
+ "docs.google.com",
+ "pypi.org",
+ "files.pythonhosted.org",
+ "nvd.nist.gov",
+ "cveawg.mitre.org",
+ ],
+)
+def test_allowed_hosts(host: str) -> None:
+ assert host_allowed(host) is True
+
+
+@pytest.mark.parametrize(
+ "host",
+ [
+ "example.com",
+ "api.openai.com",
+ "evil.example.net",
+ "apache.org.evil.com", # suffix spoof — must NOT match ".apache.org"
+ "notgithub.com",
+ "githubusercontent.com.evil.io",
+ "",
+ ],
+)
+def test_denied_hosts(host: str) -> None:
+ assert host_allowed(host) is False
+
+
+def test_loopback_always_allowed() -> None:
+ for host in ("localhost", "127.0.0.1", "::1"):
+ assert host_allowed(host) is True
+
+
+def test_port_and_trailing_dot_normalised() -> None:
+ assert host_allowed("whimsy.apache.org:443") is True
+ assert host_allowed("whimsy.apache.org.") is True
+ assert host_allowed("WHIMSY.Apache.ORG") is True
+
+
+def test_suffix_match_requires_dot_boundary() -> None:
+ # "myapache.org" must not match the ".apache.org" suffix.
+ assert host_allowed("myapache.org") is False
+
+
+def test_extra_exact_and_suffix() -> None:
+ extra_exact, extra_suffixes = _parse_extra("bedrock.example.com, .internal.corp")
+ assert host_allowed("bedrock.example.com", extra_exact=extra_exact, extra_suffixes=extra_suffixes) is True
+ assert host_allowed("svc.internal.corp", extra_exact=extra_exact, extra_suffixes=extra_suffixes) is True
+ # Not granted without the extras.
+ assert host_allowed("bedrock.example.com") is False
+
+
+def test_parse_extra_ignores_blanks() -> None:
+ exact, suffixes = _parse_extra(" , host.example , , .suf.example ,")
+ assert exact == frozenset({"host.example"})
+ assert suffixes == (".suf.example",)
+
+
+def test_parse_extra_empty() -> None:
+ assert _parse_extra(None) == (frozenset(), ())
+ assert _parse_extra("") == (frozenset(), ())
diff --git a/tools/egress-gateway/tool.md b/tools/egress-gateway/tool.md
new file mode 100644
index 0000000..0e2e4b2
--- /dev/null
+++ b/tools/egress-gateway/tool.md
@@ -0,0 +1,105 @@
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [Tool: egress-gateway](#tool-egress-gateway)
+ - [What this tool provides](#what-this-tool-provides)
+ - [Why this is its own tool](#why-this-is-its-own-tool)
+ - [Relationship to RFC-AI-0003](#relationship-to-rfc-ai-0003)
+ - [How adopters consume this tool](#how-adopters-consume-this-tool)
+ - [What this tool is NOT for](#what-this-tool-is-not-for)
+ - [Failure modes](#failure-modes)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/legal/release-policy.html -->
+
+# Tool: egress-gateway
+
+This directory documents the **egress-gateway** tool — a local
+host-allowlisting HTTP(S) forward proxy that constrains *where* framework
+tools may send data. It is the network-layer egress-control chokepoint that
+backstops the LLM-routing controls in
+[RFC-AI-0003](../../docs/rfcs/RFC-AI-0003.md).
+
+How-to (run it, point tools at it, extend the allowlist) lives in
+[`README.md`](README.md). This file is the **what** and **why**.
+
+## What this tool provides
+
+A `proxy.py`-based forward proxy bound to loopback. The only first-party
+code is a `proxy.py` plugin (`egress_gateway.allowlist.EgressAllowlistPlugin`)
+that enforces a **default-deny host allowlist** in the
+`before_upstream_connection` hook: a CONNECT / request to any host not on the
+allowlist is rejected with `403` before an upstream socket is opened.
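The control flow of that hook can be sketched without `proxy.py` installed. The sketch raises an exception before returning the request, so the proxy never reaches the point of opening the upstream socket. This is a stdlib-only illustration with several assumptions:

- `RequestRejected` is a hypothetical stand-in for `proxy.py`'s rejection exception (`HttpRequestRejected` in 2.x).
- `FakeRequest` is a hypothetical stand-in for the parsed request, with `host` assumed to be bytes.
- The two-host allowlist is a toy.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

ALLOWED = frozenset({"github.com", "pypi.org"})  # toy allowlist for this sketch only


class RequestRejected(Exception):
    """Hypothetical stand-in for proxy.py's request-rejection exception."""

    def __init__(self, status_code: int, reason: bytes) -> None:
        super().__init__(status_code, reason)
        self.status_code = status_code
        self.reason = reason


@dataclass
class FakeRequest:
    host: Optional[bytes]  # assumed: target host as bytes, possibly missing


def before_upstream_connection(request: FakeRequest) -> FakeRequest:
    host = (request.host or b"").decode("ascii", "replace").lower()
    if host not in ALLOWED:
        # Default-deny: raising here means no upstream connection is attempted.
        raise RequestRejected(403, b"CONNECT rejected")
    return request  # allowed: hand the request back so the proxy continues
```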
+
+The default allowlist mirrors the curated host set the secure sandbox already
+trusts (`sandbox.network.allowedDomains`): ASF infra (`*.apache.org`), GitHub,
+Google APIs, PyPI — suffix-matched. Loopback is always allowed. Adopters
+extend it via the `EGRESS_ALLOW_EXTRA` environment variable without editing
+code.
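The matching rules (normalisation, loopback, dot-boundary suffixes, `EGRESS_ALLOW_EXTRA` parsing) can be sketched as below. This is a sketch under stated assumptions, not the tool's implementation:

- `is_allowed`, `normalise` and `parse_extra` are hypothetical names. They reproduce the behaviour pinned down by `tests/test_allowlist.py` but are not the real `host_allowed` / `_parse_extra`.
- `ALLOW_EXACT` and `ALLOW_SUFFIXES` here are toy lists.
- Extra entries with a leading dot are assumed to be suffixes, as the tests suggest.

```python
from __future__ import annotations

import ipaddress
from typing import Iterable, Optional

# Toy lists for this sketch; the tool's real defaults are larger.
ALLOW_EXACT = frozenset({"github.com", "pypi.org"})
ALLOW_SUFFIXES = (".apache.org", ".github.com", ".githubusercontent.com", ".googleapis.com")


def parse_extra(raw: Optional[str]) -> tuple[frozenset[str], tuple[str, ...]]:
    """Split a comma-separated extras string into exact hosts and '.suffix' rules."""
    items = [item.strip().lower() for item in (raw or "").split(",")]
    items = [item for item in items if item]  # drop blanks like " , , "
    exact = frozenset(item for item in items if not item.startswith("."))
    suffixes = tuple(item for item in items if item.startswith("."))
    return exact, suffixes


def normalise(host: str) -> str:
    host = host.strip().lower()
    if host.startswith("["):  # bracketed IPv6 literal, optionally followed by a port
        return host[1:].split("]", 1)[0]
    if host.count(":") == 1:  # "name:port" -> "name"; bare IPv6 has several colons
        host = host.split(":", 1)[0]
    return host.rstrip(".")  # "whimsy.apache.org." -> "whimsy.apache.org"


def is_allowed(host: str, extra_exact: Iterable[str] = (), extra_suffixes: Iterable[str] = ()) -> bool:
    name = normalise(host)
    if not name:
        return False
    if name == "localhost":
        return True
    try:
        if ipaddress.ip_address(name).is_loopback:  # 127.0.0.0/8 and ::1
            return True
    except ValueError:
        pass  # not an IP literal; fall through to name matching
    exact = ALLOW_EXACT | frozenset(extra_exact)
    suffixes = ALLOW_SUFFIXES + tuple(extra_suffixes)
    # Suffixes carry a leading dot, so "myapache.org" and "apache.org.evil.com" do not match.
    return name in exact or any(name.endswith(suffix) for suffix in suffixes)
```

Keeping the leading dot inside each suffix makes the dot boundary explicit in the data, so a plain `str.endswith` check cannot be spoofed by a look-alike registrable domain.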
+
+## Why this is its own tool
+
+Egress control is cross-cutting — it is not specific to one fetch backend or
+one skill, so it does not belong under `tools/gmail/` or inside any single
+skill (which would create N drifting copies). It is also **not** LLM-specific:
+it governs *all* tool egress (mail fetch, roster lookups, issue-tracker
+writes), which is a different concern from the PII redactor and approved-LLM
+gate that RFC-AI-0003's `tools/privacy-llm/` already owns. A dedicated tool
+keeps the egress policy in one auditable place.
+
+It depends on `proxy.py` (a third-party forward proxy), so it cannot live
+inside the stdlib-only `tools/privacy-llm/` sub-tools without polluting their
+dependency-free contract.
+
+## Relationship to RFC-AI-0003
+
+RFC-AI-0003 protects foundation-private data flowing *into LLMs* with two
+mechanisms (PII redactor + approved-LLM gate). Both operate at the
+application layer. They do not, by themselves, stop a skill — or a
+prompt-injection payload riding in an inbound report — from exfiltrating
+private data over an **arbitrary HTTP call** (the gap noted in
+[`docs/setup/secure-agent-setup.md`](../../docs/setup/secure-agent-setup.md):
+`Bash(curl *)` egress bypasses the sandbox proxy).
+
+The egress-gateway closes that gap at the network layer: by funnelling tool
+egress through a default-deny allowlist, private data physically cannot reach
+a non-sanctioned host even if a higher layer is tricked into trying. It is
+**defence-in-depth**, layered under — not a replacement for — the redactor and
+the gate. See RFC-AI-0003 §4.4.
+
+## How adopters consume this tool
+
+1. Run the gateway outside the sandbox: it needs to bind a port and make
+ unrestricted outbound connections, which is the point. See [`README.md`](README.md).
+2. Point tool egress at it with `HTTPS_PROXY`/`HTTP_PROXY`, persisted
+ per-machine in `.claude/settings.local.json`'s `env` block.
+3. Allow loopback in `sandbox.network.allowedDomains` so sandboxed tools can
+ reach it (loopback-only; does not widen the internet egress surface).
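Steps 2 and 3 both end up in `.claude/settings.local.json`. The sketch below shows the resulting shape as a Python helper that merges both settings into an existing settings dict without mutating it. Several details are assumptions:

- The helper name `add_gateway_settings` is hypothetical.
- The port is an example.
- The nesting `sandbox` → `network` → `allowedDomains` is taken from the dotted path used above.

```python
from __future__ import annotations

import json
from typing import Any


def add_gateway_settings(settings: dict[str, Any], port: int) -> dict[str, Any]:
    """Hypothetical helper: return a copy of settings that routes tool egress via the gateway."""
    merged = json.loads(json.dumps(settings))  # deep copy of a JSON-shaped dict
    proxy_url = f"http://127.0.0.1:{port}"  # plain-HTTP proxy URL; HTTPS is tunnelled via CONNECT
    env = merged.setdefault("env", {})
    env["HTTPS_PROXY"] = proxy_url
    env["HTTP_PROXY"] = proxy_url
    network = merged.setdefault("sandbox", {}).setdefault("network", {})
    domains = network.setdefault("allowedDomains", [])
    for host in ("localhost", "127.0.0.1"):  # loopback only; no new internet egress
        if host not in domains:
            domains.append(host)
    return merged


print(json.dumps(add_gateway_settings({}, 8899), indent=2))
```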
+
+The gateway's allowlist should be kept in sync with the adopter's
+`sandbox.network.allowedDomains` — they encode the same egress policy at two
+layers.
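A drift check between the two layers can be sketched as a set comparison. This sketch rests on three assumptions:

- Wildcard `allowedDomains` entries are assumed to be written as `*.example.org` and to correspond to a gateway suffix rule `.example.org`.
- Loopback entries are ignored, because the gateway always allows loopback.
- `domains_to_rules` and `drift` are hypothetical helper names.

```python
from __future__ import annotations

LOOPBACK = {"localhost", "127.0.0.1", "::1"}  # always allowed by the gateway, so never drift


def domains_to_rules(domains: list[str]) -> tuple[set[str], set[str]]:
    """Map allowedDomains entries to (exact hosts, '.suffix' rules)."""
    exact: set[str] = set()
    suffixes: set[str] = set()
    for entry in domains:
        entry = entry.strip().lower()
        if entry.startswith("*."):
            suffixes.add(entry[1:])  # assumed wildcard form: "*.apache.org" -> ".apache.org"
        elif entry and entry not in LOOPBACK:
            exact.add(entry)
    return exact, suffixes


def drift(domains: list[str], gw_exact: set[str], gw_suffixes: set[str]) -> dict[str, set[str]]:
    sb_exact, sb_suffixes = domains_to_rules(domains)
    return {
        "only_in_sandbox": (sb_exact - gw_exact) | (sb_suffixes - gw_suffixes),
        "only_in_gateway": (gw_exact - sb_exact) | (gw_suffixes - sb_suffixes),
    }
```

An empty result in both keys means the two layers encode the same policy. Anything else is a candidate for `EGRESS_ALLOW_EXTRA` or an `allowedDomains` update.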
+
+## What this tool is NOT for
+
+- **Not** an LLM router or a replacement for `tools/privacy-llm/`. It does not
+ redact content and does not gate which LLM may receive data — it gates which
+ *host* any tool may reach.
+- **Not** a payload/content firewall. It tunnels HTTPS via `CONNECT` and
+ allows or denies by host only: no TLS interception, no URL-path or body
+ inspection.
+- **Not** a sandbox replacement. The sandbox still owns filesystem isolation,
+ credential denial, and bind restrictions; the gateway only adds an
+ egress-allowlist chokepoint for outbound HTTP(S).
+
+## Failure modes
+
+| Symptom | Likely cause | Remediation |
+|---|---|---|
+| Gateway exits with `Operation not permitted` on bind | Started inside the sandbox | Run it from a non-sandboxed context; binding a listener is blocked under the sandbox |
+| Gateway exits with `PermissionError: '.../.proxy'` | `$HOME` not writable for the process | `HOME=/tmp/egress-home … egress-gateway` |
+| Sandboxed tool gets `Operation not permitted` reaching `127.0.0.1:PORT` | Loopback not in `sandbox.network.allowedDomains` | Add `localhost` and `127.0.0.1` (see `docs/setup/sandbox-troubleshooting.md`) |
+| A legitimate host returns `403 CONNECT rejected` | Host not on the allowlist | Add it via `EGRESS_ALLOW_EXTRA`, or extend `ALLOW_EXACT`/`ALLOW_SUFFIXES` and keep it in sync with `sandbox.network.allowedDomains` |
diff --git a/uv.lock b/uv.lock
index bb935c4..8c17b0c 100644
--- a/uv.lock
+++ b/uv.lock
@@ -15,6 +15,7 @@ members = [
"agent-isolation",
"apache-steward",
"checker",
+ "egress-gateway",
"generate-cve-json",
"github-body-field",
"github-rollup",
@@ -341,6 +342,31 @@ wheels = [
 { url = "https://files.pythonhosted.org/packages/a2/ca/7e8365deec19afb2b2c7be7c1c0aa8f99633b54e90c570999acda93260fc/cryptography-48.0.0-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:db63bf618e5dea46c07de12e900fe1cdd2541e6dc9dbae772a70b7d4d4765f6a", size = 3739536, upload-time = "2026-05-04T22:59:29.61Z" },
]
+[[package]]
+name = "egress-gateway"
+version = "0.1.0"
+source = { editable = "tools/egress-gateway" }
+dependencies = [
+ { name = "proxy-py" },
+]
+
+[package.dev-dependencies]
+dev = [
+ { name = "mypy" },
+ { name = "pytest" },
+ { name = "ruff" },
+]
+
+[package.metadata]
+requires-dist = [{ name = "proxy-py", specifier = ">=2.4,<3" }]
+
+[package.metadata.requires-dev]
+dev = [
+ { name = "mypy", specifier = ">=1.11" },
+ { name = "pytest", specifier = ">=8.0" },
+ { name = "ruff", specifier = ">=0.6" },
+]
+
[[package]]
name = "generate-cve-json"
version = "0.1.0"
@@ -624,6 +650,15 @@ wheels = [
 { url = "https://files.pythonhosted.org/packages/ab/36/2ab7647fe1e84bba2baae7f04de241197eed62683fb3085e164de266d111/prek-0.4.1-py3-none-win_arm64.whl", hash = "sha256:5b4a348537924b20e208cbd87ef58e96ec37d691c5bec2969209c40de0ecf72e", size = 5423147, upload-time = "2026-05-20T04:27:17.023Z" },
]
+[[package]]
+name = "proxy-py"
+version = "2.4.10"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/60/c3/157c302e82abf8e1edf9dae55665b9480c0a6bd63b42cbbeb925a37f1e1f/proxy_py-2.4.10.tar.gz", hash = "sha256:41b9e9d3aae6f80e2304d3726e8e9c583a510d8de224eada53d115f48a63a9ce", size = 326541, upload-time = "2025-02-18T16:36:38.02Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/c1/38/e2546d82f769550a54cca9b1ae81f229871c9bb8b9eca55d766c74a83b03/proxy.py-2.4.10-py3-none-any.whl", hash = "sha256:ef3a31f6ef3be6ff78559c0e68198523bfe2fb1e820bb16686750c1bb5baf9e8", size = 227130, upload-time = "2025-02-18T16:36:35.394Z" },
+]
+
[[package]]
name = "pyasn1"
version = "0.6.3"