This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git


The following commit(s) were added to refs/heads/main by this push:
     new e7c77bb  feat(egress-gateway): add egress-allowlist proxy; document in RFC-AI-0003 (#429)
e7c77bb is described below

commit e7c77bb2cd29720e7779af6929ca17215230b5bd
Author: Jarek Potiuk <[email protected]>
AuthorDate: Mon Jun 1 18:28:28 2026 +0200

    feat(egress-gateway): add egress-allowlist proxy; document in RFC-AI-0003 (#429)
    
    Adds tools/egress-gateway/ — a local host-allowlisting HTTP(S) forward
    proxy (a proxy.py plugin) that constrains where framework tools may send
    data. It is the network-layer egress-control chokepoint: tools point
    HTTPS_PROXY at it, and any CONNECT/request to a host not on the allowlist
    is rejected with 403 before a socket is opened.
    
    Why: RFC-AI-0003's two mechanisms (PII redactor + approved-LLM gate) act
    at the application layer — they bound what a skill deliberately sends to
    an LLM. Neither stops an unintended outbound flow (a buggy tool, or a
    prompt-injection payload coaxing the agent into curling private data out)
    — the gap docs/setup/secure-agent-setup.md flags for Bash(curl *) egress.
    The gateway closes it at the network layer as defence-in-depth, layered
    under the two mechanisms, never a replacement.
    
    Design:
    - Default-deny allowlist mirrors sandbox.network.allowedDomains (ASF infra,
      GitHub, Google APIs, PyPI), suffix-matched; loopback always allowed;
      adopters extend via EGRESS_ALLOW_EXTRA without editing code.
    - Host-level only (HTTPS via CONNECT, no TLS interception / payload
      inspection) — the right model for egress control without MITM.
    - host_allowed() is a pure function with 28 unit tests (IPv6, port/dot
      normalisation, suffix-spoof rejection, env-extra parsing). proxy.py
      integration is not exercised in CI (needs to bind a port).
    - Separate tool (not a privacy-llm sub-tool) because it carries a
      third-party runtime dep (proxy.py); the privacy-llm sub-tools are
      stdlib-only by contract.
    
    RFC-AI-0003 updated: abstract note + §4.4 (Mechanism 3, defence-in-depth)
    + §6.4 (implementation) + §10.6 (wiring follow-ups) + references. Tool
    registered in docs/labels-and-capabilities.md (capability:setup) and the
    uv workspace.
    
    The mechanism is optional and provisional: it ships as a documented,
    tested tool but is not yet wired into a setup skill or privacy-llm-check;
    §10.6 tracks that follow-up.
    
    Generated-by: Claude Code (Claude Opus 4.8)
---
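The commit message says tools only need to point HTTPS_PROXY at the gateway. A minimal sketch of how a urllib-based client resolves that setting from the environment, assuming the README's default listen address 127.0.0.1:8899. The helper name `proxy_for` is hypothetical and not part of this commit.

```python
from __future__ import annotations

import os
import urllib.request


def proxy_for(scheme: str) -> str | None:
    """Return the proxy URL urllib would pick for *scheme* from the environment."""
    # getproxies_environment() scans every *_proxy variable, case-insensitively.
    return urllib.request.getproxies_environment().get(scheme)


# Lower-case variants take precedence in urllib, so clear them first to keep
# this sketch deterministic.
for name in ("https_proxy", "http_proxy", "no_proxy"):
    os.environ.pop(name, None)

# Assumed gateway address: the README's default (127.0.0.1:8899).
os.environ["HTTPS_PROXY"] = "http://127.0.0.1:8899"
os.environ["HTTP_PROXY"] = "http://127.0.0.1:8899"
os.environ["NO_PROXY"] = "localhost,127.0.0.1"

print(proxy_for("https"))
```

Because the lookup happens inside urllib, a tool built on `urllib.request.urlopen` should need no code change to route through the gateway; clients that ignore these variables would bypass it.
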
 docs/labels-and-capabilities.md                    |   1 +
 docs/rfcs/RFC-AI-0003.md                           |  31 +++++
 pyproject.toml                                     |   1 +
 tools/egress-gateway/.gitignore                    |   7 +
 tools/egress-gateway/README.md                     |  97 +++++++++++++
 tools/egress-gateway/pyproject.toml                |  73 ++++++++++
 .../egress-gateway/src/egress_gateway/__init__.py  |  21 +++
 .../egress-gateway/src/egress_gateway/allowlist.py | 154 +++++++++++++++++++++
 tools/egress-gateway/src/egress_gateway/cli.py     |  74 ++++++++++
 tools/egress-gateway/tests/__init__.py             |  16 +++
 tools/egress-gateway/tests/test_allowlist.py       |  98 +++++++++++++
 tools/egress-gateway/tool.md                       | 105 ++++++++++++++
 uv.lock                                            |  35 +++++
 13 files changed, 713 insertions(+)
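
The Design notes in the commit message mention suffix matching, port/dot normalisation, and suffix-spoof rejection. A simplified sketch of that idea, not the committed `host_allowed`: the names `normalise` and `suffix_allowed` are hypothetical, only two of the commit's suffixes are listed, and IPv6 literals, exact hosts, and loopback are left out.

```python
from __future__ import annotations

# Subset of the commit's ALLOW_SUFFIXES, for illustration only.
SUFFIXES: tuple[str, ...] = (".apache.org", ".googleapis.com")


def normalise(host: str) -> str:
    """Lower-case the host and drop a trailing dot and a single ``:port``."""
    norm = host.strip().lower().rstrip(".")
    if norm.count(":") == 1:  # host:port; IPv6 is not handled in this sketch
        norm = norm.split(":", 1)[0]
    return norm


def suffix_allowed(host: str, suffixes: tuple[str, ...] = SUFFIXES) -> bool:
    """Return True if the normalised host ends with one of *suffixes*."""
    # str.endswith accepts a tuple; the leading dot in each suffix forces a
    # label boundary, so "evilapache.org" cannot pass as "*.apache.org".
    return normalise(host).endswith(suffixes)


for candidate in (
    "lists.apache.org:443",
    "evilapache.org",
    "apache.org.evil.example",
):
    print(candidate, suffix_allowed(candidate))
```

Keeping the leading dot in each suffix means the apex name (for example `apache.org`) is not covered by the suffix rule and would need its own exact entry.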

diff --git a/docs/labels-and-capabilities.md b/docs/labels-and-capabilities.md
index d630203..7b15042 100644
--- a/docs/labels-and-capabilities.md
+++ b/docs/labels-and-capabilities.md
@@ -181,6 +181,7 @@ Tools under [`tools/`](../tools/). Tools with two values (separated by
 | [`tools/cve-tool-vulnogram`](../tools/cve-tool-vulnogram/) | `capability:resolve` | ASF Vulnogram CVE-allocation adapter. Implements the `tools/cve-tool/` contract. Previously named `tools/vulnogram/`. |
 | [`tools/dashboard-generator`](../tools/dashboard-generator/) | `capability:stats` | Self-contained HTML dashboard generator |
 | [`tools/dev`](../tools/dev/) | `capability:setup` | Framework dev-loop helpers |
+| [`tools/egress-gateway`](../tools/egress-gateway/) | `capability:setup` | Egress-allowlist forward proxy (proxy.py plugin); host-level egress chokepoint — defence-in-depth for RFC-AI-0003 §4.4 |
 | [`tools/forwarder-relay`](../tools/forwarder-relay/) | `capability:setup` | Adapter contract for inbound-relay backends (ASF Security relay, huntr.com, HackerOne triagers). Pure interface spec; adapters declare detection + credit-extraction + reporter-addressing rules. |
 | [`tools/github`](../tools/github/) | `capability:setup` | GitHub REST / GraphQL substrate (called by every lifecycle phase — pure substrate, no single phase) |
 | [`tools/github-body-field`](../tools/github-body-field/) | `capability:setup` | Read or rewrite one `### Field` section of a GitHub issue body without bringing the body into agent context — substrate helper for the security-sync skills |
diff --git a/docs/rfcs/RFC-AI-0003.md b/docs/rfcs/RFC-AI-0003.md
index 76edc85..cc20f58 100644
--- a/docs/rfcs/RFC-AI-0003.md
+++ b/docs/rfcs/RFC-AI-0003.md
@@ -13,11 +13,13 @@
     - [4.1 The two mechanisms at a glance](#41-the-two-mechanisms-at-a-glance)
     - [4.2 Mechanism 1 — PII redactor](#42-mechanism-1--pii-redactor)
     - [4.3 Mechanism 2 — approved-LLM gate](#43-mechanism-2--approved-llm-gate)
+    - [4.4 Mechanism 3 (defence-in-depth) — egress-allowlist gateway](#44-mechanism-3-defence-in-depth--egress-allowlist-gateway)
   - [5. Data flow](#5-data-flow)
   - [6. Implementation](#6-implementation)
     - [6.1 The redactor sub-tool — `tools/privacy-llm/redactor/`](#61-the-redactor-sub-tool--toolsprivacy-llmredactor)
     - [6.2 The checker sub-tool — `tools/privacy-llm/checker/` (PR #51)](#62-the-checker-sub-tool--toolsprivacy-llmchecker-pr-51)
     - [6.3 What never reaches any LLM](#63-what-never-reaches-any-llm)
+    - [6.4 The egress gateway — `tools/egress-gateway/`](#64-the-egress-gateway--toolsegress-gateway)
   - [7. Adopter configuration](#7-adopter-configuration)
   - [8. Skill wiring summary](#8-skill-wiring-summary)
   - [9. Trust boundaries and status](#9-trust-boundaries-and-status)
@@ -27,6 +29,7 @@
     - [10.3 MCP-layer hooks](#103-mcp-layer-hooks)
     - [10.4 Mapping-file lifecycle tools](#104-mapping-file-lifecycle-tools)
     - [10.5 Doc-cleanup follow-up](#105-doc-cleanup-follow-up)
+    - [10.6 Egress-gateway wiring](#106-egress-gateway-wiring)
   - [11. References](#11-references)
 
 <!-- END doctoc generated TOC please keep comment here to allow auto update -->
@@ -74,6 +77,8 @@ The reporter's own identity flows through the agent's context as-is, by design
 
 Both mechanisms are now landed: the redactor (PR #48 + PR #50) and the gate-check (PR #51). The full design is shipped, end-to-end, behind explicit Step 0 pre-flight calls in every `<security-list>`-touching skill.
 
+**Complementary network-layer control.** The two mechanisms above operate at the application layer — they decide what a skill deliberately *sends* to an LLM. They do not, by themselves, stop private data from leaving over an arbitrary HTTP call (a buggy tool, or a prompt-injection payload that coaxes the agent into exfiltration). §4.4 adds an optional **egress-allowlist gateway** (`tools/egress-gateway/`) as defence-in-depth: a default-deny host allowlist that funnels all tool egress thr [...]
+
 ## 2. Background and motivation
 
 ASF security work routinely handles two kinds of private content:
@@ -230,6 +235,23 @@ The check is deliberately conservative: any single unapproved entry stops the sk
 
 **Defence-in-depth:** the gate-check is **also required** for `<security-list>`-only skills, even though their body classification permits Claude-Code-default LLMs by construction. Running the check at Step 0 ensures the adopter's config is in a sane state — no half-configured opt-in entries, no LLMs in the active stack the adopter forgot to approve — before any private content flows. The `--reads-private-list` flag controls only the printed banner; the validation logic is the same either way.
 
+### 4.4 Mechanism 3 (defence-in-depth) — egress-allowlist gateway
+
+The PII redactor and approved-LLM gate both operate at the application layer: they constrain what a skill deliberately sends to an LLM. Neither stops an *unintended* outbound flow — a buggy skill, a mis-wired tool, or a prompt-injection payload hidden in an inbound report that coaxes the agent into `curl`-ing private data to an attacker-controlled host. [`docs/setup/secure-agent-setup.md`](https://github.com/apache/airflow-steward/blob/main/docs/setup/secure-agent-setup.md) flags exactly [...]
+
+The egress-allowlist gateway closes that gap at the network layer. It is a local `proxy.py` forward proxy (shipped as [`tools/egress-gateway/`](../../tools/egress-gateway/)) that enforces a **default-deny host allowlist** in its `before_upstream_connection` hook: any CONNECT / request to a host not on the allowlist is rejected with `403` before a socket is opened. Tools point `HTTPS_PROXY` / `HTTP_PROXY` at it; Python `urllib`-based tools (ponymail, whimsy, jira, …) honour that with no c [...]
+
+| Property | Value |
+|---|---|
+| Layer | Network egress (host-level), below the application-layer LLM controls |
+| Policy | Default-deny; allowlist mirrors `sandbox.network.allowedDomains` (ASF infra, GitHub, Google APIs, PyPI), suffix-matched; loopback always allowed; adopter extends via `EGRESS_ALLOW_EXTRA` |
+| Granularity | Host only — HTTPS is tunnelled via CONNECT, so no URL-path or payload inspection (no TLS interception) |
+| Relationship | Defence-in-depth. Layered *under* mechanisms 1 + 2, never a replacement: the redactor still strips third-party PII, the gate still bounds which LLM may receive a body, and the gateway additionally bounds which host *any* tool may reach. |
+
+The gateway runs **outside the sandbox** — it must bind a listener and make unrestricted outbound, which is precisely its job as the chokepoint. Sandboxed tools reach it over loopback, which requires `localhost` / `127.0.0.1` in `sandbox.network.allowedDomains` (loopback-only; this does not widen the internet egress surface — that becomes the gateway's responsibility). The gateway's allowlist and `sandbox.network.allowedDomains` encode the same egress policy at two layers and should be k [...]
+
+This mechanism is **optional and provisional**: it ships as a tool with a documented contract and unit-tested allowlist policy, but it is not yet wired into a setup skill or the `privacy-llm-check` gate. See §10.6.
+
 ## 5. Data flow
 
 ```text
@@ -343,6 +365,10 @@ The framework treats these surfaces as off-limits to LLM context, even when an "
 - The `--field <type>:<value>` arguments themselves. Every value passed there is exactly what the redactor is replacing.
 - Any draft text *before* `pii-reveal` runs, when the destination is a non-internal surface (e.g. a public PR comment) — the body would still carry identifiers, which leak no PII, but skills should not emit identifier-laden drafts to non-internal destinations by accident. The destination check in the approved-LLM gate is a separate safety net for this.
 
+### 6.4 The egress gateway — `tools/egress-gateway/`
+
+A `proxy.py`-based forward proxy whose only first-party code is the allowlist plugin (`egress_gateway.allowlist.EgressAllowlistPlugin`). The host-matching policy (`host_allowed`) is a pure function, unit-tested in isolation; the proxy.py integration is intentionally not exercised in CI (it needs to bind a port). Unlike the stdlib-only `privacy-llm` sub-tools, this one carries a third-party runtime dependency (`proxy.py`) — which is why it is a separate tool rather than a `privacy-llm` su [...]
+
 ## 7. Adopter configuration
 
 Adopters declare their privacy-LLM posture in a single markdown file at `<project-config>/privacy-llm.md` (template at [`projects/_template/privacy-llm.md`](https://github.com/apache/airflow-steward/blob/main/projects/_template/privacy-llm.md)). The file has four sections:
@@ -416,6 +442,10 @@ The framework currently does not ship a cleanup tool for the mapping file. Manua
 
 A small handful of references in [`docs/setup/privacy-llm.md`](https://github.com/apache/airflow-steward/blob/main/docs/setup/privacy-llm.md) still describe `privacy-llm-check` as "PR-3" pending. Now that PR #51 has merged, those should be cleaned up to drop the "(PR-3)" phrasing — minor doc churn, no contract change. Filed as a follow-up for the next cleanup PR.
 
+### 10.6 Egress-gateway wiring
+
+The egress-allowlist gateway (§4.4, [`tools/egress-gateway/`](../../tools/egress-gateway/)) ships as a tool with a documented contract but is not yet wired into the setup flow. Possible follow-ups: a `setup-isolated-setup-*` step that launches / health-checks the gateway and persists `HTTPS_PROXY` into the adopter's per-machine settings; sourcing the gateway allowlist directly from `sandbox.network.allowedDomains` so the two cannot drift; and a `privacy-llm-check`-style assertion that th [...]
+
 ## 11. References
 
 - **Source-of-truth contracts**
@@ -428,6 +458,7 @@ A small handful of references in [`docs/setup/privacy-llm.md`](https://github.c
 - **Reference implementation**
   - [`tools/privacy-llm/redactor/`](https://github.com/apache/airflow-steward/tree/main/tools/privacy-llm/redactor) — PII redactor (stdlib-only Python)
   - [`tools/privacy-llm/checker/`](https://github.com/apache/airflow-steward/tree/main/tools/privacy-llm/checker) — approved-LLM gate-check (stdlib-only Python)
+  - [`tools/egress-gateway/`](../../tools/egress-gateway/) — egress-allowlist forward proxy (proxy.py plugin; defence-in-depth, §4.4)
 - **Adopter template**
   - [`projects/_template/privacy-llm.md`](https://github.com/apache/airflow-steward/blob/main/projects/_template/privacy-llm.md)
 - **Related framework rules**
diff --git a/pyproject.toml b/pyproject.toml
index e4ad7d8..b6c7c88 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -95,6 +95,7 @@ package = false
 [tool.uv.workspace]
 members = [
   "tools/agent-isolation",
+  "tools/egress-gateway",
   "tools/cve-tool-vulnogram/generate-cve-json",
   "tools/cve-tool-vulnogram/oauth-api",
   "tools/github-body-field",
diff --git a/tools/egress-gateway/.gitignore b/tools/egress-gateway/.gitignore
new file mode 100644
index 0000000..8cae406
--- /dev/null
+++ b/tools/egress-gateway/.gitignore
@@ -0,0 +1,7 @@
+__pycache__/
+*.py[cod]
+.venv/
+.pytest_cache/
+.ruff_cache/
+.mypy_cache/
+.coverage
diff --git a/tools/egress-gateway/README.md b/tools/egress-gateway/README.md
new file mode 100644
index 0000000..847d1ca
--- /dev/null
+++ b/tools/egress-gateway/README.md
@@ -0,0 +1,97 @@
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents**  *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [egress-gateway](#egress-gateway)
+  - [Run it](#run-it)
+  - [Point tools at it](#point-tools-at-it)
+  - [The allowlist](#the-allowlist)
+  - [Test](#test)
+  - [Caveat — host-level, not payload-level](#caveat--host-level-not-payload-level)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/legal/release-policy.html -->
+
+# egress-gateway
+
+**Capability:** capability:setup
+
+A local **host-allowlisting HTTP(S) forward proxy** for the
+apache-steward framework. It is the egress-control chokepoint: framework
+tools point `HTTPS_PROXY`/`HTTP_PROXY` at it, and the gateway rejects any
+connection to a host that is not on its allowlist — before a socket is
+opened. This is defence-in-depth for
+[RFC-AI-0003](../../docs/rfcs/RFC-AI-0003.md): even if a skill or a
+prompt-injection tries to send private data to an arbitrary endpoint, the
+destination is blocked.
+
+The contract (what/why) lives in [`tool.md`](tool.md); this file is the
+how-to.
+
+## Run it
+
+```bash
+# From a context that is NOT sandboxed — binding a listen socket and
+# making unrestricted outbound is exactly this process's job.
+uv run --project tools/egress-gateway egress-gateway          # 127.0.0.1:8899
+uv run --project tools/egress-gateway egress-gateway --port 9000
+```
+
+proxy.py keeps runtime state under `$HOME/.proxy`; if HOME is not writable
+in your environment, point it somewhere writable for this process:
+
+```bash
+HOME=/tmp/egress-home uv run --project tools/egress-gateway egress-gateway
+```
+
+## Point tools at it
+
+```bash
+export HTTPS_PROXY=http://127.0.0.1:8899
+export HTTP_PROXY=http://127.0.0.1:8899
+export NO_PROXY=localhost,127.0.0.1
+```
+
+Every framework tool that uses Python `urllib` (ponymail, whimsy, jira, …)
+honours these automatically — no code change. Persist them per-machine in
+`.claude/settings.local.json`'s `env` block (never committed — the gateway
+is a local process).
+
+**Sandbox interaction:** a *sandboxed* process can only reach the loopback
+proxy if `localhost`/`127.0.0.1` are in `sandbox.network.allowedDomains`
+(see [`docs/setup/sandbox-troubleshooting.md`](../../docs/setup/sandbox-troubleshooting.md)
+→ *cannot bind to a localhost port*). Adding them is loopback-only and does
+not widen the internet egress surface — that is now the gateway's job.
+
+## The allowlist
+
+Defaults mirror the sandbox's curated `sandbox.network.allowedDomains`
+(ASF infra, GitHub, Google APIs, PyPI), suffix-matched so every
+`*.apache.org` project site is covered. Extend without editing code:
+
+```bash
+EGRESS_ALLOW_EXTRA="bedrock.example.com,.internal.corp" \
+  uv run --project tools/egress-gateway egress-gateway
+```
+
+Entries starting with `.` are treated as suffixes; everything else is an
+exact host. Loopback (`localhost`, `127.0.0.1`, `::1`) is always allowed.
+
+## Test
+
+```bash
+uv run --project tools/egress-gateway --group dev pytest
+```
+
+The allowlist policy (`host_allowed`) is a pure function and is unit-tested
+directly; the proxy.py integration is intentionally not exercised in CI
+(it needs to bind a port).
+
+## Caveat — host-level, not payload-level
+
+The gateway tunnels HTTPS via `CONNECT`; it allow/denies by **host**, not by
+URL path or body. There is no TLS interception, so it cannot inspect request
+payloads. That is the right model for egress *control* without MITM. If you
+need per-path or content-level filtering, that is a different (heavier) tool.
diff --git a/tools/egress-gateway/pyproject.toml b/tools/egress-gateway/pyproject.toml
new file mode 100644
index 0000000..3c3bfef
--- /dev/null
+++ b/tools/egress-gateway/pyproject.toml
@@ -0,0 +1,73 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+
+[project]
+name = "egress-gateway"
+version = "0.1.0"
+description = "Local egress-allowlist proxy for the apache-steward framework — a host-allowlisting HTTP(S) forward proxy that constrains where framework tools may send data (defence-in-depth for RFC-AI-0003)."
+readme = "README.md"
+requires-python = ">=3.11"
+license = { text = "Apache-2.0" }
+# proxy.py supplies the forward-proxy runtime; the allowlist policy is
+# the only first-party code (a proxy.py plugin). Pinned to the 2.4.x
+# line that ships the HttpProxyBasePlugin.before_upstream_connection
+# hook this plugin relies on.
+dependencies = [
+  "proxy.py>=2.4,<3",
+]
+
+[project.scripts]
+egress-gateway = "egress_gateway.cli:main"
+
+[tool.hatch.build.targets.wheel]
+packages = ["src/egress_gateway"]
+
+[tool.ruff]
+line-length = 110
+target-version = "py311"
+src = ["src", "tests"]
+
+[tool.ruff.lint]
+select = [
+  "E",     # pycodestyle errors
+  "W",     # pycodestyle warnings
+  "F",     # pyflakes
+  "I",     # isort
+  "B",     # flake8-bugbear
+  "UP",    # pyupgrade
+  "SIM",   # flake8-simplify
+  "C4",    # flake8-comprehensions
+  "RUF",   # ruff-specific
+]
+ignore = [
+  "E501",  # line-too-long — the 110-char limit above is already generous
+]
+
+[tool.pytest.ini_options]
+minversion = "8.0"
+addopts = "-ra -q"
+testpaths = ["tests"]
+
+[dependency-groups]
+dev = [
+  "pytest>=8.0",
+  "ruff>=0.6",
+  "mypy>=1.11",
+]
diff --git a/tools/egress-gateway/src/egress_gateway/__init__.py b/tools/egress-gateway/src/egress_gateway/__init__.py
new file mode 100644
index 0000000..d49af32
--- /dev/null
+++ b/tools/egress-gateway/src/egress_gateway/__init__.py
@@ -0,0 +1,21 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Local egress-allowlist proxy for the apache-steward framework."""
+
+from egress_gateway.allowlist import host_allowed
+
+__all__ = ["host_allowed"]
diff --git a/tools/egress-gateway/src/egress_gateway/allowlist.py b/tools/egress-gateway/src/egress_gateway/allowlist.py
new file mode 100644
index 0000000..c1dbcc7
--- /dev/null
+++ b/tools/egress-gateway/src/egress_gateway/allowlist.py
@@ -0,0 +1,154 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Egress host-allowlist policy + the proxy.py plugin that enforces it.
+
+The host-matching policy (:func:`host_allowed`) is a pure function with no
+third-party imports, so it is cheap to unit-test in isolation. The
+:class:`EgressAllowlistPlugin` wires that policy into proxy.py's
+``before_upstream_connection`` hook: any CONNECT / request whose target host
+is not on the allowlist is rejected with a 403 before any upstream socket is
+opened. The gateway — not the sandbox — becomes the egress-control point.
+
+The default allowlist mirrors the curated host set the framework's secure
+sandbox already trusts (`sandbox.network.allowedDomains`): ASF infra, GitHub,
+Google APIs, PyPI. Adopters extend it without editing code via the
+``EGRESS_ALLOW_EXTRA`` environment variable (comma-separated hosts; a leading
+dot means "this suffix and all sub-hosts").
+"""
+
+from __future__ import annotations
+
+import os
+
+# Exact hostnames that do not fall under an allowed suffix.
+ALLOW_EXACT: frozenset[str] = frozenset(
+    {
+        "github.com",
+        "api.github.com",
+        "pypi.org",
+        "docs.google.com",
+        "nvd.nist.gov",
+        "cve.org",
+        "www.cve.org",
+        "cveawg.mitre.org",
+        "issues.apache.org",
+    }
+)
+
+# Any host ending in one of these suffixes is allowed.
+ALLOW_SUFFIXES: tuple[str, ...] = (
+    ".apache.org",  # whimsy, lists, projects, issues, every project site
+    ".googleapis.com",  # sheets / gmail / oauth2
+    ".githubusercontent.com",  # raw / objects / codeload
+    ".pythonhosted.org",  # uv / pip wheel downloads
+)
+
+# Loopback is always allowed — local inference endpoints (Ollama/vLLM) and
+# local fixtures never leave the host.
+ALLOW_LOOPBACK: frozenset[str] = frozenset({"localhost", "127.0.0.1", "::1"})
+
+_ENV_EXTRA = "EGRESS_ALLOW_EXTRA"
+
+
+def _parse_extra(raw: str | None) -> tuple[frozenset[str], tuple[str, ...]]:
+    """Split EGRESS_ALLOW_EXTRA into (exact-hosts, suffixes).
+
+    Entries starting with '.' are treated as suffixes; everything else is an
+    exact host. Whitespace and empty entries are ignored.
+    """
+    exact: set[str] = set()
+    suffixes: list[str] = []
+    for entry in (raw or "").split(","):
+        token = entry.strip().lower()
+        if token.startswith("."):
+            host = token.strip(".")
+            if host:
+                suffixes.append("." + host)
+        else:
+            host = token.rstrip(".")
+            if host:
+                exact.add(host)
+    return frozenset(exact), tuple(suffixes)
+
+
+def host_allowed(
+    host: str,
+    *,
+    extra_exact: frozenset[str] | None = None,
+    extra_suffixes: tuple[str, ...] | None = None,
+) -> bool:
+    """Return True if *host* is permitted egress.
+
+    *host* may include a ``:port`` suffix and trailing dot; both are
+    normalised away. Bare and bracketed IPv6 literals (``::1``, ``[::1]:443``)
+    are handled without mangling. Matching is case-insensitive.
+    """
+    norm = host.strip().lower().rstrip(".")
+    if norm.startswith("["):  # bracketed IPv6, optionally [::1]:port
+        end = norm.find("]")
+        if end != -1:
+            norm = norm[1:end]
+    elif norm.count(":") == 1:  # host:port (a bare IPv6 has >1 colon)
+        norm = norm.split(":", 1)[0]
+    if not norm:
+        return False
+    if norm in ALLOW_LOOPBACK or norm in ALLOW_EXACT:
+        return True
+    if extra_exact and norm in extra_exact:
+        return True
+    if norm.endswith(ALLOW_SUFFIXES):
+        return True
+    return bool(extra_suffixes) and norm.endswith(extra_suffixes)
+
+
+# --- proxy.py plugin -------------------------------------------------------
+
+from proxy.common.utils import text_  # noqa: E402  (kept below the pure policy)
+from proxy.http.exception import HttpRequestRejected  # noqa: E402
+from proxy.http.parser import HttpParser  # noqa: E402
+from proxy.http.proxy import HttpProxyBasePlugin  # noqa: E402
+
+
+class EgressAllowlistPlugin(HttpProxyBasePlugin):
+    """Reject any upstream host not on the allowlist (default-deny)."""
+
+    def __init__(self, *args: object, **kwargs: object) -> None:
+        super().__init__(*args, **kwargs)
+        self._extra_exact, self._extra_suffixes = _parse_extra(os.environ.get(_ENV_EXTRA))
+
+    def before_upstream_connection(self, request: HttpParser) -> HttpParser | None:
+        host = text_(request.host) if request.host else ""
+        if not host_allowed(
+            host,
+            extra_exact=self._extra_exact,
+            extra_suffixes=self._extra_suffixes,
+        ):
+            raise HttpRequestRejected(
+                status_code=403,
+                reason=b"Forbidden",
+                body=b"egress-gateway: host not on allowlist\n",
+            )
+        return request
+
+    def handle_client_request(self, request: HttpParser) -> HttpParser | None:
+        return request
+
+    def handle_upstream_chunk(self, chunk: memoryview) -> memoryview:
+        return chunk
+
+    def on_upstream_connection_close(self) -> None:
+        pass
diff --git a/tools/egress-gateway/src/egress_gateway/cli.py b/tools/egress-gateway/src/egress_gateway/cli.py
new file mode 100644
index 0000000..216827e
--- /dev/null
+++ b/tools/egress-gateway/src/egress_gateway/cli.py
@@ -0,0 +1,74 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""``egress-gateway`` console script — start the allowlisting forward proxy.
+
+Launches proxy.py bound to loopback with the
+:class:`egress_gateway.allowlist.EgressAllowlistPlugin` loaded, so every
+upstream host is checked against the allowlist before a socket is opened.
+
+Operational notes (see README.md for the full story):
+
+* **Binding a listen socket is blocked inside the framework's sandbox.** Run
+  the gateway from a context that is *not* sandboxed (it is the egress-control
+  point, so it legitimately needs unrestricted outbound).
+* **proxy.py keeps runtime state under ``$HOME/.proxy``.** If HOME is not
+  writable in your environment, point it at a writable dir for this process
+  (``HOME=/tmp/egress-home egress-gateway``).
+* Extra allowed hosts: ``EGRESS_ALLOW_EXTRA=host1,.suffix2 egress-gateway``.
+"""
+
+from __future__ import annotations
+
+import argparse
+from collections.abc import Sequence
+
+DEFAULT_HOST = "127.0.0.1"
+DEFAULT_PORT = 8899
+_PLUGIN = "egress_gateway.allowlist.EgressAllowlistPlugin"
+
+
+def main(argv: Sequence[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(
+        prog="egress-gateway",
+        description="Allowlisting HTTP(S) forward proxy (egress-control chokepoint).",
+    )
+    parser.add_argument("--host", default=DEFAULT_HOST, help="bind address (default: 127.0.0.1)")
+    parser.add_argument("--port", type=int, default=DEFAULT_PORT, help="bind port (default: 8899)")
+    parser.add_argument("--log-level", default="INFO", help="proxy.py log level (default: INFO)")
+    args, passthrough = parser.parse_known_args(argv)
+
+    # Imported lazily so `--help` works even if proxy.py is somehow absent.
+    import proxy
+
+    proxy.main(
+        [
+            "--hostname",
+            args.host,
+            "--port",
+            str(args.port),
+            "--plugins",
+            _PLUGIN,
+            "--log-level",
+            args.log_level,
+            *passthrough,
+        ]
+    )
+    return 0
+
+
+if __name__ == "__main__":  # pragma: no cover
+    raise SystemExit(main())
diff --git a/tools/egress-gateway/tests/__init__.py b/tools/egress-gateway/tests/__init__.py
new file mode 100644
index 0000000..13a8339
--- /dev/null
+++ b/tools/egress-gateway/tests/__init__.py
@@ -0,0 +1,16 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/tools/egress-gateway/tests/test_allowlist.py b/tools/egress-gateway/tests/test_allowlist.py
new file mode 100644
index 0000000..2568a4b
--- /dev/null
+++ b/tools/egress-gateway/tests/test_allowlist.py
@@ -0,0 +1,98 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Unit tests for the egress allowlist host-matching policy."""
+
+from __future__ import annotations
+
+import pytest
+
+from egress_gateway.allowlist import _parse_extra, host_allowed
+
+
+@pytest.mark.parametrize(
+    "host",
+    [
+        "whimsy.apache.org",
+        "lists.apache.org",
+        "issues.apache.org",
+        "projects.apache.org",
+        "github.com",
+        "api.github.com",
+        "raw.githubusercontent.com",
+        "objects.githubusercontent.com",
+        "sheets.googleapis.com",
+        "oauth2.googleapis.com",
+        "docs.google.com",
+        "pypi.org",
+        "files.pythonhosted.org",
+        "nvd.nist.gov",
+        "cveawg.mitre.org",
+    ],
+)
+def test_allowed_hosts(host: str) -> None:
+    assert host_allowed(host) is True
+
+
+@pytest.mark.parametrize(
+    "host",
+    [
+        "example.com",
+        "api.openai.com",
+        "evil.example.net",
+        "apache.org.evil.com",  # suffix spoof — must NOT match ".apache.org"
+        "notgithub.com",
+        "githubusercontent.com.evil.io",
+        "",
+    ],
+)
+def test_denied_hosts(host: str) -> None:
+    assert host_allowed(host) is False
+
+
+def test_loopback_always_allowed() -> None:
+    for host in ("localhost", "127.0.0.1", "::1"):
+        assert host_allowed(host) is True
+
+
+def test_port_and_trailing_dot_normalised() -> None:
+    assert host_allowed("whimsy.apache.org:443") is True
+    assert host_allowed("whimsy.apache.org.") is True
+    assert host_allowed("WHIMSY.Apache.ORG") is True
+
+
+def test_suffix_match_requires_dot_boundary() -> None:
+    # "myapache.org" must not match the ".apache.org" suffix.
+    assert host_allowed("myapache.org") is False
+
+
+def test_extra_exact_and_suffix() -> None:
+    extra_exact, extra_suffixes = _parse_extra("bedrock.example.com, .internal.corp")
+    assert host_allowed("bedrock.example.com", extra_exact=extra_exact, extra_suffixes=extra_suffixes) is True
+    assert host_allowed("svc.internal.corp", extra_exact=extra_exact, extra_suffixes=extra_suffixes) is True
+    # Not granted without the extras.
+    assert host_allowed("bedrock.example.com") is False
+
+
+def test_parse_extra_ignores_blanks() -> None:
+    exact, suffixes = _parse_extra("  , host.example , , .suf.example ,")
+    assert exact == frozenset({"host.example"})
+    assert suffixes == (".suf.example",)
+
+
+def test_parse_extra_empty() -> None:
+    assert _parse_extra(None) == (frozenset(), ())
+    assert _parse_extra("") == (frozenset(), ())
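
Editor's note on the tests above: the policy they pin down (case, port, and trailing-dot normalisation, loopback always allowed, suffix matches only on a dot boundary) can be sketched as follows. This is an independent illustration under stated assumptions, not the `host_allowed` implementation from `allowlist.py`. The names `sketch_host_allowed`, `normalise`, `SKETCH_EXACT`, and `SKETCH_SUFFIXES` are hypothetical, and the toy allowlist is much smaller than the real one.

```python
from __future__ import annotations

# Toy policy for illustration only; not the gateway's real allowlist.
SKETCH_EXACT = frozenset({"github.com", "pypi.org"})
SKETCH_SUFFIXES = (".apache.org", ".githubusercontent.com")
LOOPBACK = frozenset({"localhost", "127.0.0.1", "::1"})


def normalise(host: str) -> str:
    """Lower-case the host and drop any port and trailing dot."""
    host = host.strip().lower()
    if host.startswith("["):
        # Bracketed IPv6 literal, optionally followed by ":port".
        host = host[1:].split("]", 1)[0]
    elif host.count(":") == 1:
        # Exactly one colon means "name:port"; more colons mean bare IPv6.
        host = host.rsplit(":", 1)[0]
    return host.rstrip(".")


def sketch_host_allowed(host: str) -> bool:
    """Default-deny check: loopback, exact names, then dot-anchored suffixes."""
    name = normalise(host)
    if not name:
        return False
    if name in LOOPBACK or name in SKETCH_EXACT:
        return True
    # Each suffix starts with ".", so "myapache.org" cannot match ".apache.org".
    return any(name.endswith(suffix) for suffix in SKETCH_SUFFIXES)
```

Keeping the leading dot inside each suffix is what rejects both `myapache.org` and `apache.org.evil.com`: the first lacks the dot boundary, and the second does not end with the suffix at all.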
diff --git a/tools/egress-gateway/tool.md b/tools/egress-gateway/tool.md
new file mode 100644
index 0000000..0e2e4b2
--- /dev/null
+++ b/tools/egress-gateway/tool.md
@@ -0,0 +1,105 @@
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents**  *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [Tool: egress-gateway](#tool-egress-gateway)
+  - [What this tool provides](#what-this-tool-provides)
+  - [Why this is its own tool](#why-this-is-its-own-tool)
+  - [Relationship to RFC-AI-0003](#relationship-to-rfc-ai-0003)
+  - [How adopters consume this tool](#how-adopters-consume-this-tool)
+  - [What this tool is NOT for](#what-this-tool-is-not-for)
+  - [Failure modes](#failure-modes)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+<!-- SPDX-License-Identifier: Apache-2.0
+     https://www.apache.org/legal/release-policy.html -->
+
+# Tool: egress-gateway
+
+This directory documents the **egress-gateway** tool — a local
+host-allowlisting HTTP(S) forward proxy that constrains *where* framework
+tools may send data. It is the network-layer egress-control chokepoint that
+backstops the LLM-routing controls in
+[RFC-AI-0003](../../docs/rfcs/RFC-AI-0003.md).
+
+How-to (run it, point tools at it, extend the allowlist) lives in
+[`README.md`](README.md). This file is the **what** and **why**.
+
+## What this tool provides
+
+A `proxy.py`-based forward proxy bound to loopback. The only first-party
+code is a `proxy.py` plugin (`egress_gateway.allowlist.EgressAllowlistPlugin`)
+that enforces a **default-deny host allowlist** in the
+`before_upstream_connection` hook: a CONNECT / request to any host not on the
+allowlist is rejected with `403` before an upstream socket is opened.
+
+The default allowlist mirrors the curated host set the secure sandbox already
+trusts (`sandbox.network.allowedDomains`): ASF infra (`*.apache.org`), GitHub,
+Google APIs, PyPI — suffix-matched. Loopback is always allowed. Adopters
+extend it via the `EGRESS_ALLOW_EXTRA` environment variable without editing
+code.
+
+## Why this is its own tool
+
+Egress control is cross-cutting — it is not specific to one fetch backend or
+one skill, so it does not belong under `tools/gmail/` or inside any single
+skill (which would create N drifting copies). It is also **not** LLM-specific:
+it governs *all* tool egress (mail fetch, roster lookups, issue-tracker
+writes), which is a different concern from the PII redactor and approved-LLM
+gate that RFC-AI-0003's `tools/privacy-llm/` already owns. A dedicated tool
+keeps the egress policy in one auditable place.
+
+It depends on `proxy.py` (a third-party forward proxy), so it cannot live
+inside the stdlib-only `tools/privacy-llm/` sub-tools without polluting their
+dependency-free contract.
+
+## Relationship to RFC-AI-0003
+
+RFC-AI-0003 protects foundation-private data flowing *into LLMs* with two
+mechanisms (PII redactor + approved-LLM gate). Both operate at the
+application layer. They do not, by themselves, stop a skill — or a
+prompt-injection payload riding in an inbound report — from exfiltrating
+private data over an **arbitrary HTTP call** (the gap noted in
+[`docs/setup/secure-agent-setup.md`](../../docs/setup/secure-agent-setup.md):
+`Bash(curl *)` egress bypasses the sandbox proxy).
+
+The egress-gateway closes that gap at the network layer: by funnelling tool
+egress through a default-deny allowlist, private data physically cannot reach
+a non-sanctioned host even if a higher layer is tricked into trying. It is
+**defence-in-depth**, layered under — not a replacement for — the redactor and
+the gate. See RFC-AI-0003 §4.4.
+
+## How adopters consume this tool
+
+1. Run the gateway (outside the sandbox — it needs to bind a port and make
+   unrestricted outbound connections; that is the point). See [`README.md`](README.md).
+2. Point tool egress at it with `HTTPS_PROXY`/`HTTP_PROXY`, persisted
+   per-machine in `.claude/settings.local.json`'s `env` block.
+3. Allow loopback in `sandbox.network.allowedDomains` so sandboxed tools can
+   reach it (loopback-only; does not widen the internet egress surface).
+
+The gateway's allowlist should be kept in sync with the adopter's
+`sandbox.network.allowedDomains` — they encode the same egress policy at two
+layers.
+
+## What this tool is NOT for
+
+- **Not** an LLM router or a replacement for `tools/privacy-llm/`. It does not
+  redact content and does not gate which LLM may receive data — it gates which
+  *host* any tool may reach.
+- **Not** a payload/content firewall. It tunnels HTTPS via `CONNECT` and
+  allow/denies by host only — no TLS interception, no URL-path or body
+  inspection.
+- **Not** a sandbox replacement. The sandbox still owns filesystem isolation,
+  credential denial, and bind restrictions; the gateway only adds an
+  egress-allowlist chokepoint for outbound HTTP(S).
+
+## Failure modes
+
+| Symptom | Likely cause | Remediation |
+|---|---|---|
+| Gateway exits with `Operation not permitted` on bind | Started inside the sandbox | Run it from a non-sandboxed context — binding a listener is blocked under the sandbox |
+| Gateway exits with `PermissionError: '.../.proxy'` | `$HOME` not writable for the process | `HOME=/tmp/egress-home … egress-gateway` |
+| Sandboxed tool gets `Operation not permitted` reaching `127.0.0.1:PORT` | Loopback not in `sandbox.network.allowedDomains` | Add `localhost` + `127.0.0.1` (see `docs/setup/sandbox-troubleshooting.md`) |
+| A legitimate host returns `403 CONNECT rejected` | Host not on the allowlist | Add it via `EGRESS_ALLOW_EXTRA`, or extend `ALLOW_EXACT`/`ALLOW_SUFFIXES` and keep it in sync with `sandbox.network.allowedDomains` |
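
Editor's note on tool.md above: step 2 of "How adopters consume this tool" (pointing tool egress at the gateway) could look like this from Python. It is a minimal sketch, not part of the commit. It assumes the gateway is listening on the wrapper's defaults (`127.0.0.1:8899`), and the helper names `gateway_url`, `proxy_env`, and `gateway_opener` are hypothetical.

```python
from __future__ import annotations

import urllib.request

GATEWAY_HOST = "127.0.0.1"  # assumed: the wrapper's DEFAULT_HOST
GATEWAY_PORT = 8899         # assumed: the wrapper's DEFAULT_PORT


def gateway_url(host: str = GATEWAY_HOST, port: int = GATEWAY_PORT) -> str:
    """URL of the local forward proxy (plain HTTP to loopback)."""
    return f"http://{host}:{port}"


def proxy_env(host: str = GATEWAY_HOST, port: int = GATEWAY_PORT) -> dict[str, str]:
    """Environment entries to merge into a child process's environment."""
    url = gateway_url(host, port)
    return {"HTTP_PROXY": url, "HTTPS_PROXY": url}


def gateway_opener(
    host: str = GATEWAY_HOST, port: int = GATEWAY_PORT
) -> urllib.request.OpenerDirector:
    """urllib opener that sends both http and https requests via the gateway."""
    url = gateway_url(host, port)
    handler = urllib.request.ProxyHandler({"http": url, "https": url})
    return urllib.request.build_opener(handler)
```

With the gateway running, a call such as `gateway_opener().open("https://pypi.org/simple/")` would be tunnelled through it. A request to a host off the allowlist is expected to fail at the CONNECT step with the 403 described in the failure-mode table.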
diff --git a/uv.lock b/uv.lock
index bb935c4..8c17b0c 100644
--- a/uv.lock
+++ b/uv.lock
@@ -15,6 +15,7 @@ members = [
     "agent-isolation",
     "apache-steward",
     "checker",
+    "egress-gateway",
     "generate-cve-json",
     "github-body-field",
     "github-rollup",
@@ -341,6 +342,31 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/a2/ca/7e8365deec19afb2b2c7be7c1c0aa8f99633b54e90c570999acda93260fc/cryptography-48.0.0-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:db63bf618e5dea46c07de12e900fe1cdd2541e6dc9dbae772a70b7d4d4765f6a", size = 3739536, upload-time = "2026-05-04T22:59:29.61Z" },
 ]
 
+[[package]]
+name = "egress-gateway"
+version = "0.1.0"
+source = { editable = "tools/egress-gateway" }
+dependencies = [
+    { name = "proxy-py" },
+]
+
+[package.dev-dependencies]
+dev = [
+    { name = "mypy" },
+    { name = "pytest" },
+    { name = "ruff" },
+]
+
+[package.metadata]
+requires-dist = [{ name = "proxy-py", specifier = ">=2.4,<3" }]
+
+[package.metadata.requires-dev]
+dev = [
+    { name = "mypy", specifier = ">=1.11" },
+    { name = "pytest", specifier = ">=8.0" },
+    { name = "ruff", specifier = ">=0.6" },
+]
+
 [[package]]
 name = "generate-cve-json"
 version = "0.1.0"
@@ -624,6 +650,15 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/ab/36/2ab7647fe1e84bba2baae7f04de241197eed62683fb3085e164de266d111/prek-0.4.1-py3-none-win_arm64.whl", hash = "sha256:5b4a348537924b20e208cbd87ef58e96ec37d691c5bec2969209c40de0ecf72e", size = 5423147, upload-time = "2026-05-20T04:27:17.023Z" },
 ]
 
+[[package]]
+name = "proxy-py"
+version = "2.4.10"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/60/c3/157c302e82abf8e1edf9dae55665b9480c0a6bd63b42cbbeb925a37f1e1f/proxy_py-2.4.10.tar.gz", hash = "sha256:41b9e9d3aae6f80e2304d3726e8e9c583a510d8de224eada53d115f48a63a9ce", size = 326541, upload-time = "2025-02-18T16:36:38.02Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/c1/38/e2546d82f769550a54cca9b1ae81f229871c9bb8b9eca55d766c74a83b03/proxy.py-2.4.10-py3-none-any.whl", hash = "sha256:ef3a31f6ef3be6ff78559c0e68198523bfe2fb1e820bb16686750c1bb5baf9e8", size = 227130, upload-time = "2025-02-18T16:36:35.394Z" },
+]
+
 [[package]]
 name = "pyasn1"
 version = "0.6.3"

