Claudenw commented on code in PR #677:
URL: https://github.com/apache/creadur-rat/pull/677#discussion_r3441999648


##########
THREAT_MODEL.md:
##########
@@ -0,0 +1,288 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+# Apache Creadur (RAT) — Threat Model
+
+## §1 Header
+
+- **Project:** Apache Creadur — primarily **RAT (Release Audit Tool)**
+  (`apache/creadur-rat`), with sibling tools **Whisker**
+  (`apache/creadur-whisker`, license-documentation generator) and **Tentacles**
+  (`apache/creadur-tentacles`, release-bundle analyzer). This model is written
+  in `creadur-rat` and covers the Creadur dev-tool family; Whisker/Tentacles
+  share RAT's trust profile (§2).
+- **Written against:** `main`/`master` @ HEAD (2026-06).
+- **Author:** ASF Security team, via the threat-model-producer rubric (Scovetta
+  rubric) at the Creadur PMC's request (path 3).
+- **Status:** DRAFT — under maintainer review (2026-06-10). Not yet ratified.
+- **Reporting cross-reference:** §8-violating findings via the ASF security
+  process ([`SECURITY.md`](SECURITY.md)); §3/§9 findings closed citing this doc.
+- **Provenance legend:** *(documented)* / *(maintainer)* / *(inferred)* — each
+  *(inferred)* has a §14 open question.
+- **Draft confidence:** ~14 documented / 5 maintainer / 11 inferred (maintainer
+  answers folded in from PR #677 review, 2026-06).
+
+**What it is.** RAT is a **build-time / CLI license-auditing tool**: it walks a
+source tree, matches files against configurable license/header definitions, and
+reports unapproved or unknown licenses. It runs as a **CLI**, an **Ant task**,
+or a **Maven plugin** — always **in the developer's or CI's own process**,
+never as a network service. Whisker generates license documentation; Tentacles
+inspects staged release bundles. None is a server.
+
+## §2 Scope and intended use
+
+Intended use: a project maintainer or CI job runs RAT over a codebase to verify
+license compliance before a release or on each change. The two inputs are the
+**tree being audited** (files, including archives RAT descends into) and the
+**RAT configuration** (XML/text license + matcher definitions).
+
+Caller trust level: the developer/CI invoking RAT is trusted. The **inputs are
+normally trusted too** (your own source, your own config) — but RAT is
+sometimes pointed at **untrusted input**: a CI job auditing an untrusted
+contribution/PR, or auditing a downloaded third-party artifact. That is the
+case the model cares about. *(inferred — Q1.)*
+

Review Comment:
   This statement is correct.



##########
THREAT_MODEL.md:
##########
@@ -0,0 +1,288 @@
+**Component families.**
+
+| Family | Entry point | Untrusted-input exposure | In model? |
+| --- | --- | --- | --- |
+| File walking + license matching | `Reporter`, walkers | scanned file **content/paths** | **Yes** |
+| **XML configuration reader** | `XMLConfigurationReader` | the **config** (if attacker-supplied) | **Yes** (XXE surface) |
+| **Archive walker** | `ArchiveWalker` | archives in the tree (zip/jar/tar) | **Yes** (decompression-bomb surface) |
+| CLI / Ant task / Maven plugin | wrappers | invocation args (trusted caller) | wrappers — trusted |
+| **License-header insertion (write mode)** | `--addLicense` / editors | **modifies files in the audited tree** (operator-invoked) | trusted-input (§3) |
+| Whisker / Tentacles | their CLIs | same dev-tool profile | sibling — §2 note |
+
+**Note (PMC, review).** The CLI, Ant task, and Maven plugin front-ends are
+generated from a common option core, so any security-relevant behaviour (or
+gap) in that core transfers automatically to all three UIs — a finding in one
+front-end's handling generally applies to all of them. *(maintainer.)*
+
+## §3 Out of scope (explicit non-goals)
+
+- **RAT as a security scanner.** RAT checks *license* compliance; it is **not**
+  a vulnerability scanner or a security gate. "RAT didn't catch X security
+  issue" is not in scope. *(documented — purpose.)*
+- **Audit *correctness* as a security property.** A missed/false license match
+  is a correctness bug, not a vulnerability (unless it crosses a resource bound,
+  §8). *(inferred.)*
+- **The build/CI environment** RAT runs in, and the trust of the source tree
+  when RAT is deliberately run on your own (trusted) code — the dominant,
+  intended case. Findings whose only impact requires running RAT on input you
+  already trust are `OUT-OF-MODEL: trusted-input`.
+- **Test resources** (the deliberately-odd license fixtures under
+  `*/src/test/resources/`) — those are test data, not a target.
+- **RAT's header-insertion / file-modification mode** (`--addLicense` and the
+  editors) — RAT can *write* license headers into the audited files, mutating
+  the tree. This is explicitly operator-invoked against the operator's own
+  (trusted) sources; a run that modifies files the operator already controls is
+  `OUT-OF-MODEL: trusted-input`. (Raised by the PMC in review — write mode is
+  noted here so the boundary is explicit rather than silent.) *(maintainer.)*
+- **Custom matchers / matcher extensions**
+  (<https://creadur.apache.org/rat/license_def.html#Matchers>) — RAT lets the
+  operator define custom matcher classes in its configuration, and a custom
+  matcher sees the full text of every file selected for scanning. Because the
+  matcher set is operator-defined configuration under the control of whoever
+  runs RAT (not attacker-supplied), a custom matcher reading scanned text is
+  `OUT-OF-MODEL: trusted-input` — the same posture as any operator-supplied
+  extension code (cf. the write mode above). (Raised by the PMC in review.)
+  *(maintainer — Claudenw.)*
+
+## §4 Trust boundaries and data flow
+
+The boundary is **the input RAT is pointed at** — files and configuration.
+RAT's security questions only arise when that input is **untrusted**:
+
+```
+caller invokes RAT (CLI/Ant/Maven) on a directory + a config
+   │ trusted invocation
+   ▼
+read configuration (XMLConfigurationReader) ── XXE surface if config is untrusted

Review Comment:
   https://github.com/apache/creadur-rat/pull/679 is the PR that does the XXE hardening. I don't know whether it affects this section.
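
For readers unfamiliar with what XXE hardening of a JAXP parser usually involves, here is a minimal sketch. It is a generic illustration, not the contents of PR #679 and not RAT's `XMLConfigurationReader`: the class name `HardenedXmlSketch` and its methods are hypothetical, and the feature URI is the Xerces one that the JDK's built-in parser recognises.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical illustration class; not part of RAT.
final class HardenedXmlSketch {

    // Builds a DOM parser that refuses any document carrying a DOCTYPE.
    static DocumentBuilder newHardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Rejecting DOCTYPE outright removes internal and external entity declarations.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    // Returns true when the hardened parser refuses the given XML text.
    static boolean rejects(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            newHardenedBuilder().parse(new ByteArrayInputStream(bytes));
            return false;
        } catch (SAXException e) {
            return true; // the parser reported a fatal error, e.g. a disallowed DOCTYPE
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String hostile = "<!DOCTYPE a [<!ENTITY x SYSTEM \"file:///etc/passwd\">]><a>&x;</a>";
        System.out.println("hostile config rejected: " + rejects(hostile));
        System.out.println("plain config rejected: " + rejects("<a/>"));
    }
}
```

If the linked PR takes a comparable approach, §8 item 2 in the model could cite it as a provided property; whether it does is for the PMC to confirm.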



##########
THREAT_MODEL.md:
##########
@@ -0,0 +1,288 @@
+## §4 Trust boundaries and data flow
+
+The boundary is **the input RAT is pointed at** — files and configuration.
+RAT's security questions only arise when that input is **untrusted**:
+
+```
+caller invokes RAT (CLI/Ant/Maven) on a directory + a config
+   │ trusted invocation
+   ▼
+read configuration (XMLConfigurationReader) ── XXE surface if config is untrusted
+walk tree -> for each file: read content, match licenses
+   └─ ArchiveWalker descends into zip/jar/tar ── decompression-bomb / path surface if archive is untrusted
+   ▼
+report (approved / unapproved / unknown)
+```
+
+**Reachability precondition (triager's test):** a finding is in-model only if it
+is triggered by **untrusted input** (a hostile file/archive/config) that a
+*realistic* RAT deployment processes — e.g. CI auditing an untrusted PR. A
+finding that requires the operator to feed RAT input they already control is
+`OUT-OF-MODEL: trusted-input` (§3).
+
+## §5 Assumptions about the environment
+
+- A JRE; RAT reads the filesystem it is pointed at and writes a report. It opens
+  **no network connections** and runs no services. *(inferred — Q2, the
+  no-network claim is high-value to confirm.)*
+- The XML parser behaviour depends on the platform JAXP unless RAT configures it
+  (§5a/§8). *(inferred — Q3.)*

Review Comment:
   It depends on JAXP and can be configured through the JAXP system properties, as documented:
   
https://docs.oracle.com/javase/8/docs/technotes/guides/security/jaxp/jaxp.html#setting-jaxp-properties-as-system-properties
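
As a rough illustration of the mechanism the linked guide describes, the sketch below sets two of the documented JAXP system properties and then checks that a default `DocumentBuilder` refuses to load an external DTD. It shows JDK behaviour only; it does not exercise RAT's own reader, and the class name `JaxpPropertySketch` and its methods are hypothetical. The same properties can be passed on the JVM command line, for example `-Djavax.xml.accessExternalDTD=""`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical illustration class; not part of RAT.
final class JaxpPropertySketch {

    // An empty value denies every protocol for external DTDs and schemas.
    static void denyExternalAccess() {
        System.setProperty("javax.xml.accessExternalDTD", "");
        System.setProperty("javax.xml.accessExternalSchema", "");
    }

    // Parses a document whose DOCTYPE points at a real temporary DTD file and
    // reports whether the parser refused to load it.
    static boolean externalDtdBlocked() {
        Path dtd = null;
        try {
            dtd = Files.createTempFile("sketch", ".dtd");
            Files.write(dtd, "<!ELEMENT a ANY>".getBytes(StandardCharsets.UTF_8));
            String xml = "<!DOCTYPE a SYSTEM \"" + dtd.toUri() + "\"><a/>";
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false; // the DTD was readable, so access was not restricted
        } catch (SAXException e) {
            return true; // the parser raised a fatal error instead of reading the DTD
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } finally {
            if (dtd != null) {
                try {
                    Files.deleteIfExists(dtd);
                } catch (IOException ignored) {
                    // best-effort cleanup of the temporary file
                }
            }
        }
    }

    public static void main(String[] args) {
        denyExternalAccess(); // must run before the parser is created
        System.out.println("external DTD blocked: " + externalDtdBlocked());
    }
}
```

Because these are process-wide settings, they affect every JAXP parser in the same JVM, which matters when RAT runs inside a Maven or Ant build alongside other plugins.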



##########
THREAT_MODEL.md:
##########
@@ -0,0 +1,288 @@
+## §5a Build-time and configuration variants
+
+RAT has no security-mode flag. The security-relevant configuration is whether
+its **XML config parser disables DOCTYPE/external entities** and whether the
+**archive walker bounds decompression** (depth/size/entry count). Both are
+hardcoded behaviours, not operator knobs. The **archive walker does not bound
+decompression** — it extracts entry contents into an in-memory buffer (Apache
+Commons Compress `ArchiveStreamFactory`) with no size/depth/entry-count limit
+(§8/§9, maintainer-confirmed). XML-parser DOCTYPE handling is being hardened via
+a PMC PR (§14 Q3). There is no "insecure default toggle". *(maintainer / Q3
+pending PR link.)*
+
+## §6 Assumptions about inputs
+
+| Input | Attacker-controllable? (untrusted-run) | Concern |
+| --- | --- | --- |
+| scanned file content | **yes** | parsed/read; resource use |
+| scanned file paths / archive entry names | **yes** | path handling on archive extraction |
+| archives (zip/jar/tar) in the tree | **yes** | decompression bomb / nested-archive depth |
+| RAT XML configuration | **maybe** (only if config is attacker-supplied) | XXE / external entity |
+| invocation arguments | no — trusted caller | — |
+
+## §7 Adversary model
+
+- **In scope:** the party who controls the files/archives/config that an
+  *untrusted-input* RAT run processes — e.g. a contributor whose PR is audited
+  by CI, or the author of a third-party artifact being audited. Capabilities:
+  craft a malicious archive (zip bomb), a hostile XML config (XXE), or
+  pathological file content. *(inferred — Q1.)*
+- **Out of scope:** an attacker who controls the RAT invocation or the trusted
+  source tree (the normal case — they already own the build).
+
+## §8 Security properties the project provides
+
+1. **Bounded resource use on untrusted archives** — **not currently provided.**
+   The archive walker (`ArchiveWalker`) uses Apache Commons Compress
+   `ArchiveStreamFactory` and extracts entry contents into an **in-memory
+   buffer** held until the document is processed, with no decompression /
+   size / depth / entry-count bound — so a crafted archive can exhaust memory
+   (OOM). This is therefore a **disclaimed gap (§9)** plus a downstream
+   responsibility (§10), not a provided property. *(maintainer — confirmed by
+   the Creadur PMC in PR #677 review, 2026-06.)*
+2. **Safe XML configuration parsing** — the config reader should reject
+   DOCTYPE/external entities (no XXE). *Violation:* file read / SSRF via a
+   crafted config. *Severity:* critical when config is untrusted. The PMC has
+   noted a hardening PR is in flight addressing this (§14 Q3); pending its link
+   this stays tentative. *(maintainer / Q3 pending PR link.)*
+3. **No ambient network/side effects** — RAT does filesystem I/O only.
+   *Violation:* unexpected outbound connection. *(inferred — Q2.)*
+
+(Item 1 is resolved as a disclaimed §9 gap per the maintainer's archive answer;
+item 2 firms up once the §14 Q3 XXE-hardening PR is linked.)
+
+## §9 Security properties the project does *not* provide
+
+- **No safety guarantee when run on fully untrusted input without sandboxing**,
+  if the §14 answers reveal the XML parser/archive walker are not hardened. In
+  that case: treat RAT-on-untrusted-input as you would any parser — sandbox it.
+- **It is not a security/vulnerability scanner** (§3); a clean RAT report says
+  nothing about security.
+- **Decompression-bomb / archive resource exhaustion** — **confirmed not
+  bounded.** Archives are extracted into an in-memory buffer with no
+  size/depth/entry-count limit (Commons Compress `ArchiveStreamFactory`), so
+  RAT pointed at untrusted archives can OOM. Runs over untrusted archives must
+  be sandboxed / resource-limited (§10). *(maintainer.)*
+- **Well-known classes (parser/archive tools):** XXE via configuration,
+  decompression bombs / nested-archive blowup, and path handling on archive
+  entries — the standard risks of any tool that parses XML and descends into
+  archives.
+
+## §10 Downstream responsibilities
+
+- When auditing **untrusted** input (CI on untrusted PRs, third-party
+  artifacts), run RAT with resource limits / in a sandbox, and do not feed it
+  attacker-controlled **configuration**.
+- Keep RAT updated; pin the version in your build.
+- For your own (trusted) source tree — the normal case — no special handling.
+
+## §11 Known misuse patterns
+
+- **Running RAT on untrusted archives/config in CI** without resource limits,
+  expecting it to be hardened against decompression bombs / XXE.
+- **Treating a clean RAT report as a security sign-off** (it is a license check).
+
+## §11a Known non-findings (recurring false positives)
+
+- **"RAT reads/parses files it is told to scan"** on a **trusted** tree — that
+  is the function; `OUT-OF-MODEL: trusted-input` (§3/§6).
+- **Odd/invalid license fixtures under `src/test/resources/`** — test data, not
+  a target. `OUT-OF-MODEL: unsupported-component` (§3).
+- **"RAT didn't detect a security vulnerability"** — out of purpose (§3).
+- **XML parsing / archive reading flagged generically** without an untrusted-
+  input path — non-finding unless the reachability precondition (§4) is met.
+
+## §12 Conditions that would change this model
+
+- RAT gaining a network surface or a server mode.
+- A change to the XML parser hardening or archive-walker bounds (§5a/§8).
+- A report unroutable to a §13 disposition → revise §8/§9.
+
+## §13 Triage dispositions
+
+| Disposition | Meaning | Licensed by |
+| --- | --- | --- |
+| `VALID` | A §8 property breaks via untrusted input on a realistic run. | §8, §6, §7 |
+| `VALID-HARDENING` | A §11 misuse is too easy (e.g. no archive bound). | §11 |
+| `OUT-OF-MODEL: trusted-input` | Requires RAT to process input the operator already trusts. | §6 |
+| `OUT-OF-MODEL: adversary-not-in-scope` | Needs control of the RAT invocation/host. | §7 |
+| `OUT-OF-MODEL: unsupported-component` | Test fixtures / out-of-purpose. | §3 |
+| `BY-DESIGN: property-disclaimed` | "Not a security scanner", trusted-input runs. | §9 |
+| `KNOWN-NON-FINDING` | Matches §11a. | §11a |
+| `MODEL-GAP` | Unroutable. | triggers §12 |
+
+## §14 Open questions for the maintainers
+
+**Wave 1 — the load-bearing ones.**
+
+- **Q1.** Confirm the intended trust posture: RAT runs in-process for a
+  trusted caller; inputs are normally trusted, but the security-relevant case is
+  RAT auditing **untrusted** input (CI on untrusted PRs, third-party artifacts).
+  Is that the case you want modelled, or do you consider all RAT input trusted
+  (which would move XXE/archive items to `OUT-OF-MODEL: trusted-input`)? (§2/§7.)
+- **Q3.** *(Partially answered — PMC, PR #677: a hardening PR is in flight
+  ensuring DOCTYPE / external-entity handling is covered. **Pending the PR link
+  to cite**; once landed §8 #2 becomes a provided property.)* Does
+  `XMLConfigurationReader` disable DOCTYPE / external entities (XXE-safe)?
+- **Q4.** *(Answered — PMC, PR #677: no bound. Archives are extracted into an
+  in-memory buffer (Commons Compress `ArchiveStreamFactory`) held until the
+  document is processed, so a crafted archive can OOM. Resolved as a §9 gap +
+  §10 responsibility; §8 #1 is **not** a provided property.)* Does
+  `ArchiveWalker` bound decompression (size/depth/entry-count)?

Review Comment:
   We do not guard against OOM. We probably should add a limit, but at this time we do not. This is out of scope and not provided.
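
   For illustration only, a limit of the kind mentioned above could be sketched as follows. This is not RAT's code and not a proposed patch: the class `BoundedRead`, the method `readAtMost`, and the `maxBytes` parameter are hypothetical names, and the sketch uses only the JDK rather than Commons Compress. It reads a stream into memory and fails as soon as the configured byte limit is exceeded, so a crafted entry cannot grow the buffer without bound.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of RAT or Commons Compress.
final class BoundedRead {
    private BoundedRead() {}

    /**
     * Reads the whole stream into memory, but throws as soon as more than
     * maxBytes have been read, so the buffer never grows past the limit.
     */
    static byte[] readAtMost(InputStream in, long maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            total += n;
            if (total > maxBytes) {
                // Fail before buffering the chunk that crosses the limit.
                throw new IOException("entry exceeds limit of " + maxBytes + " bytes");
            }
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }
}
```

   A per-entry byte limit like this does not address nesting depth or entry count; those would need separate counters in whatever walks the archive.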



##########
THREAT_MODEL.md:
##########
@@ -0,0 +1,288 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+# Apache Creadur (RAT) — Threat Model
+
+## §1 Header
+
+- **Project:** Apache Creadur — primarily **RAT (Release Audit Tool)**
+  (`apache/creadur-rat`), with sibling tools **Whisker**
+  (`apache/creadur-whisker`, license-documentation generator) and **Tentacles**
+  (`apache/creadur-tentacles`, release-bundle analyzer). This model is written
+  in `creadur-rat` and covers the Creadur dev-tool family; Whisker/Tentacles
+  share RAT's trust profile (§2).
+- **Written against:** `main`/`master` @ HEAD (2026-06).
+- **Author:** ASF Security team, via the threat-model-producer rubric (Scovetta
+  rubric) at the Creadur PMC's request (path 3).
+- **Status:** DRAFT — under maintainer review (2026-06-10). Not yet ratified.
+- **Reporting cross-reference:** §8-violating findings via the ASF security
+  process ([`SECURITY.md`](SECURITY.md)); §3/§9 findings closed citing this 
doc.
+- **Provenance legend:** *(documented)* / *(maintainer)* / *(inferred)* — each
+  *(inferred)* has a §14 open question.
+- **Draft confidence:** ~14 documented / 5 maintainer / 11 inferred (maintainer
+  answers folded in from PR #677 review, 2026-06).
+
+**What it is.** RAT is a **build-time / CLI license-auditing tool**: it walks a
+source tree, matches files against configurable license/header definitions, and
+reports unapproved or unknown licenses. It runs as a **CLI**, an **Ant task**,
+or a **Maven plugin** — always **in the developer's or CI's own process**,
+never as a network service. Whisker generates license documentation; Tentacles
+inspects staged release bundles. None is a server.
+
+## §2 Scope and intended use
+
+Intended use: a project maintainer or CI job runs RAT over a codebase to verify
+license compliance before a release or on each change. The two inputs are the
+**tree being audited** (files, including archives RAT descends into) and the
+**RAT configuration** (XML/text license + matcher definitions).
+
+Caller trust level: the developer/CI invoking RAT is trusted. The **inputs are
+normally trusted too** (your own source, your own config) — but RAT is
+sometimes pointed at **untrusted input**: a CI job auditing an untrusted
+contribution/PR, or auditing a downloaded third-party artifact. That is the
+case the model cares about. *(inferred — Q1.)*
+
+**Component families.**
+
+| Family | Entry point | Untrusted-input exposure | In model? |
+| --- | --- | --- | --- |
+| File walking + license matching | `Reporter`, walkers | scanned file 
**content/paths** | **Yes** |
+| **XML configuration reader** | `XMLConfigurationReader` | the **config** (if 
attacker-supplied) | **Yes** (XXE surface) |
+| **Archive walker** | `ArchiveWalker` | archives in the tree (zip/jar/tar) | 
**Yes** (decompression-bomb surface) |
+| CLI / Ant task / Maven plugin | wrappers | invocation args (trusted caller) 
| wrappers — trusted |
+| **License-header insertion (write mode)** | `--addLicense` / editors | 
**modifies files in the audited tree** (operator-invoked) | trusted-input (§3) |
+| Whisker / Tentacles | their CLIs | same dev-tool profile | sibling — §2 note 
|
+
+**Note (PMC, review).** The CLI, Ant task, and Maven plugin front-ends are
+generated from a common option core, so any security-relevant behaviour (or
+gap) in that core transfers automatically to all three UIs — a finding in one
+front-end's handling generally applies to all of them. *(maintainer.)*
+
+## §3 Out of scope (explicit non-goals)
+
+- **RAT as a security scanner.** RAT checks *license* compliance; it is **not**
+  a vulnerability scanner or a security gate. "RAT didn't catch X security
+  issue" is not in scope. *(documented — purpose.)*
+- **Audit *correctness* as a security property.** A missed/false license match
+  is a correctness bug, not a vulnerability (unless it crosses a resource 
bound,
+  §8). *(inferred.)*
+- **The build/CI environment** RAT runs in, and the trust of the source tree
+  when RAT is deliberately run on your own (trusted) code — the dominant,
+  intended case. Findings whose only impact requires running RAT on input you
+  already trust are `OUT-OF-MODEL: trusted-input`.
+- **Test resources** (the deliberately-odd license fixtures under
+  `*/src/test/resources/`) — those are test data, not a target.
+- **RAT's header-insertion / file-modification mode** (`--addLicense` and the
+  editors) — RAT can *write* license headers into the audited files, mutating
+  the tree. This is explicitly operator-invoked against the operator's own
+  (trusted) sources; a run that modifies files the operator already controls is
+  `OUT-OF-MODEL: trusted-input`. (Raised by the PMC in review — write mode is
+  noted here so the boundary is explicit rather than silent.) *(maintainer.)*
+- **Custom matchers / matcher extensions**
+  (<https://creadur.apache.org/rat/license_def.html#Matchers>) — RAT lets the
+  operator define custom matcher classes in its configuration, and a custom
+  matcher sees the full text of every file selected for scanning. Because the
+  matcher set is operator-defined configuration under the control of whoever
+  runs RAT (not attacker-supplied), a custom matcher reading scanned text is
+  `OUT-OF-MODEL: trusted-input` — the same posture as any operator-supplied
+  extension code (cf. the write mode above). (Raised by the PMC in review.)
+  *(maintainer — Claudenw.)*
+
+## §4 Trust boundaries and data flow
+
+The boundary is **the input RAT is pointed at** — files and configuration.
+RAT's security questions only arise when that input is **untrusted**:
+
+```
+caller invokes RAT (CLI/Ant/Maven) on a directory + a config
+   │ trusted invocation
+   ▼
+read configuration (XMLConfigurationReader) ── XXE surface if config is 
untrusted
+walk tree -> for each file: read content, match licenses
+   └─ ArchiveWalker descends into zip/jar/tar ── decompression-bomb / path 
surface if archive is untrusted
+   ▼
+report (approved / unapproved / unknown)
+```
+
+**Reachability precondition (triager's test):** a finding is in-model only if 
it
+is triggered by **untrusted input** (a hostile file/archive/config) that a
+*realistic* RAT deployment processes — e.g. CI auditing an untrusted PR. A
+finding that requires the operator to feed RAT input they already control is
+`OUT-OF-MODEL: trusted-input` (§3).
+
+## §5 Assumptions about the environment
+
+- A JRE; RAT reads the filesystem it is pointed at and writes a report. It 
opens
+  **no network connections** and runs no services. *(inferred — Q2, the
+  no-network claim is high-value to confirm.)*
+- The XML parser behaviour depends on the platform JAXP unless RAT configures 
it
+  (§5a/§8). *(inferred — Q3.)*
+
+## §5a Build-time and configuration variants
+
+RAT has no security-mode flag. The security-relevant configuration is whether
+its **XML config parser disables DOCTYPE/external entities** and whether the
+**archive walker bounds decompression** (depth/size/entry count). Both are
+hardcoded behaviours, not operator knobs. The **archive walker does not bound
+decompression** — it extracts entry contents into an in-memory buffer (Apache
+Commons Compress `ArchiveStreamFactory`) with no size/depth/entry-count limit
+(§8/§9, maintainer-confirmed). XML-parser DOCTYPE handling is being hardened 
via
+a PMC PR (§14 Q3). There is no "insecure default toggle". *(maintainer / Q3
+pending PR link.)*
+
+## §6 Assumptions about inputs
+
+| Input | Attacker-controllable? (untrusted-run) | Concern |
+| --- | --- | --- |
+| scanned file content | **yes** | parsed/read; resource use |
+| scanned file paths / archive entry names | **yes** | path handling on 
archive extraction |
+| archives (zip/jar/tar) in the tree | **yes** | decompression bomb / 
nested-archive depth |
+| RAT XML configuration | **maybe** (only if config is attacker-supplied) | 
XXE / external entity |
+| invocation arguments | no — trusted caller | — |
+
+## §7 Adversary model
+
+- **In scope:** the party who controls the files/archives/config that an
+  *untrusted-input* RAT run processes — e.g. a contributor whose PR is audited
+  by CI, or the author of a third-party artifact being audited. Capabilities:
+  craft a malicious archive (zip bomb), a hostile XML config (XXE), or
+  pathological file content. *(inferred — Q1.)*
+- **Out of scope:** an attacker who controls the RAT invocation or the trusted
+  source tree (the normal case — they already own the build).
+
+## §8 Security properties the project provides
+
+1. **Bounded resource use on untrusted archives** — **not currently provided.**
+   The archive walker (`ArchiveWalker`) uses Apache Commons Compress
+   `ArchiveStreamFactory` and extracts entry contents into an **in-memory
+   buffer** held until the document is processed, with no decompression /
+   size / depth / entry-count bound — so a crafted archive can exhaust memory
+   (OOM). This is therefore a **disclaimed gap (§9)** plus a downstream
+   responsibility (§10), not a provided property. *(maintainer — confirmed by
+   the Creadur PMC in PR #677 review, 2026-06.)*
+2. **Safe XML configuration parsing** — the config reader should reject
+   DOCTYPE/external entities (no XXE). *Violation:* file read / SSRF via a
+   crafted config. *Severity:* critical when config is untrusted. The PMC has
+   noted a hardening PR is in flight addressing this (§14 Q3); pending its link
+   this stays tentative. *(maintainer / Q3 pending PR link.)*
+3. **No ambient network/side effects** — RAT does filesystem I/O only.
+   *Violation:* unexpected outbound connection. *(inferred — Q2.)*
+
+(Item 1 is resolved as a disclaimed §9 gap per the maintainer's archive answer;
+item 2 firms up once the §14 Q3 XXE-hardening PR is linked.)
+
+## §9 Security properties the project does *not* provide
+
+- **No safety guarantee when run on fully untrusted input without sandboxing**,
+  if the §14 answers reveal the XML parser/archive walker are not hardened. In
+  that case: treat RAT-on-untrusted-input as you would any parser — sandbox it.
+- **It is not a security/vulnerability scanner** (§3); a clean RAT report says
+  nothing about security.
+- **Decompression-bomb / archive resource exhaustion** — **confirmed not
+  bounded.** Archives are extracted into an in-memory buffer with no
+  size/depth/entry-count limit (Commons Compress `ArchiveStreamFactory`), so
+  RAT pointed at untrusted archives can OOM. Runs over untrusted archives must
+  be sandboxed / resource-limited (§10). *(maintainer.)*
+- **Well-known classes (parser/archive tools):** XXE via configuration,
+  decompression bombs / nested-archive blowup, and path handling on archive
+  entries — the standard risks of any tool that parses XML and descends into
+  archives.
+
+## §10 Downstream responsibilities
+
+- When auditing **untrusted** input (CI on untrusted PRs, third-party
+  artifacts), run RAT with resource limits / in a sandbox, and do not feed it
+  attacker-controlled **configuration**.
+- Keep RAT updated; pin the version in your build.
+- For your own (trusted) source tree — the normal case — no special handling.
+
+## §11 Known misuse patterns
+
+- **Running RAT on untrusted archives/config in CI** without resource limits,
+  expecting it to be hardened against decompression bombs / XXE.
+- **Treating a clean RAT report as a security sign-off** (it is a license 
check).
+
+## §11a Known non-findings (recurring false positives)
+
+- **"RAT reads/parses files it is told to scan"** on a **trusted** tree — that
+  is the function; `OUT-OF-MODEL: trusted-input` (§3/§6).
+- **Odd/invalid license fixtures under `src/test/resources/`** — test data, not
+  a target. `OUT-OF-MODEL: unsupported-component` (§3).
+- **"RAT didn't detect a security vulnerability"** — out of purpose (§3).
+- **XML parsing / archive reading flagged generically** without an untrusted-
+  input path — non-finding unless the reachability precondition (§4) is met.
+
+## §12 Conditions that would change this model
+
+- RAT gaining a network surface or a server mode.
+- A change to the XML parser hardening or archive-walker bounds (§5a/§8).
+- A report unroutable to a §13 disposition → revise §8/§9.
+
+## §13 Triage dispositions
+
+| Disposition | Meaning | Licensed by |
+| --- | --- | --- |
+| `VALID` | A §8 property breaks via untrusted input on a realistic run. | §8, §6, §7 |
+| `VALID-HARDENING` | A §11 misuse is too easy (e.g. no archive bound). | §11 |
+| `OUT-OF-MODEL: trusted-input` | Requires RAT to process input the operator already trusts. | §6 |
+| `OUT-OF-MODEL: adversary-not-in-scope` | Needs control of the RAT invocation/host. | §7 |
+| `OUT-OF-MODEL: unsupported-component` | Test fixtures / out-of-purpose. | §3 |
+| `BY-DESIGN: property-disclaimed` | "Not a security scanner", trusted-input runs. | §9 |
+| `KNOWN-NON-FINDING` | Matches §11a. | §11a |
+| `MODEL-GAP` | Unroutable. | triggers §12 |
+
+## §14 Open questions for the maintainers
+
+**Wave 1 — the load-bearing ones.**
+
+- **Q1.** Confirm the intended trust posture: RAT runs in-process for a
+  trusted caller; inputs are normally trusted, but the security-relevant case is
+  RAT auditing **untrusted** input (CI on untrusted PRs, third-party artifacts).
+  Is that the case you want modelled, or do you consider all RAT input trusted
+  (which would move XXE/archive items to `OUT-OF-MODEL: trusted-input`)? (§2/§7.)
+- **Q3.** *(Partially answered — PMC, PR #677: a hardening PR is in flight
+  ensuring DOCTYPE / external-entity handling is covered. **Pending the PR link
+  to cite**; once landed §8 #2 becomes a provided property.)* Does
+  `XMLConfigurationReader` disable DOCTYPE / external entities (XXE-safe)?
+- **Q4.** *(Answered — PMC, PR #677: no bound. Archives are extracted into an
+  in-memory buffer (Commons Compress `ArchiveStreamFactory`) held until the
+  document is processed, so a crafted archive can OOM. Resolved as a §9 gap +
+  §10 responsibility; §8 #1 is **not** a provided property.)* Does
+  `ArchiveWalker` bound decompression (size/depth/entry-count)?
+
+**Wave 2 — surface.**
+
+- **Q2.** Confirm RAT makes no network connections and has no side effects beyond
+  reading the scanned tree and writing the report. (§5/§8.)

Review Comment:
   Confirmed above.



##########
THREAT_MODEL.md:
##########
@@ -0,0 +1,288 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+# Apache Creadur (RAT) — Threat Model
+
+## §1 Header
+
+- **Project:** Apache Creadur — primarily **RAT (Release Audit Tool)**
+  (`apache/creadur-rat`), with sibling tools **Whisker**
+  (`apache/creadur-whisker`, license-documentation generator) and **Tentacles**
+  (`apache/creadur-tentacles`, release-bundle analyzer). This model is written
+  in `creadur-rat` and covers the Creadur dev-tool family; Whisker/Tentacles
+  share RAT's trust profile (§2).
+- **Written against:** `main`/`master` @ HEAD (2026-06).
+- **Author:** ASF Security team, via the threat-model-producer rubric (Scovetta
+  rubric) at the Creadur PMC's request (path 3).
+- **Status:** DRAFT — under maintainer review (2026-06-10). Not yet ratified.
+- **Reporting cross-reference:** §8-violating findings via the ASF security
+  process ([`SECURITY.md`](SECURITY.md)); §3/§9 findings closed citing this 
doc.
+- **Provenance legend:** *(documented)* / *(maintainer)* / *(inferred)* — each
+  *(inferred)* has a §14 open question.
+- **Draft confidence:** ~14 documented / 5 maintainer / 11 inferred (maintainer
+  answers folded in from PR #677 review, 2026-06).
+
+**What it is.** RAT is a **build-time / CLI license-auditing tool**: it walks a
+source tree, matches files against configurable license/header definitions, and
+reports unapproved or unknown licenses. It runs as a **CLI**, an **Ant task**,
+or a **Maven plugin** — always **in the developer's or CI's own process**,
+never as a network service. Whisker generates license documentation; Tentacles
+inspects staged release bundles. None is a server.
+
+## §2 Scope and intended use
+
+Intended use: a project maintainer or CI job runs RAT over a codebase to verify
+license compliance before a release or on each change. The two inputs are the
+**tree being audited** (files, including archives RAT descends into) and the
+**RAT configuration** (XML/text license + matcher definitions).
+
+Caller trust level: the developer/CI invoking RAT is trusted. The **inputs are
+normally trusted too** (your own source, your own config) — but RAT is
+sometimes pointed at **untrusted input**: a CI job auditing an untrusted
+contribution/PR, or auditing a downloaded third-party artifact. That is the
+case the model cares about. *(inferred — Q1.)*
+
+**Component families.**
+
+| Family | Entry point | Untrusted-input exposure | In model? |
+| --- | --- | --- | --- |
+| File walking + license matching | `Reporter`, walkers | scanned file 
**content/paths** | **Yes** |
+| **XML configuration reader** | `XMLConfigurationReader` | the **config** (if 
attacker-supplied) | **Yes** (XXE surface) |
+| **Archive walker** | `ArchiveWalker` | archives in the tree (zip/jar/tar) | 
**Yes** (decompression-bomb surface) |
+| CLI / Ant task / Maven plugin | wrappers | invocation args (trusted caller) 
| wrappers — trusted |
+| **License-header insertion (write mode)** | `--addLicense` / editors | 
**modifies files in the audited tree** (operator-invoked) | trusted-input (§3) |
+| Whisker / Tentacles | their CLIs | same dev-tool profile | sibling — §2 note 
|
+
+**Note (PMC, review).** The CLI, Ant task, and Maven plugin front-ends are
+generated from a common option core, so any security-relevant behaviour (or
+gap) in that core transfers automatically to all three UIs — a finding in one
+front-end's handling generally applies to all of them. *(maintainer.)*
+
+## §3 Out of scope (explicit non-goals)
+
+- **RAT as a security scanner.** RAT checks *license* compliance; it is **not**
+  a vulnerability scanner or a security gate. "RAT didn't catch X security
+  issue" is not in scope. *(documented — purpose.)*
+- **Audit *correctness* as a security property.** A missed/false license match
+  is a correctness bug, not a vulnerability (unless it crosses a resource 
bound,
+  §8). *(inferred.)*
+- **The build/CI environment** RAT runs in, and the trust of the source tree
+  when RAT is deliberately run on your own (trusted) code — the dominant,
+  intended case. Findings whose only impact requires running RAT on input you
+  already trust are `OUT-OF-MODEL: trusted-input`.
+- **Test resources** (the deliberately-odd license fixtures under
+  `*/src/test/resources/`) — those are test data, not a target.
+- **RAT's header-insertion / file-modification mode** (`--addLicense` and the
+  editors) — RAT can *write* license headers into the audited files, mutating
+  the tree. This is explicitly operator-invoked against the operator's own
+  (trusted) sources; a run that modifies files the operator already controls is
+  `OUT-OF-MODEL: trusted-input`. (Raised by the PMC in review — write mode is
+  noted here so the boundary is explicit rather than silent.) *(maintainer.)*
+- **Custom matchers / matcher extensions**
+  (<https://creadur.apache.org/rat/license_def.html#Matchers>) — RAT lets the
+  operator define custom matcher classes in its configuration, and a custom
+  matcher sees the full text of every file selected for scanning. Because the
+  matcher set is operator-defined configuration under the control of whoever
+  runs RAT (not attacker-supplied), a custom matcher reading scanned text is
+  `OUT-OF-MODEL: trusted-input` — the same posture as any operator-supplied
+  extension code (cf. the write mode above). (Raised by the PMC in review.)
+  *(maintainer — Claudenw.)*
+
+## §4 Trust boundaries and data flow
+
+The boundary is **the input RAT is pointed at** — files and configuration.
+RAT's security questions only arise when that input is **untrusted**:
+
+```
+caller invokes RAT (CLI/Ant/Maven) on a directory + a config
+   │ trusted invocation
+   ▼
+read configuration (XMLConfigurationReader) ── XXE surface if config is 
untrusted
+walk tree -> for each file: read content, match licenses
+   └─ ArchiveWalker descends into zip/jar/tar ── decompression-bomb / path 
surface if archive is untrusted
+   ▼
+report (approved / unapproved / unknown)
+```
+
+**Reachability precondition (triager's test):** a finding is in-model only if 
it
+is triggered by **untrusted input** (a hostile file/archive/config) that a
+*realistic* RAT deployment processes — e.g. CI auditing an untrusted PR. A
+finding that requires the operator to feed RAT input they already control is
+`OUT-OF-MODEL: trusted-input` (§3).
+
+## §5 Assumptions about the environment
+
+- A JRE; RAT reads the filesystem it is pointed at and writes a report. It opens
+  **no network connections** and runs no services. *(inferred — Q2, the
+  no-network claim is high-value to confirm.)*

Review Comment:
   True: RAT opens no network connections; it only opens files. One potential hole is XSLT transforms, where the operator could add an `xsl:include` element that opens a connection to a remote system. This is out of scope because the XSLT stylesheets are in the trusted space, under the operator's control.
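
   Should an operator ever want to rule out that hole for stylesheets they do not fully control, the standard JAXP switches can be sketched as below. This is an assumption-laden illustration, not RAT's behaviour: `HardenedXslt` and `newFactory` are hypothetical names, and the `setAttribute` calls rely on the JDK's built-in transformer (a third-party implementation on the classpath may reject them with `IllegalArgumentException`).

```java
import javax.xml.XMLConstants;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper, not part of RAT.
final class HardenedXslt {
    private HardenedXslt() {}

    /** Returns a factory that refuses to fetch external stylesheets or DTDs. */
    static TransformerFactory newFactory() throws TransformerConfigurationException {
        TransformerFactory tf = TransformerFactory.newInstance();
        tf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // An empty protocol list denies xsl:include / xsl:import / document() targets.
        tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
        // Likewise deny external DTD resolution while parsing stylesheets and sources.
        tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        return tf;
    }
}
```

   Note that an empty protocol list also blocks `file:` includes, so stylesheets that legitimately include local files would need a narrower setting such as `"file"`.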



##########
THREAT_MODEL.md:
##########
@@ -0,0 +1,288 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+# Apache Creadur (RAT) — Threat Model
+
+## §1 Header
+
+- **Project:** Apache Creadur — primarily **RAT (Release Audit Tool)**
+  (`apache/creadur-rat`), with sibling tools **Whisker**
+  (`apache/creadur-whisker`, license-documentation generator) and **Tentacles**
+  (`apache/creadur-tentacles`, release-bundle analyzer). This model is written
+  in `creadur-rat` and covers the Creadur dev-tool family; Whisker/Tentacles
+  share RAT's trust profile (§2).
+- **Written against:** `main`/`master` @ HEAD (2026-06).
+- **Author:** ASF Security team, via the threat-model-producer rubric (Scovetta
+  rubric) at the Creadur PMC's request (path 3).
+- **Status:** DRAFT — under maintainer review (2026-06-10). Not yet ratified.
+- **Reporting cross-reference:** §8-violating findings via the ASF security
+  process ([`SECURITY.md`](SECURITY.md)); §3/§9 findings closed citing this 
doc.
+- **Provenance legend:** *(documented)* / *(maintainer)* / *(inferred)* — each
+  *(inferred)* has a §14 open question.
+- **Draft confidence:** ~14 documented / 5 maintainer / 11 inferred (maintainer
+  answers folded in from PR #677 review, 2026-06).
+
+**What it is.** RAT is a **build-time / CLI license-auditing tool**: it walks a
+source tree, matches files against configurable license/header definitions, and
+reports unapproved or unknown licenses. It runs as a **CLI**, an **Ant task**,
+or a **Maven plugin** — always **in the developer's or CI's own process**,
+never as a network service. Whisker generates license documentation; Tentacles
+inspects staged release bundles. None is a server.
+
+## §2 Scope and intended use
+
+Intended use: a project maintainer or CI job runs RAT over a codebase to verify
+license compliance before a release or on each change. The two inputs are the
+**tree being audited** (files, including archives RAT descends into) and the
+**RAT configuration** (XML/text license + matcher definitions).
+
+Caller trust level: the developer/CI invoking RAT is trusted. The **inputs are
+normally trusted too** (your own source, your own config) — but RAT is
+sometimes pointed at **untrusted input**: a CI job auditing an untrusted
+contribution/PR, or auditing a downloaded third-party artifact. That is the
+case the model cares about. *(inferred — Q1.)*
+
+**Component families.**
+
+| Family | Entry point | Untrusted-input exposure | In model? |
+| --- | --- | --- | --- |
+| File walking + license matching | `Reporter`, walkers | scanned file 
**content/paths** | **Yes** |
+| **XML configuration reader** | `XMLConfigurationReader` | the **config** (if 
attacker-supplied) | **Yes** (XXE surface) |
+| **Archive walker** | `ArchiveWalker` | archives in the tree (zip/jar/tar) | 
**Yes** (decompression-bomb surface) |
+| CLI / Ant task / Maven plugin | wrappers | invocation args (trusted caller) 
| wrappers — trusted |
+| **License-header insertion (write mode)** | `--addLicense` / editors | 
**modifies files in the audited tree** (operator-invoked) | trusted-input (§3) |
+| Whisker / Tentacles | their CLIs | same dev-tool profile | sibling — §2 note 
|
+
+**Note (PMC, review).** The CLI, Ant task, and Maven plugin front-ends are
+generated from a common option core, so any security-relevant behaviour (or
+gap) in that core transfers automatically to all three UIs — a finding in one
+front-end's handling generally applies to all of them. *(maintainer.)*
+
+## §3 Out of scope (explicit non-goals)
+
+- **RAT as a security scanner.** RAT checks *license* compliance; it is **not**
+  a vulnerability scanner or a security gate. "RAT didn't catch X security
+  issue" is not in scope. *(documented — purpose.)*
+- **Audit *correctness* as a security property.** A missed/false license match
+  is a correctness bug, not a vulnerability (unless it crosses a resource 
bound,
+  §8). *(inferred.)*
+- **The build/CI environment** RAT runs in, and the trust of the source tree
+  when RAT is deliberately run on your own (trusted) code — the dominant,
+  intended case. Findings whose only impact requires running RAT on input you
+  already trust are `OUT-OF-MODEL: trusted-input`.
+- **Test resources** (the deliberately-odd license fixtures under
+  `*/src/test/resources/`) — those are test data, not a target.
+- **RAT's header-insertion / file-modification mode** (`--addLicense` and the
+  editors) — RAT can *write* license headers into the audited files, mutating
+  the tree. This is explicitly operator-invoked against the operator's own
+  (trusted) sources; a run that modifies files the operator already controls is
+  `OUT-OF-MODEL: trusted-input`. (Raised by the PMC in review — write mode is
+  noted here so the boundary is explicit rather than silent.) *(maintainer.)*
+- **Custom matchers / matcher extensions**
+  (<https://creadur.apache.org/rat/license_def.html#Matchers>) — RAT lets the
+  operator define custom matcher classes in its configuration, and a custom
+  matcher sees the full text of every file selected for scanning. Because the
+  matcher set is operator-defined configuration under the control of whoever
+  runs RAT (not attacker-supplied), a custom matcher reading scanned text is
+  `OUT-OF-MODEL: trusted-input` — the same posture as any operator-supplied
+  extension code (cf. the write mode above). (Raised by the PMC in review.)
+  *(maintainer — Claudenw.)*
+
+## §4 Trust boundaries and data flow
+
+The boundary is **the input RAT is pointed at** — files and configuration.
+RAT's security questions only arise when that input is **untrusted**:
+
+```
+caller invokes RAT (CLI/Ant/Maven) on a directory + a config
+   │ trusted invocation
+   ▼
+read configuration (XMLConfigurationReader) ── XXE surface if config is 
untrusted
+walk tree -> for each file: read content, match licenses
+   └─ ArchiveWalker descends into zip/jar/tar ── decompression-bomb / path 
surface if archive is untrusted
+   ▼
+report (approved / unapproved / unknown)
+```
+
+**Reachability precondition (triager's test):** a finding is in-model only if 
it
+is triggered by **untrusted input** (a hostile file/archive/config) that a
+*realistic* RAT deployment processes — e.g. CI auditing an untrusted PR. A
+finding that requires the operator to feed RAT input they already control is
+`OUT-OF-MODEL: trusted-input` (§3).
+
+## §5 Assumptions about the environment
+
+- A JRE; RAT reads the filesystem it is pointed at and writes a report. It 
opens
+  **no network connections** and runs no services. *(inferred — Q2, the
+  no-network claim is high-value to confirm.)*
+- The XML parser behaviour depends on the platform JAXP unless RAT configures 
it
+  (§5a/§8). *(inferred — Q3.)*
+
+## §5a Build-time and configuration variants
+
+RAT has no security-mode flag. The security-relevant configuration is whether
+its **XML config parser disables DOCTYPE/external entities** and whether the
+**archive walker bounds decompression** (depth/size/entry count). Both are
+hardcoded behaviours, not operator knobs. The **archive walker does not bound
+decompression** — it extracts entry contents into an in-memory buffer (Apache
+Commons Compress `ArchiveStreamFactory`) with no size/depth/entry-count limit
+(§8/§9, maintainer-confirmed). XML-parser DOCTYPE handling is being hardened via
+a PMC PR (§14 Q3). There is no "insecure default toggle". *(maintainer / Q3
+pending PR link.)*
+

Review Comment:
   See above for the PR link.
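
   For readers unfamiliar with what "disables DOCTYPE/external entities" means in practice, a common JAXP pattern is sketched below. It is not taken from the hardening PR and may differ from what that PR does: `HardenedXml` and `newBuilder` are hypothetical names, and the `disallow-doctype-decl` feature is the Xerces feature supported by the JDK's default parser.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical helper, not RAT's XMLConfigurationReader.
final class HardenedXml {
    private HardenedXml() {}

    /** Returns a DOM builder that rejects any document carrying a DOCTYPE. */
    static DocumentBuilder newBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Rejecting DOCTYPE outright removes the entity-declaration route to XXE.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf.newDocumentBuilder();
    }
}
```

   With this configuration a configuration file that contains a DOCTYPE fails to parse with a `SAXParseException` instead of having its entities resolved.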



##########
THREAT_MODEL.md:
##########
@@ -0,0 +1,288 @@
+## §6 Assumptions about inputs
+
+| Input | Attacker-controllable? (untrusted-run) | Concern |
+| --- | --- | --- |
+| scanned file content | **yes** | parsed/read; resource use |
+| scanned file paths / archive entry names | **yes** | path handling on archive extraction |

Review Comment:
   What does "path handling on archive extraction" mean? We do not extract the
   data into a directory. We read the files directly from the archive. The file
   paths are documented as relative to the archive, so something like
   "/bar/baz.zip#/junk.txt" is reported for a file `junk.txt` in the archive
   `baz.zip` found in the `/some/dir/bar/` directory on a Unix/macOS system
   where RAT was pointed at `/some/dir` as the tree to scan.
   
   But the contents of `junk.txt` were only ever extracted to memory.
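
To make the point above concrete, here is a minimal sketch of reading archive entries purely in memory and labelling them `<archive>#/<entry>`. It is not RAT's `ArchiveWalker`: it uses the JDK's `java.util.zip` instead of Commons Compress so that it is self-contained, and the class name, method names, and the `maxBytesPerEntry` cap are all hypothetical. The cap in particular is an addition for illustration; the threat model states that RAT currently applies no such limit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical illustration only; this is not RAT code. */
final class InMemoryArchiveSketch {

    /**
     * Reads every regular entry of a zip stream into memory and keys it by
     * "archiveName#/entryName". Nothing is written to disk, so the entry
     * name is only ever used as a label, never as a filesystem path.
     */
    static Map<String, byte[]> readEntries(String archiveName, InputStream in,
                                           long maxBytesPerEntry) throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<>();
        byte[] chunk = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    // Assumed cap (not present in RAT): stop before the
                    // decompressed entry grows past the limit.
                    if (buffer.size() + n > maxBytesPerEntry) {
                        throw new IOException("entry exceeds limit: " + entry.getName());
                    }
                    buffer.write(chunk, 0, n);
                }
                result.put(archiveName + "#/" + entry.getName(), buffer.toByteArray());
            }
        }
        return result;
    }

    /** Builds a one-entry zip in memory so the demo needs no files. */
    static byte[] zipOf(String entryName, String content) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] archive = zipOf("junk.txt", "hello");
        Map<String, byte[]> entries =
                readEntries("/bar/baz.zip", new ByteArrayInputStream(archive), 1024);
        for (Map.Entry<String, byte[]> e : entries.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().length + " bytes");
        }
    }
}
```

Because the entry name never reaches a file API in a flow like this, a hostile name such as `../../x` can at most distort the reported label. That suggests the "path handling" row only applies if some code path resolves entry names against the filesystem.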



##########
THREAT_MODEL.md:
##########
@@ -0,0 +1,288 @@
+## §6 Assumptions about inputs
+
+| Input | Attacker-controllable? (untrusted-run) | Concern |
+| --- | --- | --- |
+| scanned file content | **yes** | parsed/read; resource use |
+| scanned file paths / archive entry names | **yes** | path handling on archive extraction |
+| archives (zip/jar/tar) in the tree | **yes** | decompression bomb / nested-archive depth |
+| RAT XML configuration | **maybe** (only if config is attacker-supplied) | XXE / external entity |
+| invocation arguments | no — trusted caller | — |
+
+## §7 Adversary model
+
+- **In scope:** the party who controls the files/archives/config that an
+  *untrusted-input* RAT run processes — e.g. a contributor whose PR is audited
+  by CI, or the author of a third-party artifact being audited. Capabilities:
+  craft a malicious archive (zip bomb), a hostile XML config (XXE), or
+  pathological file content. *(inferred — Q1.)*
+- **Out of scope:** an attacker who controls the RAT invocation or the trusted
+  source tree (the normal case — they already own the build).
+
+## §8 Security properties the project provides
+
+1. **Bounded resource use on untrusted archives** — **not currently provided.**
+   The archive walker (`ArchiveWalker`) uses Apache Commons Compress
+   `ArchiveStreamFactory` and extracts entry contents into an **in-memory
+   buffer** held until the document is processed, with no decompression /
+   size / depth / entry-count bound — so a crafted archive can exhaust memory
+   (OOM). This is therefore a **disclaimed gap (§9)** plus a downstream
+   responsibility (§10), not a provided property. *(maintainer — confirmed by
+   the Creadur PMC in PR #677 review, 2026-06.)*
+2. **Safe XML configuration parsing** — the config reader should reject
+   DOCTYPE/external entities (no XXE). *Violation:* file read / SSRF via a
+   crafted config. *Severity:* critical when config is untrusted. The PMC has
+   noted a hardening PR is in flight addressing this (§14 Q3); pending its link
+   this stays tentative. *(maintainer / Q3 pending PR link.)*
+3. **No ambient network/side effects** — RAT does filesystem I/O only.
+   *Violation:* unexpected outbound connection. *(inferred — Q2.)*
+
+(Item 1 is resolved as a disclaimed §9 gap per the maintainer's archive answer;
+item 2 firms up once the §14 Q3 XXE-hardening PR is linked.)
+
+## §9 Security properties the project does *not* provide
+
+- **No safety guarantee when run on fully untrusted input without sandboxing**,
+  if the §14 answers reveal the XML parser/archive walker are not hardened. In
+  that case: treat RAT-on-untrusted-input as you would any parser — sandbox it.
+- **It is not a security/vulnerability scanner** (§3); a clean RAT report says
+  nothing about security.
+- **Decompression-bomb / archive resource exhaustion** — **confirmed not
+  bounded.** Archives are extracted into an in-memory buffer with no
+  size/depth/entry-count limit (Commons Compress `ArchiveStreamFactory`), so
+  RAT pointed at untrusted archives can OOM. Runs over untrusted archives must
+  be sandboxed / resource-limited (§10). *(maintainer.)*
+- **Well-known classes (parser/archive tools):** XXE via configuration,
+  decompression bombs / nested-archive blowup, and path handling on archive
+  entries — the standard risks of any tool that parses XML and descends into
+  archives.
+
+## §10 Downstream responsibilities
+
+- When auditing **untrusted** input (CI on untrusted PRs, third-party
+  artifacts), run RAT with resource limits / in a sandbox, and do not feed it
+  attacker-controlled **configuration**.
+- Keep RAT updated; pin the version in your build.
+- For your own (trusted) source tree — the normal case — no special handling.
+
+## §11 Known misuse patterns
+
+- **Running RAT on untrusted archives/config in CI** without resource limits,
+  expecting it to be hardened against decompression bombs / XXE.
+- **Treating a clean RAT report as a security sign-off** (it is a license check).
+
+## §11a Known non-findings (recurring false positives)
+
+- **"RAT reads/parses files it is told to scan"** on a **trusted** tree — that
+  is the function; `OUT-OF-MODEL: trusted-input` (§3/§6).
+- **Odd/invalid license fixtures under `src/test/resources/`** — test data, not
+  a target. `OUT-OF-MODEL: unsupported-component` (§3).
+- **"RAT didn't detect a security vulnerability"** — out of purpose (§3).
+- **XML parsing / archive reading flagged generically** without an untrusted-
+  input path — non-finding unless the reachability precondition (§4) is met.
+
+## §12 Conditions that would change this model
+
+- RAT gaining a network surface or a server mode.
+- A change to the XML parser hardening or archive-walker bounds (§5a/§8).
+- A report unroutable to a §13 disposition → revise §8/§9.
+
+## §13 Triage dispositions
+
+| Disposition | Meaning | Licensed by |
+| --- | --- | --- |
+| `VALID` | A §8 property breaks via untrusted input on a realistic run. | §8, §6, §7 |
+| `VALID-HARDENING` | A §11 misuse is too easy (e.g. no archive bound). | §11 |
+| `OUT-OF-MODEL: trusted-input` | Requires RAT to process input the operator already trusts. | §6 |
+| `OUT-OF-MODEL: adversary-not-in-scope` | Needs control of the RAT invocation/host. | §7 |
+| `OUT-OF-MODEL: unsupported-component` | Test fixtures / out-of-purpose. | §3 |
+| `BY-DESIGN: property-disclaimed` | "Not a security scanner", trusted-input runs. | §9 |
+| `KNOWN-NON-FINDING` | Matches §11a. | §11a |
+| `MODEL-GAP` | Unroutable. | triggers §12 |
+
+## §14 Open questions for the maintainers
+
+**Wave 1 — the load-bearing ones.**
+
+- **Q1.** Confirm the intended trust posture: RAT runs in-process for a
+  trusted caller; inputs are normally trusted, but the security-relevant case is
+  RAT auditing **untrusted** input (CI on untrusted PRs, third-party artifacts).
+  Is that the case you want modelled, or do you consider all RAT input trusted
+  (which would move XXE/archive items to `OUT-OF-MODEL: trusted-input`)? (§2/§7.)

Review Comment:
   All RAT configuration items (XSLT stylesheets, configuration files, license
   definitions, matcher implementations) are trusted and under the control of
   the operator.
   
   The files that are read may be untrusted, as you point out, for example when
   verifying PRs from third parties.
   
   The attack surface is anything that can break out of the scanning stream
   when the system is run with the default settings. I am certain that there
   are settings that could open the system up to attack, for example the JAXP
   environment variables.
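
As an illustration of the JAXP point, here is a minimal sketch of pinning parser behaviour in code so that it does not depend on platform defaults or JAXP system properties. It is not RAT's `XMLConfigurationReader`, and whether RAT configures its parser this way is exactly what Q3 asks; the class and method names are hypothetical. The `disallow-doctype-decl` feature URI is a Xerces feature that the JDK's built-in parser accepts; other JAXP implementations may reject it with a `ParserConfigurationException`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

/** Hypothetical illustration only; this is not RAT code. */
final class HardenedXmlSketch {

    /** Returns a factory that refuses DOCTYPE declarations outright. */
    static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // With DOCTYPE rejected, no internal or external entity can be declared.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory;
    }

    /** True if the document parses, false if the parser rejects it. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = hardenedFactory().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // rejected by the parser, e.g. DOCTYPE is disallowed
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String plain = "<config/>";
        String withDoctype =
                "<!DOCTYPE config [<!ENTITY x SYSTEM \"file:///etc/hostname\">]>"
                + "<config>&x;</config>";
        System.out.println("plain parses: " + parses(plain));
        System.out.println("doctype parses: " + parses(withDoctype));
    }
}
```

Setting the features explicitly on the factory is the design choice of interest here: the parser's treatment of DOCTYPE then no longer varies with the environment RAT is launched in, which keeps the default-settings attack surface described above independent of the JAXP configuration.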



##########
THREAT_MODEL.md:
##########
@@ -0,0 +1,288 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+# Apache Creadur (RAT) — Threat Model
+
+## §1 Header
+
+- **Project:** Apache Creadur — primarily **RAT (Release Audit Tool)**
+  (`apache/creadur-rat`), with sibling tools **Whisker**
+  (`apache/creadur-whisker`, license-documentation generator) and **Tentacles**
+  (`apache/creadur-tentacles`, release-bundle analyzer). This model is written
+  in `creadur-rat` and covers the Creadur dev-tool family; Whisker/Tentacles
+  share RAT's trust profile (§2).
+- **Written against:** `main`/`master` @ HEAD (2026-06).
+- **Author:** ASF Security team, via the threat-model-producer rubric (Scovetta
+  rubric) at the Creadur PMC's request (path 3).
+- **Status:** DRAFT — under maintainer review (2026-06-10). Not yet ratified.
+- **Reporting cross-reference:** §8-violating findings via the ASF security
+  process ([`SECURITY.md`](SECURITY.md)); §3/§9 findings closed citing this 
doc.
+- **Provenance legend:** *(documented)* / *(maintainer)* / *(inferred)* — each
+  *(inferred)* has a §14 open question.
+- **Draft confidence:** ~14 documented / 5 maintainer / 11 inferred (maintainer
+  answers folded in from PR #677 review, 2026-06).
+
+**What it is.** RAT is a **build-time / CLI license-auditing tool**: it walks a
+source tree, matches files against configurable license/header definitions, and
+reports unapproved or unknown licenses. It runs as a **CLI**, an **Ant task**,
+or a **Maven plugin** — always **in the developer's or CI's own process**,
+never as a network service. Whisker generates license documentation; Tentacles
+inspects staged release bundles. None is a server.
+
+## §2 Scope and intended use
+
+Intended use: a project maintainer or CI job runs RAT over a codebase to verify
+license compliance before a release or on each change. The two inputs are the
+**tree being audited** (files, including archives RAT descends into) and the
+**RAT configuration** (XML/text license + matcher definitions).
+
+Caller trust level: the developer/CI invoking RAT is trusted. The **inputs are
+normally trusted too** (your own source, your own config) — but RAT is
+sometimes pointed at **untrusted input**: a CI job auditing an untrusted
+contribution/PR, or auditing a downloaded third-party artifact. That is the
+case the model cares about. *(inferred — Q1.)*
+
+**Component families.**
+
+| Family | Entry point | Untrusted-input exposure | In model? |
+| --- | --- | --- | --- |
+| File walking + license matching | `Reporter`, walkers | scanned file 
**content/paths** | **Yes** |
+| **XML configuration reader** | `XMLConfigurationReader` | the **config** (if 
attacker-supplied) | **Yes** (XXE surface) |
+| **Archive walker** | `ArchiveWalker` | archives in the tree (zip/jar/tar) | 
**Yes** (decompression-bomb surface) |
+| CLI / Ant task / Maven plugin | wrappers | invocation args (trusted caller) 
| wrappers — trusted |
+| **License-header insertion (write mode)** | `--addLicense` / editors | 
**modifies files in the audited tree** (operator-invoked) | trusted-input (§3) |
+| Whisker / Tentacles | their CLIs | same dev-tool profile | sibling — §2 note 
|
+
+**Note (PMC, review).** The CLI, Ant task, and Maven plugin front-ends are
+generated from a common option core, so any security-relevant behaviour (or
+gap) in that core transfers automatically to all three UIs — a finding in one
+front-end's handling generally applies to all of them. *(maintainer.)*
+
+## §3 Out of scope (explicit non-goals)
+
+- **RAT as a security scanner.** RAT checks *license* compliance; it is **not**
+  a vulnerability scanner or a security gate. "RAT didn't catch X security
+  issue" is not in scope. *(documented — purpose.)*
+- **Audit *correctness* as a security property.** A missed/false license match
+  is a correctness bug, not a vulnerability (unless it crosses a resource 
bound,
+  §8). *(inferred.)*
+- **The build/CI environment** RAT runs in, and the trust of the source tree
+  when RAT is deliberately run on your own (trusted) code — the dominant,
+  intended case. Findings whose only impact requires running RAT on input you
+  already trust are `OUT-OF-MODEL: trusted-input`.
+- **Test resources** (the deliberately-odd license fixtures under
+  `*/src/test/resources/`) — those are test data, not a target.
+- **RAT's header-insertion / file-modification mode** (`--addLicense` and the
+  editors) — RAT can *write* license headers into the audited files, mutating
+  the tree. This is explicitly operator-invoked against the operator's own
+  (trusted) sources; a run that modifies files the operator already controls is
+  `OUT-OF-MODEL: trusted-input`. (Raised by the PMC in review — write mode is
+  noted here so the boundary is explicit rather than silent.) *(maintainer.)*
+- **Custom matchers / matcher extensions**
+  (<https://creadur.apache.org/rat/license_def.html#Matchers>) — RAT lets the
+  operator define custom matcher classes in its configuration, and a custom
+  matcher sees the full text of every file selected for scanning. Because the
+  matcher set is operator-defined configuration under the control of whoever
+  runs RAT (not attacker-supplied), a custom matcher reading scanned text is
+  `OUT-OF-MODEL: trusted-input` — the same posture as any operator-supplied
+  extension code (cf. the write mode above). (Raised by the PMC in review.)
+  *(maintainer — Claudenw.)*
+
+## §4 Trust boundaries and data flow
+
+The boundary is **the input RAT is pointed at** — files and configuration.
+RAT's security questions only arise when that input is **untrusted**:
+
+```
+caller invokes RAT (CLI/Ant/Maven) on a directory + a config
+   │ trusted invocation
+   ▼
+read configuration (XMLConfigurationReader) ── XXE surface if config is 
untrusted
+walk tree -> for each file: read content, match licenses
+   └─ ArchiveWalker descends into zip/jar/tar ── decompression-bomb / path 
surface if archive is untrusted
+   ▼
+report (approved / unapproved / unknown)
+```
+
+**Reachability precondition (triager's test):** a finding is in-model only if 
it
+is triggered by **untrusted input** (a hostile file/archive/config) that a
+*realistic* RAT deployment processes — e.g. CI auditing an untrusted PR. A
+finding that requires the operator to feed RAT input they already control is
+`OUT-OF-MODEL: trusted-input` (§3).
+
+## §5 Assumptions about the environment
+
+- A JRE; RAT reads the filesystem it is pointed at and writes a report. It 
opens
+  **no network connections** and runs no services. *(inferred — Q2, the
+  no-network claim is high-value to confirm.)*
+- The XML parser behaviour depends on the platform JAXP unless RAT configures 
it
+  (§5a/§8). *(inferred — Q3.)*
+
+## §5a Build-time and configuration variants
+
+RAT has no security-mode flag. The security-relevant configuration is whether
+its **XML config parser disables DOCTYPE/external entities** and whether the
+**archive walker bounds decompression** (depth/size/entry count). Both are
+hardcoded behaviours, not operator knobs. The **archive walker does not bound
+decompression** — it extracts entry contents into an in-memory buffer (Apache
+Commons Compress `ArchiveStreamFactory`) with no size/depth/entry-count limit
+(§8/§9, maintainer-confirmed). XML-parser DOCTYPE handling is being hardened 
via
+a PMC PR (§14 Q3). There is no "insecure default toggle". *(maintainer / Q3
+pending PR link.)*
+
+## §6 Assumptions about inputs
+
+| Input | Attacker-controllable? (untrusted-run) | Concern |
+| --- | --- | --- |
+| scanned file content | **yes** | parsed/read; resource use |
+| scanned file paths / archive entry names | **yes** | path handling on archive extraction |
+| archives (zip/jar/tar) in the tree | **yes** | decompression bomb / nested-archive depth |
+| RAT XML configuration | **maybe** (only if config is attacker-supplied) | XXE / external entity |
+| invocation arguments | no — trusted caller | — |
+
+## §7 Adversary model
+
+- **In scope:** the party who controls the files/archives/config that an
+  *untrusted-input* RAT run processes — e.g. a contributor whose PR is audited
+  by CI, or the author of a third-party artifact being audited. Capabilities:
+  craft a malicious archive (zip bomb), a hostile XML config (XXE), or
+  pathological file content. *(inferred — Q1.)*
+- **Out of scope:** an attacker who controls the RAT invocation or the trusted
+  source tree (the normal case — they already own the build).
+
+## §8 Security properties the project provides
+
+1. **Bounded resource use on untrusted archives** — **not currently provided.**
+   The archive walker (`ArchiveWalker`) uses Apache Commons Compress
+   `ArchiveStreamFactory` and extracts entry contents into an **in-memory
+   buffer** held until the document is processed, with no decompression /
+   size / depth / entry-count bound — so a crafted archive can exhaust memory
+   (OOM). This is therefore a **disclaimed gap (§9)** plus a downstream
+   responsibility (§10), not a provided property. *(maintainer — confirmed by
+   the Creadur PMC in PR #677 review, 2026-06.)*
+2. **Safe XML configuration parsing** — the config reader should reject
+   DOCTYPE/external entities (no XXE). *Violation:* file read / SSRF via a
+   crafted config. *Severity:* critical when config is untrusted. The PMC has
+   noted a hardening PR is in flight addressing this (§14 Q3); pending its link
+   this stays tentative. *(maintainer / Q3 pending PR link.)*
+3. **No ambient network/side effects** — RAT does filesystem I/O only.
+   *Violation:* unexpected outbound connection. *(inferred — Q2.)*
+
+(Item 1 is resolved as a disclaimed §9 gap per the maintainer's archive answer;
+item 2 firms up once the §14 Q3 XXE-hardening PR is linked.)
+
+## §9 Security properties the project does *not* provide
+
+- **No safety guarantee when run on fully untrusted input without sandboxing**,
+  if the §14 answers reveal the XML parser/archive walker are not hardened. In
+  that case: treat RAT-on-untrusted-input as you would any parser — sandbox it.
+- **It is not a security/vulnerability scanner** (§3); a clean RAT report says
+  nothing about security.
+- **Decompression-bomb / archive resource exhaustion** — **confirmed not
+  bounded.** Archives are extracted into an in-memory buffer with no
+  size/depth/entry-count limit (Commons Compress `ArchiveStreamFactory`), so
+  RAT pointed at untrusted archives can OOM. Runs over untrusted archives must
+  be sandboxed / resource-limited (§10). *(maintainer.)*
+- **Well-known classes (parser/archive tools):** XXE via configuration,
+  decompression bombs / nested-archive blowup, and path handling on archive
+  entries — the standard risks of any tool that parses XML and descends into
+  archives.
+
+## §10 Downstream responsibilities
+
+- When auditing **untrusted** input (CI on untrusted PRs, third-party
+  artifacts), run RAT with resource limits / in a sandbox, and do not feed it
+  attacker-controlled **configuration**.
+- Keep RAT updated; pin the version in your build.
+- For your own (trusted) source tree — the normal case — no special handling.
+
+## §11 Known misuse patterns
+
+- **Running RAT on untrusted archives/config in CI** without resource limits,
+  expecting it to be hardened against decompression bombs / XXE.
+- **Treating a clean RAT report as a security sign-off** (it is a license check).
+
+## §11a Known non-findings (recurring false positives)
+
+- **"RAT reads/parses files it is told to scan"** on a **trusted** tree — that
+  is the function; `OUT-OF-MODEL: trusted-input` (§3/§6).
+- **Odd/invalid license fixtures under `src/test/resources/`** — test data, not
+  a target. `OUT-OF-MODEL: unsupported-component` (§3).
+- **"RAT didn't detect a security vulnerability"** — out of purpose (§3).
+- **XML parsing / archive reading flagged generically** without an untrusted-
+  input path — non-finding unless the reachability precondition (§4) is met.
+
+## §12 Conditions that would change this model
+
+- RAT gaining a network surface or a server mode.
+- A change to the XML parser hardening or archive-walker bounds (§5a/§8).
+- A report unroutable to a §13 disposition → revise §8/§9.
+
+## §13 Triage dispositions
+
+| Disposition | Meaning | Licensed by |
+| --- | --- | --- |
+| `VALID` | A §8 property breaks via untrusted input on a realistic run. | §8, §6, §7 |
+| `VALID-HARDENING` | A §11 misuse is too easy (e.g. no archive bound). | §11 |
+| `OUT-OF-MODEL: trusted-input` | Requires RAT to process input the operator already trusts. | §6 |
+| `OUT-OF-MODEL: adversary-not-in-scope` | Needs control of the RAT invocation/host. | §7 |
+| `OUT-OF-MODEL: unsupported-component` | Test fixtures / out-of-purpose. | §3 |
+| `BY-DESIGN: property-disclaimed` | "Not a security scanner", trusted-input runs. | §9 |
+| `KNOWN-NON-FINDING` | Matches §11a. | §11a |
+| `MODEL-GAP` | Unroutable. | triggers §12 |
+
+## §14 Open questions for the maintainers
+
+**Wave 1 — the load-bearing ones.**
+
+- **Q1.** Confirm the intended trust posture: RAT runs in-process for a
+  trusted caller; inputs are normally trusted, but the security-relevant case is
+  RAT auditing **untrusted** input (CI on untrusted PRs, third-party artifacts).
+  Is that the case you want modelled, or do you consider all RAT input trusted
+  (which would move XXE/archive items to `OUT-OF-MODEL: trusted-input`)? (§2/§7.)
+- **Q3.** *(Partially answered — PMC, PR #677: a hardening PR is in flight
+  ensuring DOCTYPE / external-entity handling is covered. **Pending the PR link
+  to cite**; once landed §8 #2 becomes a provided property.)* Does
+  `XMLConfigurationReader` disable DOCTYPE / external entities (XXE-safe)?

Review Comment:
   External entities are disabled.
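
For readers following Q3, here is a generic sketch of what disabling DOCTYPE and external entities usually looks like with the platform JAXP `DocumentBuilderFactory`. It is not RAT's actual `XMLConfigurationReader` code. The class name `HardenedXmlSketch` and the helpers `newHardenedBuilder` and `parses` are hypothetical names chosen for illustration, and the feature URIs assume the Xerces-derived parser that ships with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical illustration only; not taken from the RAT code base.
public class HardenedXmlSketch {

    /** Builds a DocumentBuilder that rejects DOCTYPE declarations and external entities. */
    static DocumentBuilder newHardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Enable the JAXP secure-processing limits.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Refuse any document that carries a DOCTYPE (assumes a Xerces-derived parser).
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Belt and braces: also switch off external general and parameter entities.
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    /** Returns true if the hardened parser accepts the XML, false if it rejects it. */
    static boolean parses(String xml) throws Exception {
        try {
            newHardenedBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. a DOCTYPE was present
        }
    }

    public static void main(String[] args) throws Exception {
        String plain = "<config/>";
        String hostile = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE config [<!ENTITY x SYSTEM \"file:///etc/hostname\">]>"
                + "<config>&x;</config>";
        System.out.println("plain accepted:   " + parses(plain));
        System.out.println("hostile accepted: " + parses(hostile));
    }
}
```

With a configuration along these lines, a config file that declares a DOCTYPE should be rejected before any entity is resolved. Whether RAT's reader sets the same features is for the hardening PR to confirm.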



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
