Re: [PR] HADOOP-19925. Create a SECURITY.md file to define the security model [hadoop]

via GitHub Tue, 30 Jun 2026 02:57:05 -0700


ndimiduk commented on code in PR #8562:
URL: https://github.com/apache/hadoop/pull/8562#discussion_r3497752206



##########
SECURITY.md:
##########
@@ -0,0 +1,620 @@
+<!-- SPDX-License-Identifier: Apache-2.0 -->
+
+# Apache Hadoop Security Model
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
+NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
+"OPTIONAL" in this document are to be interpreted as described in
+RFC 2119.
+
+This document defines the security model of Apache Hadoop: the deployments it is
+designed to protect, the boundaries it defends, and, equally importantly, the
+things which are *not* vulnerabilities. It exists for human reporters and for
+anyone using automated or AI-assisted tooling to look for security issues.
+
+**TL;DR: Hadoop's security model defends a Kerberos-secured cluster running on a
+trusted operating system, behind a network perimeter, with a valid site
+configuration. Findings which only apply outside that model are bugs, not
+vulnerabilities.**
+
+## Before Filing a Report (Including AI-Assisted Reports)
+
+The deployment Hadoop's security model defends is a **Kerberos-secured cluster**.
+Many findings that look like vulnerabilities in other contexts are not
+vulnerabilities here, because the surrounding deployment is trusted by design.
+
+You *MUST NOT* file a security report for:
+
+- Issues that require the operator to edit their own Hadoop site configuration,
+  place malicious or vulnerable libraries on their own classpath,
+  or pass malicious arguments to their own command invocation.
+- **Job submission running user-supplied code.** Submitting work to YARN or
+  MapReduce executes the submitter's code as the submitter's identity. That is
+  the product, not a vulnerability. See the threat model below.
+- **Denial of service at scale.** A large Hadoop cluster exists to execute jobs
+  at scale; such a cluster can itself be used to mount distributed attacks, and
+  authenticated users can exhaust resources. Resource exhaustion and performance
+  degradation from legitimate authenticated use are out of scope.
+- Issues that require the attacker to already hold cluster or remote-store
+  credentials, a valid Kerberos principal, or local disk access.
+- Anything against the **default insecure (non-Kerberos) mode**: it is insecure
+  by design (see the deployment model below).
+- **Transitive CVEs** in dependencies Hadoop builds or ships against. See
+  [Third Party Modules](#third-party-modules).
+- Raw **scanner output** (Snyk, Dependabot, Trivy, Zizmor, etc.) without a
+  reproducer against the current `trunk` branch.
+- Theoretical findings ("an attacker who could X might then Y") without a
+  reproduction.
+
+A valid report includes:
+
+- The Hadoop version, and ideally the git SHA it was reproduced against.
+- The exact steps, configuration, and commands used to reproduce it.
+- The observed in-scope failure, and what was expected instead.
+- Where a CVE/CVSS score is claimed, the reasoning behind that score.
+
+### For Partly/Fully AI-Generated Reports
+
+AI-assisted reports are accepted **only** if the submitter has verified the
+finding by hand against current source and includes a runnable reproducer.
+
+In addition, the submitter of an AI-generated report:
+
+1. is REQUIRED to understand what Hadoop is, to understand the claimed
+   vulnerability, and to be able to explain it in their own words, including
+   justifying any claimed CVE or CVSS scores. If the submitter is unable to do
+   this, then any credit for a resulting CVE will be assigned to the AI tool
+   alone, and not to the submitter.
+
+2. MUST declare the AI tool used, and be willing to provide the log.
+   The log is a key part of AI tool reports, and we need to be able to
+   track and replicate these.
+
+*Unverified LLM-generated reports waste maintainer time and will be closed
+without further response.*

Review Comment:
   Might as well give this teeth. Add a note here about repeat offenders being banned for a period of time, say, 6 months?



##########
SECURITY.md:
##########
@@ -0,0 +1,620 @@
+1. REQUIRED to understand what Hadoop is, to understand the claimed vulnerability,

Review Comment:
   I think that this is a tall ask because it's essentially asking the Hadoop PMC and security@ to subject the report to a Turing Test on submission. I understand the spirit of this bullet, but making it REQUIRED without a reasonably objective evaluation method strikes me as unrealistic.



##########
SECURITY.md:
##########
@@ -0,0 +1,620 @@
+2. MUST declare the AI tool used, and be willing to provide the log.

Review Comment:
   This is better than the previous language, but I'm not sure what the session log actually gets us here. The whole LLM experience is a massive RNG; it's not reproducible by design. Maybe it could be used to inspect the simulated reasoning for faults, but that seems like it's beyond the scope of the PMC's transactional obligations for accepting or refuting a finding.
   
   Could be interesting for a post-analysis paper, though, if submitters are willing to play along.



##########
SECURITY.md:
##########
@@ -0,0 +1,620 @@
+
+
+## Reporting a Vulnerability
+
+Report security vulnerabilities in Apache Hadoop privately to
+**security@hadoop.apache.org**.
+
+- Do not cc any public mailing list.
+- Do **not** open a public JIRA issue, GitHub issue, or pull request for an
+  unfixed vulnerability.
+
+For vulnerabilities in CI pipelines, see
+[Reporting Vulnerabilities in CI Pipelines](#reporting-vulnerabilities-in-ci-pipelines).
+
+See the Apache Software Foundation's
+[guidelines for reporting security issues](https://www.apache.org/security/) for
+the responsible-disclosure process that applies to all ASF projects.
+
+## Third Party Modules
+
+### Reporting a Known CVE in a Hadoop Dependency
+
+Do not report the existence of a published CVE in a Hadoop dependency
+to the security list. These are published and do not need to be treated as
+confidential.
+
+These are considered improvements to the project, and are managed in
+the project's [issue tracker](https://issues.apache.org/jira/issues/):
+
+1. Search for any existing issue covering the dependency upgrade.
+2. If one exists, read it, its discussion, and its PRs, and see which versions
+   it has been merged to.
+3. If it hasn't been merged, find out why and get involved: major work is likely
+   to be needed.
+4. If there isn't an issue, create one and start work on the PR!
+
+Tip: an easy way to check which version of a library the `trunk` release of
+Hadoop ships is the [LICENSE-binary](./LICENSE-binary) file.
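A minimal Java sketch of that lookup, assuming (not confirmed here) that LICENSE-binary lists each bundled artifact on its own line as `groupId:artifactId:version`. `LicenseBinaryLookup` and `findVersion` are hypothetical helpers for illustration, not part of Hadoop:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical helper: finds the version of a bundled artifact in LICENSE-binary,
 * assuming entries of the form "groupId:artifactId:version".
 */
class LicenseBinaryLookup {

  /** Returns the version listed for groupId:artifactId, if any line matches. */
  static Optional<String> findVersion(List<String> lines, String groupId, String artifactId) {
    // The trailing ':' stops "guava" from matching "guava-testlib".
    String prefix = groupId + ":" + artifactId + ":";
    for (String raw : lines) {
      String line = raw.trim();
      if (line.startsWith(prefix)) {
        String rest = line.substring(prefix.length()).trim();
        if (!rest.isEmpty()) {
          // Keep only the first token, in case a line carries trailing notes.
          return Optional.of(rest.split("\\s+")[0]);
        }
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 3) {
      System.err.println("usage: LicenseBinaryLookup <LICENSE-binary> <groupId> <artifactId>");
      System.exit(2);
    }
    List<String> lines = Files.readAllLines(Paths.get(args[0]));
    System.out.println(findVersion(lines, args[1], args[2]).orElse("not listed"));
  }
}
```

From a source checkout this could be run as `java LicenseBinaryLookup.java LICENSE-binary com.google.guava guava` (Java 11+ single-file launch). If the file's real layout differs from the assumed one, the sketch may report an artifact as not listed even when it is bundled.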
+
+Please do not send an email listing the CVEs an automated scan
+tool reported and requesting updates, timelines, etc.
+Open source development is a community process, and addressing this is done
+in the [developer mailing lists](https://hadoop.apache.org/mailing_lists.html).
+Join the community to help get your needs addressed.
+
+If you cannot find existing information on whether the project is affected by the
+issue in the advisory, it may be up to you, as a part of the project community, to
+participate in its handling.
+When starting a discussion, provide detailed information: review how the project
+uses the dependency, and give your opinion on the priority of upgrading, or even
+removing, the dependency.
+Contributions upgrading the dependency to a version that is not affected by the
+problem are generally welcome, though they will not typically expedite the release
+schedule.
+
+Actively participating in the release process, especially qualifying pre-release
+artifacts in your own deployments, is the most effective way of accelerating the
+release timetable.
+
+### Providing Advance Warning of a Critical CVE in a Hadoop Dependency
+
+If a team providing a library which Hadoop bundles has a critical CVE which
+a forthcoming fix will correct, they are encouraged to notify the Hadoop security
+list so we can identify whether the project is exposed, help review and validate
+fixes, and co-ordinate releases.
+
+We SHALL treat all such reports as confidential.
+
+### Reporting a Newly-Discovered Vulnerability in a Third-Party Module
+
+Security bugs in third-party modules (the JVM, the Kerberos infrastructure, cloud
+SDKs, connectors, or any other dependency) should be reported to their respective
+maintainers through their own security-reporting mechanisms, after verifying
+the issue is in scope of *their* threat model and reproduces against *their*
+current release.
+
+## Supported Versions
+
+Security fixes are made only to the most recent Apache Hadoop release line(s).
+Older release lines are end-of-life and do not receive security updates; the
+remedy for a vulnerability in an old line is to upgrade. Refer to the
+[Apache Hadoop release and download policy](https://hadoop.apache.org/releases.html)
+for which lines are currently maintained. A report MUST be reproducible against a
+maintained release or the current `trunk` branch.
+
+## The Hadoop Threat Model
+
+In the Hadoop threat model there are **trusted elements**. Vulnerabilities that
+require the compromise of these trusted elements are outside the scope of the
+model:
+
+- **Cluster Administrators are trusted.**
+- **DNS is trusted.**
+- **The Kerberos authentication infrastructure is trusted.** Active Directory,
+  FreeIPA, or whichever other Key Distribution Center (KDC) is in use is trusted
+  and required to be well-configured, including synchronized clocks (NTP/chrony)
+  across the KDC, services, and clients, within the Kerberos clock-skew window.
+  Authentication failures caused by clock drift are operational bugs, not
+  vulnerabilities.
+- **The network perimeter is trusted to keep the public internet out, but the
+  wire is not assumed confidential.** The perimeter does not authenticate callers;
+  Kerberos authentication does that at the service level. The perimeter's job is
+  to keep the cluster off the public internet (Hadoop clusters are never web-facing).
+  Within that, Hadoop may run with optional wire encryption (RPC `privacy` QOP;
+  HDFS block-transfer encryption). Running without encryption is by design and not
+  a vulnerability; but when encryption is enabled, a failure to actually protect
+  traffic (no-op encryption, silent downgrade, or MITM bypass) is in scope.
+- **Any hosting cloud or infrastructure provider is trusted, as is the
+  underlying hardware.** This includes the CPU, memory, storage, and network
+  hardware, even on shared/multi-tenant cloud systems where that hardware is
+  physically shared with other tenants. Attacks that require malicious or
+  compromised hardware, hypervisor escape, or cross-tenant side channels
+  (speculative execution, Rowhammer, and similar) are the responsibility of the
+  hardware and infrastructure provider, and are out of scope.
+- **The underlying operating system is trusted.** Hadoop relies on OS process
+  isolation, file permissions, and (where required) OS-level disk encryption.
+  An attack that first requires the OS to be compromised or misconfigured is out
+  of scope.
+- **Valid site configuration is trusted.** We expect `core-site.xml`,

Review Comment:
   Is it worth a comment that the contents of `*-site.xml` are considered proprietary to the installation, and that they are expected to possibly contain secrets in cleartext?



##########
SECURITY.md:
##########
@@ -0,0 +1,620 @@
+- **Valid site configuration is trusted.** We expect `core-site.xml`,
+  `hdfs-site.xml`, `yarn-site.xml` and the rest of the site configuration to be
+  valid and to be writable only by trusted administrators. If an attacker can
+  manipulate the site configuration, the game is already over; that is out of
+  scope.
+- **The classpath is trusted.** We expect no malicious JAR files to be on the classpath.
+  If an attacker can add a malicious JAR to the classpath, then the exploit is the
+  ability to add that JAR to a process launched by a higher-privilege principal.
+  Launching a process as the current principal with a malicious JAR is not an exploit,
+  nor is any attack which makes the ability to manipulate the classpath a prerequisite.
+
+Within that model, the boundary Hadoop **defends** is **privilege escalation
+across an authenticated boundary within a Kerberos-secured cluster**.
+Examples of in-scope issues are:
+
+- A user acting as another user, as a service, or as a superuser without the
+  authorization to do so.
+- Bypassing service-level authorization / ACLs
+  (see [Service Level Authorization](hadoop-common-project/hadoop-common/src/site/markdown/ServiceLevelAuth.md)).
+- Forging, leaking, or improperly reusing delegation tokens.
+- Defeating the constraints on proxy/superuser impersonation
+  (see [Proxy user - Superusers Acting On Behalf Of Other Users](hadoop-common-project/hadoop-common/src/site/markdown/Superusers.md)).
+- Failure of clients to detect and reject malicious service endpoints/MITM attacks.
+
+Further properties of the model:
+
+- **Hadoop clusters are never web-facing.** They are deployed behind a network
+  perimeter; network rules are expected to keep the cluster off the public
+  internet. The perimeter does not authenticate callers — Kerberos does that at
+  the service level. A report which assumes a cluster is directly exposed to the
+  public internet is not in scope.
+- **Wire encryption is optional and controlled by site configuration.** Network
+  traffic between Hadoop components may or may not be encrypted, depending on the
+  deployment's configuration. The absence of wire encryption when it has not been
+  enabled is not a vulnerability.
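When wire encryption is enabled, a silent downgrade is one of the in-scope failures named in the threat model. A minimal sketch of a fail-closed check, assuming the client can see the quality of protection (QOP) actually negotiated for a connection; the class and method names are hypothetical, and only the three level names correspond to the values accepted by `hadoop.rpc.protection` (`authentication`, `integrity`, `privacy`).

```java
import java.util.Locale;

// Hypothetical sketch of a downgrade check; not Hadoop's SASL negotiation code.
final class QopCheck {

    // Declared weakest to strongest, so enum comparison orders protection levels.
    // The names mirror the values accepted by hadoop.rpc.protection.
    enum Qop { AUTHENTICATION, INTEGRITY, PRIVACY }

    private QopCheck() {}

    // Parse a configured value such as "privacy" (case-insensitive, trimmed).
    static Qop parse(String value) {
        return Qop.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // Fail closed: throw instead of continuing on a weaker connection.
    static void requireAtLeast(Qop required, Qop negotiated) {
        if (negotiated.compareTo(required) < 0) {
            throw new SecurityException("QOP downgrade rejected: required "
                + required + ", negotiated " + negotiated);
        }
    }
}
```

Throwing, rather than logging a warning and carrying on, is the design point: a client configured to require `privacy` should never quietly continue at a weaker QOP.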
+
+Relevant operational security documentation:
+
+- [Hadoop in Secure Mode](hadoop-common-project/hadoop-common/src/site/markdown/SecureMode.md)
+- [Service Level Authorization](hadoop-common-project/hadoop-common/src/site/markdown/ServiceLevelAuth.md)
+- [Authentication for Hadoop HTTP web-consoles](hadoop-common-project/hadoop-common/src/site/markdown/HttpAuthentication.md)
+- [Proxy user - Superusers](hadoop-common-project/hadoop-common/src/site/markdown/Superusers.md)
+- [Credential Provider API](hadoop-common-project/hadoop-common/src/site/markdown/CredentialProviderAPI.md)
+- [YARN Application Security](hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md)
+- [Transparent Encryption in HDFS](hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/TransparentEncryption.md)
+
+## Deployment Threat Model
+
+Hadoop is deployed in a number of ways, with different security boundaries.
+
+### Standalone (Insecure) Mode
+
+In its standalone configuration Hadoop performs no real authentication. Anyone with
+network access to the cluster has full access to its data and can submit work.
+
+This mode is *intended* to run only on a trusted, network-isolated host or
+network. It is insecure **by design**. "The unsecured cluster has no security" is
+not a vulnerability, and arbitrary data access against a non-Kerberos cluster is
+inherent in security being disabled.
+
+It should only be used for standalone development/test environments, with firewalls preventing remote access.
+It can then be used to test Hadoop and applications running on top of it.
+
+### Secure (Kerberos) Clusters
+
+This is the deployment the security model defends, as described in
+[The Hadoop threat model](#the-hadoop-threat-model) above: Kerberos
+authentication, service-level authorization, delegation tokens, and constrained
+proxy/superuser impersonation.
+
+It is the expected deployment of production physical clusters:
+1. A trusted Kerberos system is used to authenticate principals.
+2. Services are issued credentials (keytabs) in files, secured on the physical hosts via OS file permissions.
+3. Users of the cluster all authenticate with the Kerberos system for their access.
+4. Access to the cluster may be via a proxy mechanism.
+5. HDFS uses Kerberos to authenticate its own nodes and services, other cluster services (YARN, Apache ZooKeeper, etc.), and callers.
+6. HDFS block tokens are issued by the HDFS NameNode to grant data access to authenticated principals;
+   the possessor of a token may access a block of data on a DataNode with the permissions in that token,
+   without the need to supply any further authentication information.
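Item 6 treats a block token as a bearer credential: whoever holds it gets the access it encodes. A minimal sketch of how such a token can be made unforgeable, assuming the issuer (standing in for the NameNode) and the verifier (standing in for a DataNode) share a secret key and protect the token fields with HMAC-SHA256; the class name, the `|`-separated field layout, and the algorithm choice are illustrative assumptions, not Hadoop's block token format.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of an HMAC-protected bearer token; not Hadoop's block token format.
final class BlockTokenSketch {
    private static final String ALG = "HmacSHA256";

    private BlockTokenSketch() {}

    // Token identifier: user, block, comma-separated access modes, expiry time.
    // Simplification: assumes no field contains '|'.
    static String identifier(String user, long blockId, String modes, long expiryMillis) {
        return user + "|" + blockId + "|" + modes + "|" + expiryMillis;
    }

    // Issuer side: the token "password" is a MAC of the identifier under the shared key.
    static byte[] sign(byte[] sharedKey, String identifier) {
        try {
            Mac mac = Mac.getInstance(ALG);
            mac.init(new SecretKeySpec(sharedKey, ALG));
            return mac.doFinal(identifier.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            // HmacSHA256 is required on every Java platform, so this means a broken JRE.
            throw new IllegalStateException(e);
        }
    }

    // Verifier side: the caller is not authenticated any further; possession of a
    // valid (identifier, password) pair is what grants access.
    static boolean verify(byte[] sharedKey, String identifier, byte[] password,
                          long blockId, String mode, long nowMillis) {
        // Constant-time comparison, so the MAC cannot be probed byte by byte.
        if (!MessageDigest.isEqual(sign(sharedKey, identifier), password)) {
            return false; // forged or tampered token
        }
        String[] f = identifier.split("\\|");
        return f.length == 4
            && Long.parseLong(f[1]) == blockId
            && Arrays.asList(f[2].split(",")).contains(mode)
            && nowMillis < Long.parseLong(f[3]);
    }
}
```

Because possession is sufficient, the security of this scheme rests entirely on the shared key staying secret; editing the identifier (for example adding `WRITE`) invalidates the MAC.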
+
+Hadoop services issue _delegation tokens_: an authenticated principal obtains a token directly from a service such as HDFS, Apache HBase, Apache Hive or Apache Knox.
+YARN distributes these tokens to an application's containers and renews them on the application's behalf, so tasks can authenticate to those services without holding Kerberos credentials themselves.
+These tokens have a life independent of the Kerberos credentials:
+* They have a limited lifespan of a number of hours.
+* They can be cancelled: the issuing service MUST then reject requests using them as authentication.
+* They can be renewed: before their lifespan expires, the renewer requests the issuing service to extend their lifespan.
+
+The details of these tokens, and of how issuing, cancellation and renewal are managed, are not covered in this document.
+Hadoop and applications MUST safely marshal and store these tokens; if they are published in any form, then permissions are being leaked.
+
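The lifecycle rules listed above (limited lifespan, cancellation, renewal by the designated renewer) can be made concrete with a deliberately simplified, in-memory model of an issuing service. Everything here is hypothetical: the class, its methods, and the use of random opaque handles. The cap on renewal reflects the fact that real Hadoop delegation tokens also have a maximum lifetime, but real tokens are structured differently.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a token-issuing service; not Hadoop's API.
final class TokenIssuerSketch {

    private static final class TokenState {
        final String owner;
        final String renewer;
        final long maxDate;   // renewal can never extend the token beyond this
        long expiryDate;      // current end of the token's lifespan

        TokenState(String owner, String renewer, long expiryDate, long maxDate) {
            this.owner = owner;
            this.renewer = renewer;
            this.expiryDate = expiryDate;
            this.maxDate = maxDate;
        }
    }

    private final SecureRandom random = new SecureRandom();
    // A real service would also purge expired entries.
    private final Map<String, TokenState> live = new HashMap<>();
    private final long renewIntervalMillis;
    private final long maxLifetimeMillis;

    TokenIssuerSketch(long renewIntervalMillis, long maxLifetimeMillis) {
        this.renewIntervalMillis = renewIntervalMillis;
        this.maxLifetimeMillis = maxLifetimeMillis;
    }

    // Issue a token to an owner who has already authenticated (e.g. via Kerberos).
    synchronized String issue(String owner, String renewer, long now) {
        byte[] bytes = new byte[32];
        random.nextBytes(bytes); // unguessable handle
        String id = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        long maxDate = now + maxLifetimeMillis;
        live.put(id, new TokenState(owner, renewer,
            Math.min(now + renewIntervalMillis, maxDate), maxDate));
        return id;
    }

    // Authenticate a request made with the token: the owner, or null if rejected.
    synchronized String authenticate(String id, long now) {
        TokenState t = live.get(id);
        return (t != null && now < t.expiryDate) ? t.owner : null;
    }

    // Only the designated renewer may renew, only before expiry, never past maxDate.
    synchronized long renew(String id, String caller, long now) {
        TokenState t = live.get(id);
        if (t == null || now >= t.expiryDate) {
            throw new SecurityException("token cancelled or expired");
        }
        if (!t.renewer.equals(caller)) {
            throw new SecurityException(caller + " is not the renewer");
        }
        t.expiryDate = Math.min(now + renewIntervalMillis, t.maxDate);
        return t.expiryDate;
    }

    // After cancellation, every later request using the token is rejected.
    synchronized void cancel(String id, String caller) {
        TokenState t = live.get(id);
        if (t == null) {
            return;
        }
        if (!caller.equals(t.owner) && !caller.equals(t.renewer)) {
            throw new SecurityException(caller + " may not cancel this token");
        }
        live.remove(id);
    }
}
```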
+### Transient Cloud Deployments
+
+Hadoop is frequently deployed as a transient cluster in a cloud environment:
+
+- Cloud credentials are supplied to the deployment by the hosting infrastructure
+  — for example AWS IAM roles attached to the VMs/containers, or equivalent
+  mechanisms on other clouds. **These supplied credentials, and the access they
+  grant, are the trust boundary.** Using credentials provided to the VM or
+  container the code runs in is not a vulnerability.
+- The cluster is **transient** and typically single-tenant: it is created for a
+  workload and destroyed afterwards.
+- **Network rules prevent access by untrusted principals.** As with on-premises
+  clusters, the deployment is not web-facing; the network perimeter is part of
+  the model.
+
+Hadoop clusters MUST NOT be deployed in the cloud without network rules to isolate them from the public internet.
+
+### Client-only Deployments
+
+The Hadoop client libraries can be used as part of an application communicating with remote
+Kerberos-authenticated services or with cloud infrastructure:
+
+- No service endpoints are created.
+- HTTP, HTTPS and IPC connections are set up to communicate with remote services.
+- Credentials are stored on the client, either for direct authentication,
+  or for use with authentication services, such as Kerberos or OAuth 2, which issue shorter-lived credentials.
+- The shorter-lived credentials are used for communication with the remote services themselves.
+
+In client-side use, the following are trusted:
+- The underlying operating system and its configuration.
+- The principal and the host administrator.
+- The application classpath.
+- The client-side configuration and CLI arguments.
+
+Whether the network is trusted (including whether DNS is trusted, and whether network encryption,
+along with TLS where appropriate, is mandatory for HDFS, cluster-service and cloud-service communication)
+is a matter for client configuration and is out of scope of this security model.
+
+## Data at Rest and Temporary Files
+
+- **Encrypting data persisted in the cluster filesystem requires HDFS encryption** (see
+  [Transparent Encryption in HDFS](hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/TransparentEncryption.md)).
+  Where encryption has been configured, a failure of the code to actually
+  encrypt the persisted data **is a vulnerability** and should be reported.
+- **Temporary data** is written to local-filesystem temporary directories. The
+  requirement is that the operating system secures and, where required, encrypts
+  these directories — this is part of the trusted-OS assumption. Within that:
+  - Code that creates temporary files and directories **MUST create them and set
+    their permissions atomically.** Creating a file or directory with permissive
+    defaults and then narrowing the permissions in a later step leaves a window
+    in which another local principal can act on it.
+  - A failure to create files/directories and set their permissions atomically
+    **is an issue** and should be reported.
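A minimal Java sketch of atomic creation, assuming a POSIX filesystem (elsewhere the permission attribute cannot be applied at creation time and these calls throw `UnsupportedOperationException`). `Files.createTempDirectory` and `Files.createTempFile` apply the supplied attribute as part of creating the entry, so there is no window with wider permissions; the process umask may only narrow them further. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileAttribute;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical helpers for owner-only temporary files (POSIX filesystems only).
final class PrivateTempFiles {
    private static final FileAttribute<Set<PosixFilePermission>> OWNER_ONLY_DIR =
        PosixFilePermissions.asFileAttribute(PosixFilePermissions.fromString("rwx------"));
    private static final FileAttribute<Set<PosixFilePermission>> OWNER_ONLY_FILE =
        PosixFilePermissions.asFileAttribute(PosixFilePermissions.fromString("rw-------"));

    private PrivateTempFiles() {}

    // The directory is created with mode 0700 in the same call that creates it.
    static Path createPrivateDir(Path parent, String prefix) throws IOException {
        return Files.createTempDirectory(parent, prefix, OWNER_ONLY_DIR);
    }

    // The file is created with mode 0600 in the same call that creates it.
    static Path createPrivateFile(Path dir, String prefix, String suffix) throws IOException {
        return Files.createTempFile(dir, prefix, suffix, OWNER_ONLY_FILE);
    }

    // Anti-pattern (do NOT do this): create with default permissions, then narrow.
    // Between the two calls another local principal may open or replace the file.
    //   Path p = Files.createTempFile(dir, prefix, suffix);
    //   Files.setPosixFilePermissions(p, PosixFilePermissions.fromString("rw-------"));
}
```

The commented-out sequence at the end is the create-then-narrow pattern that the rule above classes as a reportable issue.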
+
+## Logging
+
+Logs are expected to be kept private.
+
+That is, users and administrators are _expected_ not to share them with untrusted entities.
+This is consistent with the [Log4j Security Model](https://logging.apache.org/security.html);
+Log4j is the logging framework Hadoop uses.
+
+Services MUST restrict access to logs:
+- Non-administrator principals MUST NOT be able to access logs of applications launched by other principals.
+- Non-administrator principals MUST NOT be able to access logs of services.
+
+Preventing log overflow attacks is a matter of configuring logging, and is out of scope.
+Default/example log configurations SHOULD define a rollover policy which limits the size of log files created.
+
+Limiting log size may permit failed attempts to authenticate with a service to be logged.
+Log4j supports log aggregation across systems; services can be configured to feed selective logs
+to central services.
+
+Services SHOULD assist central security logging by providing specific logs for reporting authentication
+failures/privilege rejections.
+
+### Secrets and Logging
+
+Leaking secrets into logs is [CWE-532: Insertion of Sensitive Information into
+Log File](https://cwe.mitre.org/data/definitions/532.html).
+The following rules apply to Hadoop code:
+
+- Secrets *SHOULD NOT* be logged.
+- **Persistent secrets, long-lived credentials, and encryption secrets (keys,
+  key material, passwords) *MUST NOT* be logged at any level.**
+- **Transient secrets** (for example short-lived tokens) *MUST NOT* be logged at
+  `INFO`, `WARN`, or `ERROR` level, and *SHOULD NOT* be logged at `DEBUG` or
+  `TRACE` level.
+
+Transient secrets are called out specifically because secrets sometimes surface in HTTP/web
+request logs (URLs, headers, query parameters) and are visible
+when third-party components, including JDK classes, are configured to log at TRACE.
+Preventing logging of these is best-effort.
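A minimal sketch of two defensive habits that support these rules: redacting secret-bearing query parameters before a URL is logged, and giving credential holders a `toString()` that never includes the secret. The parameter list is an assumption (`delegation` is the WebHDFS delegation-token parameter; the others are illustrative), and the class is hypothetical, not an existing Hadoop utility.

```java
import java.util.regex.Pattern;

// Hypothetical logging helpers; not an existing Hadoop utility.
final class LogRedaction {

    // Query parameters treated as secrets (illustrative list).
    private static final Pattern SECRET_PARAM = Pattern.compile(
        "([?&](?:delegation|token|password|signature)=)[^&#]*",
        Pattern.CASE_INSENSITIVE);

    private LogRedaction() {}

    // Replace the value of any secret-bearing query parameter before logging a URL.
    static String redactUrl(String url) {
        return SECRET_PARAM.matcher(url).replaceAll("$1<redacted>");
    }

    // A credential holder whose toString() never exposes the secret, so passing
    // it to a log statement by mistake does not leak the value.
    static final class Secret {
        private final String kind;
        private final char[] value;

        Secret(String kind, char[] value) {
            this.kind = kind;
            this.value = value.clone();
        }

        char[] reveal() {
            return value.clone();
        }

        @Override
        public String toString() {
            return kind + "=<redacted>";
        }
    }
}
```

Redaction of this kind is best-effort, as the text says: it only helps for code paths that route URLs through it, and does nothing for third-party TRACE logging.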
+
+## Development and CI Threat Model
+
+The project is built on developer systems and in CI systems, and **we do care
+about attacks on these.**
+Development and CI are explicitly in scope.
+
+See [Important Security Information for GitHub Actions](.github/workflow-security.md)
+for the detailed CI/workflow security guidance. In summary:
+
+- All inputs from external pull requests — titles, comments, author metadata, and
+  code — *SHALL* be considered untrusted, and *MUST NOT* be fed directly or
+  indirectly to shell commands without sanitization.
+- Upstream dependencies from non-ASF projects *MAY* be subverted by supply-chain

Review Comment:
   Is this cooldown policy further specified elsewhere?



##########
SECURITY.md:
##########
@@ -0,0 +1,620 @@
+- **The classpath is trusted.** We expect no malicious JAR files to be on the classpath.

Review Comment:
   That said, we should include allowlist filters on classes loaded dynamically from the classpath, if we do not already.
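A minimal sketch of the kind of allowlist filter suggested in this comment, assuming class names arrive as strings (for example from configuration) and are loaded reflectively. The class name and the exact-name/package-prefix policy are illustrative assumptions, not existing Hadoop code; `Class.forName(name, false, loader)` and `Class.asSubclass` are standard JDK calls.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical allowlist for reflectively loaded classes; not existing Hadoop code.
final class AllowlistedClassLoading {
    private final Set<String> allowedClasses;
    private final Set<String> allowedPackagePrefixes; // e.g. "org.example.plugins."

    AllowlistedClassLoading(Set<String> allowedClasses, Set<String> allowedPackagePrefixes) {
        this.allowedClasses = Collections.unmodifiableSet(new HashSet<>(allowedClasses));
        this.allowedPackagePrefixes =
            Collections.unmodifiableSet(new HashSet<>(allowedPackagePrefixes));
    }

    boolean isAllowed(String className) {
        if (allowedClasses.contains(className)) {
            return true;
        }
        for (String prefix : allowedPackagePrefixes) {
            // Only dot-terminated prefixes count, so "org.example." cannot match "org.exampleevil.X".
            if (prefix.endsWith(".") && className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Check the name before loading; initialize=false defers static initializers.
    <T> Class<? extends T> load(String className, Class<T> expectedType, ClassLoader loader)
            throws ClassNotFoundException {
        if (!isAllowed(className)) {
            throw new SecurityException("class not on allowlist: " + className);
        }
        return Class.forName(className, false, loader).asSubclass(expectedType);
    }
}
```

Checking the name before calling `Class.forName` means a rejected class is never loaded, so none of its static initializers can run.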



##########
SECURITY.md:
##########
@@ -0,0 +1,620 @@
+- **Valid site configuration is trusted.** We expect `core-site.xml`,

Review Comment:
   Oh, okay, you speak to this below.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

