Re: [PR] HADOOP-19925. Create a SECURITY.md file to define the security model [hadoop]

via GitHub Tue, 30 Jun 2026 09:21:06 -0700


steveloughran commented on code in PR #8562:
URL: https://github.com/apache/hadoop/pull/8562#discussion_r3500271691



##########
SECURITY.md:
##########
@@ -0,0 +1,620 @@
+<!-- SPDX-License-Identifier: Apache-2.0 -->
+
+# Apache Hadoop Security Model
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
+NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and
+"OPTIONAL" in this document are to be interpreted as described in
+RFC 2119.
+
+This document defines the security model of Apache Hadoop: the deployments it is
+designed to protect, the boundaries it defends, and — equally importantly — the
+things which are *not* vulnerabilities. It exists for human reporters and for
+anyone using automated or AI-assisted tooling to look for security issues.
+
+**TL;DR: Hadoop's security model defends a Kerberos-secured cluster running on a
+trusted operating system, behind a network perimeter, with a valid site
+configuration. Findings which only apply outside that model are bugs, not
+vulnerabilities.**
+
+## Before Filing a Report (Including AI-Assisted Reports)
+
+The deployment Hadoop's security model defends is a **Kerberos-secured cluster**.
+Many findings that look like vulnerabilities in other contexts are not
+vulnerabilities here, because the surrounding deployment is trusted by design.
+
+You *MUST NOT* file a security report for:
+
+- Issues that require the operator to edit their own Hadoop site configuration,
+  place malicious/vulnerable libraries on their own classpath,
+  or pass malicious arguments to their own command invocation.
+- **Job submission running user-supplied code.** Submitting work to YARN or
+  MapReduce executes the submitter's code as the submitter's identity. That is
+  the product, not a vulnerability. See the threat model below.
+- **Denial of service at scale.** A large Hadoop cluster exists to execute jobs
+  at scale; such a cluster can itself be used to mount distributed attacks, and
+  authenticated users can exhaust resources. Resource exhaustion and performance
+  degradation from legitimate authenticated use are out of scope.
+- Issues that require the attacker to already hold cluster or remote-store
+  credentials, a valid Kerberos principal, or local disk access.
+- Anything against the **default insecure (non-Kerberos) mode** — it is insecure
+  by design (see the deployment model below).
+- **Transitive CVEs** in dependencies Hadoop builds or ships against. See
+  [Third Party Modules](#third-party-modules).
+- Raw **scanner output** (Snyk, Dependabot, Trivy, Zizmor, etc.) without a
+  reproducer against the current `trunk` branch.
+- Theoretical findings ("an attacker who could X might then Y") without a
+  reproduction.
+
+A valid report includes:
+
+- The Hadoop version, and ideally the git SHA it was reproduced against.
+- The exact steps, configuration, and commands used to reproduce it.
+- The observed in-scope failure, and what was expected instead.
+- Where a CVE/CVSS score is claimed, the reasoning behind that score.
+
+### For Partly/Fully AI-Generated Reports
+
+AI-assisted reports are accepted **only** if the submitter has verified the
+finding by hand against current source and includes a runnable reproducer.
+
+In addition, the submitter of an AI-generated report:
+
+1. Is REQUIRED to understand what Hadoop is, to understand the claimed vulnerability,
+and to be able to explain it in their own words — including justifying any claimed CVE or CVSS
+scores. If the submitter is unable to do this, then any credit for a resulting
+CVE will be assigned to the AI tool alone, and not to the submitter.
+
+2. MUST declare the AI tool used, and be willing to provide the log.
+   The log is a key part of AI tool reports, and we need to be able to track/replicate these.
+
+*Unverified LLM-generated reports waste maintainer time and will be closed
+without further response.*
+
+
+## Reporting a Vulnerability
+
+Report security vulnerabilities in Apache Hadoop privately to
+**[email protected]**.
+
+* Do not cc: any public mailing list.
+* Do **not** open a public JIRA issue, GitHub
+issue, or pull request for an unfixed vulnerability.
+
+For vulnerabilities in CI pipelines, see
+[Reporting Vulnerabilities in CI Pipelines](#reporting-vulnerabilities-in-ci-pipelines).
+
+See the Apache Software Foundation's
+[guidelines for reporting security issues](https://www.apache.org/security/) for
+the responsible-disclosure process that applies to all ASF projects.
+
+## Third Party Modules
+
+### Reporting a Known CVE in a Hadoop Dependency
+
+Do not report the existence of a published CVE in a Hadoop dependency
+to the security list. These are published and do not need to be treated as
+confidential.
+
+These are considered improvements to the project, and are managed in
+the project's [issue tracker](https://issues.apache.org/jira/issues/).
+1. Search for any existing issue covering the dependency upgrade.
+2. If it exists, read it, its discussion, the PRs etc, and see what versions
+   it has been merged to.
+3. If it hasn't been merged, look at why and get involved: major work is likely to be
+   needed.
+4. If there isn't an issue, create one and start work on the PR!
+
+Tip: an easy way to check which version of a library ships in the trunk
+release of Hadoop is the [LICENSE-binary](./LICENSE-binary) file.
+
+Please do not send an email listing the CVEs an automated scan
+tool reported and requesting updates, timelines etc.
+Open source development is a community process, and addressing this is done
+in the [developer mailing lists](https://hadoop.apache.org/mailing_lists.html).
+Join the community to help get your needs addressed.
+
+If you cannot find existing information on whether the project is affected by the issue in the advisory,
+it may be up to you, as a part of the project community, to participate in its handling.
+Ensure you provide detailed information when starting a discussion: review how the project uses the dependency and form an opinion on the priority of upgrading,
+or even removing, the dependency.
+Contributions upgrading the dependency to a version that is not affected by the problem are generally welcomed, though will not typically expedite the release schedule.
+
+Actively participating in the release process, especially qualifying pre-release artifacts in your
+own deployments, is the most effective way of accelerating the release timetable.
+
+### Providing Advance Warning of a Critical CVE in a Hadoop Dependency
+
+If a team providing a library which Hadoop bundles has a critical CVE which
+a forthcoming fix will correct, they are encouraged to notify the Hadoop security
+list so we can identify whether the project is exposed, help review and validate fixes and
+co-ordinate releases.
+
+We SHALL treat all such reports as confidential.
+
+### Reporting a Newly-Discovered Vulnerability in a Third-Party Module
+
+Security bugs in third-party modules (the JVM, the Kerberos infrastructure, cloud
+SDKs, connectors, or any other dependency) should be reported to their respective
+maintainers, through their own security-reporting mechanisms — after verifying
+the issue is in scope of *their* threat model and reproduces against *their*
+current release.
+
+## Supported Versions
+
+Security fixes are made only to the most recent Apache Hadoop release line(s).
+Older release lines are end-of-life and do not receive security updates; the
+remedy for a vulnerability in an old line is to upgrade. Refer to the
+[Apache Hadoop release and download policy](https://hadoop.apache.org/releases.html)
+for which lines are currently maintained. A report MUST be reproducible against a
+maintained release or the current `trunk` branch.
+
+## The Hadoop Threat Model
+
+In the Hadoop threat model there are **trusted elements**. Vulnerabilities that
+require the compromise of these trusted elements are outside the scope of the
+model:
+
+- **Cluster Administrators are trusted.**
+- **DNS is trusted.**
+- **The Kerberos authentication infrastructure is trusted.** Active Directory,
+  FreeIPA, or whichever other Key Distribution Center (KDC) is in use is trusted
+  and required to be well-configured — including synchronized clocks (NTP/chrony)
+  across the KDC, services, and clients, within the Kerberos clock-skew window.
+  Authentication failures caused by clock drift are operational bugs, not
+  vulnerabilities.
+- **The network perimeter is trusted to keep the public internet out, but the
+  wire is not assumed confidential.** The perimeter does not authenticate callers —
+  Kerberos authentication does that at the service level; the perimeter's job is to
+  keep the cluster off the public internet (Hadoop clusters are never web-facing).
+  Within that, Hadoop may run with optional wire encryption (RPC `privacy` QOP;
+  HDFS block-transfer encryption). Running without encryption is by design and not
+  a vulnerability; but when encryption is enabled, a failure to actually protect
+  traffic — no-op encryption, silent downgrade, or MITM bypass — is in scope.
+- **Any hosting cloud or infrastructure provider is trusted, as is the
+  underlying hardware.** This includes the CPU, memory, storage, and network
+  hardware, even on shared/multi-tenant cloud systems where that hardware is
+  physically shared with other tenants. Attacks that require malicious or
+  compromised hardware, hypervisor escape, or cross-tenant side channels
+  (speculative-execution, Rowhammer, and similar) are the responsibility of the
+  hardware and infrastructure provider, and are out of scope.
+- **The underlying operating system is trusted.** Hadoop relies on OS process
+  isolation, file permissions, and (where required) OS-level disk encryption.
+  An attack that first requires the OS to be compromised or misconfigured is out
+  of scope.
+- **Valid site configuration is trusted.** We expect `core-site.xml`,
+  `hdfs-site.xml`, `yarn-site.xml` and the rest of the site configuration to be
+  valid and to be writable only by trusted administrators. If an attacker can
+  manipulate the site configuration, the game is already over — that is out of
+  scope.
+- **The classpath is trusted.** We expect no malicious JAR files to be on the classpath.

Review Comment:
   Too late for that, I am afraid.
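
For readers mapping the quoted model onto a real deployment: the Kerberos-secured cluster and the optional wire encryption it mentions correspond to ordinary site-configuration properties. The following is a minimal Python sketch, not part of the PR, that reads a site file and reports which of those settings are switched on. The property names (`hadoop.security.authentication`, `hadoop.rpc.protection`, `dfs.encrypt.data.transfer`) are standard Hadoop keys; the helper names, the sample XML, and the choice to read a single merged file are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a merged site file. In a real deployment the first two
# keys normally live in core-site.xml and dfs.encrypt.data.transfer in hdfs-site.xml.
SAMPLE = """<configuration>
  <property><name>hadoop.security.authentication</name><value>kerberos</value></property>
  <property><name>hadoop.rpc.protection</name><value>authentication</value></property>
</configuration>"""


def read_site_properties(xml_text):
    """Return a {name: value} dict from Hadoop-style <property> elements."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def summarize(props):
    """Report which security-relevant settings are enabled (hypothetical helper).

    Missing keys fall back to Hadoop's shipped defaults:
    simple authentication, RPC protection 'authentication', no block encryption.
    """
    # hadoop.rpc.protection may be a comma-separated list of QOP levels.
    rpc_levels = [
        level.strip()
        for level in props.get("hadoop.rpc.protection", "authentication").split(",")
    ]
    return {
        "kerberos": props.get("hadoop.security.authentication", "simple") == "kerberos",
        "rpc_privacy": "privacy" in rpc_levels,
        "block_transfer_encryption":
            props.get("dfs.encrypt.data.transfer", "false").lower() == "true",
    }


if __name__ == "__main__":
    print(summarize(read_site_properties(SAMPLE)))
```

Only the first entry distinguishes the defended deployment from the insecure default; the other two are optional under the quoted model, so `False` there is a configuration choice rather than a finding.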



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

