flyingImer commented on code in PR #4433:
URL: https://github.com/apache/polaris/pull/4433#discussion_r3270066589

##########
AGENTS.md:
##########
@@ -277,3 +277,71 @@ Do not include AI tool names or model identifiers in code, comments, or commit
 messages. For PR descriptions, follow the disclosure guidance in
 [CONTRIBUTING.md](CONTRIBUTING.md#guidelines-for-ai-assisted-contributions).
 The human author owns the contribution.
+
+## Security issues
+
+Before reporting or fixing security issues, read
+[`SECURITY-THREAT-MODEL.md`](SECURITY-THREAT-MODEL.md). Use it to determine
+whether a finding crosses a Polaris security boundary, which actor can exploit it,
+what protected asset is affected, and whether the issue is a Polaris
+vulnerability, a deployment responsibility, a dependency issue, or a false
+positive.
+
+Use [`SECURITY.md`](SECURITY.md) and the public
+[security reporting page](https://polaris.apache.org/community/security-report/)
+for reporting process, disclosure handling, and where suspected vulnerabilities
+should be sent. The public security reporting page also lists previously
+published advisories and CVEs; treat those as public historical examples of issue
+classes, not as substitutes for the threat model's boundaries and invariants.
+
+For dependency-related findings, check whether the vulnerable behavior is
+reachable through Polaris, whether it affects a Polaris security boundary or
+protected asset, and whether the upstream project's security policy or disclosure
+process applies.
+
+For packaging, optional-feature, or custom-build findings, identify the exact
+source revision, selected modules, enabled Gradle properties, optional
+extensions, packaging format, and deployment-relevant configuration before
+assessing reachability, supported status, severity, or whether the issue belongs
+to Polaris or downstream integration code.
+
+For findings in related Polaris tools, including tools from `apache/polaris-tools`,
+use the tool-specific threat model and agent guidance in that repository where
+available. A finding in a related tool is not automatically a Polaris server
+vulnerability.
+
+For findings involving storage locations, credential vending, properties,
+configuration, tokens, credential reset, token exchange, externally configured
+endpoints, or existence checks, classify the protected asset and trust boundary
+using `SECURITY-THREAT-MODEL.md` before deciding whether the behavior is a
+Polaris vulnerability, deployment responsibility, upstream issue, or false
+positive.
+
+For findings involving persistence, authentication, authorization, storage, or
+extension variants, identify the concrete backend, authentication mode,
+authorization model, policy service, storage provider, and deployment profile
+before deciding reachability or severity. Do not assume behavior in JDBC,
+in-memory, NoSQL, internal RBAC, OIDC, OPA, Ranger, or mixed deployments is
+identical, and do not combine assumptions from mutually exclusive variants.
+
+Report security-relevant documentation issues separately from vulnerabilities
+when unclear docs, examples, defaults, or missing warnings could reasonably lead
+users to unsafe deployment, credential handling, authorization, storage,
+integration, upgrade, or operational choices.
+
+Use ASF security guidance for process and classification; do not infer advisory
+or CVE status from `SECURITY-THREAT-MODEL.md` alone.
+
+When reporting potential findings, label ASF severity, proof status, and
+CVE/advisory status as non-authoritative triage estimates.
+
+Do not treat a test as proof of a vulnerability unless it demonstrates that the
+stated actor can cross a real trust boundary without already-authorized access,
+privileged fixtures, mocked trust decisions, or protected information.
+
+Avoid over-reporting findings that are only security-adjacent and lack a
+realistic path to impact or realistic user misunderstanding.
+
+Do not include private vulnerability details, exploit payloads, reporter names,
+private mailing-list content, secrets, or non-public infrastructure details in
+code, comments, tests, documentation, commit messages, or PR descriptions.

Review Comment:
~70 lines here largely restate SECURITY-THREAT-MODEL.md. 2 cents: compress to a pointer plus the two non-negotiables.

```md
## Security Issues

Before reporting or fixing security issues, read [`SECURITY-THREAT-MODEL.md`](SECURITY-THREAT-MODEL.md) to determine whether a finding is a Polaris vulnerability, a deployment responsibility, a dependency issue, or a false positive.

Use [`SECURITY.md`](SECURITY.md) and the [security reporting page](https://polaris.apache.org/community/security-report/) for reporting process and disclosure handling.

ASF severity, advisory status, and CVE candidacy are non-authoritative triage estimates. Do not infer them from `SECURITY-THREAT-MODEL.md` alone.

Do not include private vulnerability details, exploit payloads, reporter names, private mailing-list content, secrets, or non-public infrastructure details in code, comments, tests, documentation, commit messages, or PR descriptions.
```

The category guidance (storage, persistence, packaging, optional integrations, related tools, variants, dependencies, documentation findings) is already in SECURITY-THREAT-MODEL.md. Inline restatement just doubles maintenance.


##########
AGENTS.md:
##########
@@ -277,3 +277,71 @@ Do not include AI tool names or model identifiers in code, comments, or commit
 messages. For PR descriptions, follow the disclosure guidance in
 [CONTRIBUTING.md](CONTRIBUTING.md#guidelines-for-ai-assisted-contributions).
 The human author owns the contribution.
+
+## Security issues
+
+Before reporting or fixing security issues, read
+[`SECURITY-THREAT-MODEL.md`](SECURITY-THREAT-MODEL.md). Use it to determine
+whether a finding crosses a Polaris security boundary, which actor can exploit it,
+what protected asset is affected, and whether the issue is a Polaris
+vulnerability, a deployment responsibility, a dependency issue, or a false
+positive.
+
+Use [`SECURITY.md`](SECURITY.md) and the public
+[security reporting page](https://polaris.apache.org/community/security-report/)
+for reporting process, disclosure handling, and where suspected vulnerabilities
+should be sent. The public security reporting page also lists previously
+published advisories and CVEs; treat those as public historical examples of issue
+classes, not as substitutes for the threat model's boundaries and invariants.
+
+For dependency-related findings, check whether the vulnerable behavior is
+reachable through Polaris, whether it affects a Polaris security boundary or
+protected asset, and whether the upstream project's security policy or disclosure
+process applies.
+
+For packaging, optional-feature, or custom-build findings, identify the exact
+source revision, selected modules, enabled Gradle properties, optional
+extensions, packaging format, and deployment-relevant configuration before
+assessing reachability, supported status, severity, or whether the issue belongs
+to Polaris or downstream integration code.
+
+For findings in related Polaris tools, including tools from `apache/polaris-tools`,
+use the tool-specific threat model and agent guidance in that repository where
+available. A finding in a related tool is not automatically a Polaris server
+vulnerability.
+
+For findings involving storage locations, credential vending, properties,
+configuration, tokens, credential reset, token exchange, externally configured
+endpoints, or existence checks, classify the protected asset and trust boundary
+using `SECURITY-THREAT-MODEL.md` before deciding whether the behavior is a
+Polaris vulnerability, deployment responsibility, upstream issue, or false
+positive.
+
+For findings involving persistence, authentication, authorization, storage, or
+extension variants, identify the concrete backend, authentication mode,
+authorization model, policy service, storage provider, and deployment profile
+before deciding reachability or severity. Do not assume behavior in JDBC,
+in-memory, NoSQL, internal RBAC, OIDC, OPA, Ranger, or mixed deployments is
+identical, and do not combine assumptions from mutually exclusive variants.

Review Comment:
I wonder if the enumeration belongs in SECURITY-THREAT-MODEL.md instead. Variant lists drift as the codebase grows; what AGENTS.md needs is the principle ("identify the active variant before assessing reachability"). Otherwise next time a backend lands only one file gets updated and AGENTS.md lags.


##########
SECURITY-THREAT-MODEL.md:
##########
@@ -0,0 +1,740 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements. See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership. The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License. You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied. See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Apache Polaris Threat Model
+
+## Purpose
+
+This document defines the security boundaries for Apache Polaris. It is intended
+to guide maintainers, security reviewers, and automated analysis tools when
+evaluating potential vulnerabilities.
+
+Use this document to decide whether a finding affects a protected Polaris asset,
+crosses a Polaris trust boundary, violates a Polaris security invariant, and
+should be handled as a Polaris security issue.
+
+This document is guidance for analysis and triage. It does not make policy
+decisions, accept or reject vulnerability reports, assign ASF severity, allocate
+CVEs, or determine disclosure handling. Human project and ASF security review is
+required for those decisions.
+
+## Scope
+
+Apache Polaris is a catalog service for managing Apache Iceberg catalogs,
+namespaces, tables, views, principals, roles, policies, and related metadata.
+
+This threat model covers:
+
+- Polaris server and runtime components.
+- REST APIs exposed by Polaris.
+- Authentication and authorization behavior.
+- Catalog metadata access and mutation.
+- Persistence-layer interactions.
+- Integration points with storage, catalog, policy, and identity backends where
+  Polaris makes authentication, authorization, or trust decisions.
+
+This threat model does not cover:
+
+- Vulnerabilities in client applications using Polaris incorrectly.
+- Compromise of the underlying database, object store, identity provider,
+  container platform, host operating system, or deployment platform.
+- Denial of service caused solely by insufficient infrastructure sizing.
+- Bugs in third-party dependencies unless Polaris exposes them through unsafe
+  configuration or usage.
+
+## Component Families
+
+Polaris has several component families with different entry points, deployment
+models, and trust boundaries:
+
+| Component family | Representative entry points | Deployment model | Threat-model scope |
+| --- | --- | --- | --- |
+| Polaris server and runtime | Management APIs, catalog APIs, service runtime | Long-running service | In scope for authentication, authorization, metadata, persistence, storage, policy, and credential-handling decisions. |
+| Polaris admin tool | Administrative CLI commands and generated local profiles | Operator tool | In scope when handling credentials, configuration, administration, logs, or generated artifacts. |
+| Python CLI under `client/python/` | Client commands, local configuration, command output | User or operator CLI | In scope when handling credentials, tokens, catalog metadata, local profiles, logs, or generated artifacts. |
+| Release artifacts | Source release, runnable tarballs, container images, Helm charts | Distribution and deployment | In scope for packaged defaults, included modules, enabled features, generated configuration, and documented deployment paths. |
+| Reusable modules | `polaris-core`, `polaris-runtime-service`, extension modules, and selected Gradle project combinations | Embedded or customized downstream use | In scope when Polaris code makes security decisions; downstream-only integration code is evaluated separately. |
+| Optional integrations | Persistence backends, authentication modes, authorization services, storage providers, federation, and optional extensions | Runtime-selected or build-enabled features | In scope for coherent supported configurations; findings must identify the active variant and relevant configuration. |
+| Related Polaris tools | Tools published from `apache/polaris-tools` and referenced by the Polaris project or website | Developer, operator, migration, synchronization, MCP, benchmark, or web UI tools | Evaluated with tool-specific audience, deployment model, protected assets, and trust boundaries; not automatically Polaris server vulnerabilities. |
+
+## System Overview
+
+Polaris accepts requests from clients, authenticates callers through configured
+authentication mechanisms, authorizes operations against Polaris roles,
+privileges, policies, and catalog metadata, and persists metadata through the
+configured persistence backend.
+
+Polaris may integrate with external systems, including identity providers,
+object stores, policy decision points, catalog backends, and deployment
+infrastructure. These systems can affect Polaris security when Polaris relies on
+their output for authentication, authorization, routing, or persistence
+decisions.
+
+## Actors And Roles
+
+- Anonymous caller: A caller without valid authentication.
+- Authenticated principal: A valid user or service principal known to Polaris.
+- Catalog user: An authenticated principal with privileges on one or more
+  catalog resources.
+- Catalog administrator: A principal with administrative privileges over a
+  catalog.
+- Realm administrator: A principal with administrative privileges over the
+  Polaris realm.
+- Deployment operator: A person or system with access to deploy, configure, or
+  operate Polaris infrastructure.
+- External identity provider: A trusted authentication authority configured by
+  the deployment.
+- External policy decision point: A configured authorization service that may
+  participate in Polaris access decisions.
+- Persistence backend: The configured database or storage system used by
+  Polaris.
+
+## Protected Assets
+
+Polaris treats the following as security-sensitive:
+
+- Authentication credentials, bearer tokens, client secrets, signing keys, and
+  refresh tokens.
+- Principal, role, privilege, and policy metadata.
+- Principal, principal-role, and catalog-role names when a deployment treats
+  identity or role names as sensitive or personal data.
+- Catalog, namespace, table, and view metadata whose visibility is governed by
+  Polaris authorization.
+- Storage locations, table locations, metadata locations, manifest locations,
+  statistics locations, and other URI-bearing metadata that define where
+  catalog-managed data or metadata resides.
+- Temporary or delegated storage credentials, scoped storage policies,
+  credential access boundaries, session policies, and provider-specific policy
+  expressions.
+- Configuration values that affect authentication, authorization, token
+  validation, policy evaluation, credential vending, storage boundaries,
+  federation, or backend connectivity.
+- Audit-relevant request identity and authorization context.
+
+## Trust Boundaries
+
+Polaris assumes the following boundaries:
+
+- Network callers are untrusted until authenticated.
+- Authenticated principals are not inherently trusted to access resources; every
+  protected operation must pass authorization.
+- Request-provided identifiers, catalog names, namespace names, table names,
+  view names, role names, policy names, and principal names are untrusted input.
+- Identity-provider claims are trusted only after token validation according to
+  the configured authentication mechanism.
+- External policy decisions are trusted only according to the configured policy
+  integration and the request context Polaris supplies to that integration.
+- Management-plane APIs, catalog data-plane APIs, admin tools, and client tools
+  may expose different metadata and require separate authorization checks.
+- Catalog properties, entity properties, and configuration values may cross from
+  management-plane configuration into data-plane client-visible responses; they
+  must be classified by intended visibility before storing sensitive values.
+- Credential vending crosses from Polaris authorization into object-store or
+  external-system authorization; delegated credentials must be scoped to the
+  authorized actor, operation, and storage locations.
+- Storage locations and URI-bearing metadata cross from Polaris metadata into
+  object stores, metastores, filesystems, or external catalogs; caller-supplied
+  locations are untrusted until validated against the effective storage policy.
+- Provider-specific storage policy expressions, IAM policies, access boundaries,
+  and session policies are security boundaries and must safely encode
+  caller-controlled identifier or path material.
+- Externally configured endpoints, federation targets, identity-provider
+  metadata, object-store endpoints, and catalog backends are trusted only for
+  the configured purpose and must not silently redirect secrets or privileged
+  requests outside that trust relationship.
+- Persistence backends are trusted to store and return data, but persistence-layer
+  access by internal Polaris callsites is not by itself proof that the data may be
+  returned to an external caller or used to mutate protected state. Authorization
+  must be enforced before protected data is exposed or protected state is changed.
+- Deployment operators are trusted with configuration and infrastructure-level
+  secrets.
+
+## Security Invariants
+
+The following properties must hold:
+
+- Protected API operations require authentication unless explicitly documented as
+  public.
+- Authorization checks must be performed before returning or mutating protected
+  catalog metadata.
+- Grant and role-management operations must enforce the intended grant authority
+  defined by Polaris authorization rules. A principal must not be able to obtain
+  privileges that direct checks would reject by using indirect role creation,
+  self-grants, role nesting, privilege delegation, or other second-order effects.
+- Role, privilege, and policy changes must not bypass scope restrictions.
+- Realm, catalog, namespace, table, and view identifiers must not allow access
+  across authorization boundaries.
+- Token issuance, token exchange, credential reset, and credential rotation must
+  preserve the intended principal, realm, role scope, expiration, and revocation
+  semantics.
+- Tokens, credentials, and secrets must not be logged, returned in API
+  responses, or persisted in plaintext unless explicitly required and protected
+  by deployment controls.
+- Properties and configuration values that are returned to clients must be
+  treated as client-visible and must not be used as a secret store.
+- Storage locations and URI-bearing metadata must be validated against the
+  effective catalog, namespace, table, and storage-policy boundaries before they
+  are persisted, used to access storage, or used to mint delegated credentials.
+- Credential vending must not occur before the relevant location, authorization,
+  and scope checks have completed.
+- Temporary storage credentials must be scoped as narrowly as the configured
+  storage provider and documented mode allow. Provider-specific limitations that
+  broaden scope must be explicit to operators.
+- Within the applicable realm and configured storage-policy scope, reused,
+  overlapping, or ambiguous storage locations must not create an unintended
+  authorization bypass. Explicitly configured overlap modes, table clones, and
+  documented credential-vending scope limitations should be treated according to
+  their documented behavior, not reported solely because overlap exists.

Review Comment:
   +1 to dennishuo. As written, agents will file false-positive findings against intentional overlap-check overrides, table clones, legitimate role nesting (line 179), and internal-callsite reads that don't return data externally (line 161-164). My bias would be to land all three clarifications in this PR. Once AGENTS.md cites the model the broad invariants become the default rule, and the project can't reclassify findings already filed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
