RussellSpitzer commented on code in PR #4433:
URL: https://github.com/apache/polaris/pull/4433#discussion_r3244439447
########## SECURITY-THREAT-MODEL.md: ########## @@ -0,0 +1,730 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. +--> + +# Apache Polaris Threat Model + +## Purpose + +This document defines the security boundaries for Apache Polaris. It is intended +to guide maintainers, security reviewers, and automated analysis tools when +evaluating potential vulnerabilities. + +Use this document to decide whether a finding affects a protected Polaris asset, +crosses a Polaris trust boundary, violates a Polaris security invariant, and +should be handled as a Polaris security issue. + +This document is guidance for analysis and triage. It does not make policy +decisions, accept or reject vulnerability reports, assign ASF severity, allocate +CVEs, or determine disclosure handling. Human project and ASF security review is +required for those decisions. + +## Scope + +Apache Polaris is a catalog service for managing Apache Iceberg catalogs, +namespaces, tables, views, principals, roles, policies, and related metadata. + +This threat model covers: + +- Polaris server and runtime components. +- REST APIs exposed by Polaris. +- Authentication and authorization behavior. +- Catalog metadata access and mutation. +- Persistence-layer interactions. 
+- Integration points with storage, catalog, policy, and identity backends where + Polaris makes authentication, authorization, or trust decisions. + +This threat model does not cover: + +- Vulnerabilities in client applications using Polaris incorrectly. +- Compromise of the underlying database, object store, identity provider, + container platform, host operating system, or deployment platform. +- Denial of service caused solely by insufficient infrastructure sizing. +- Bugs in third-party dependencies unless Polaris exposes them through unsafe + configuration or usage. + +## Component Families + +Polaris has several component families with different entry points, deployment +models, and trust boundaries: + +| Component family | Representative entry points | Deployment model | Threat-model scope | +| --- | --- | --- | --- | +| Polaris server and runtime | Management APIs, catalog APIs, service runtime | Long-running service | In scope for authentication, authorization, metadata, persistence, storage, policy, and credential-handling decisions. | +| Polaris admin tool | Administrative CLI commands and generated local profiles | Operator tool | In scope when handling credentials, configuration, administration, logs, or generated artifacts. | +| Python CLI under `client/python/` | Client commands, local configuration, command output | User or operator CLI | In scope when handling credentials, tokens, catalog metadata, local profiles, logs, or generated artifacts. | +| Release artifacts | Source release, runnable tarballs, container images, Helm charts | Distribution and deployment | In scope for packaged defaults, included modules, enabled features, generated configuration, and documented deployment paths. 
| +| Reusable modules | `polaris-core`, `polaris-runtime-service`, extension modules, and selected Gradle project combinations | Embedded or customized downstream use | In scope when Polaris code makes security decisions; downstream-only integration code is evaluated separately. | +| Optional integrations | Persistence backends, authentication modes, authorization services, storage providers, federation, and optional extensions | Runtime-selected or build-enabled features | In scope for coherent supported configurations; findings must identify the active variant and relevant configuration. | +| Related Polaris tools | Tools published from `apache/polaris-tools` and referenced by the Polaris project or website | Developer, operator, migration, synchronization, MCP, benchmark, or web UI tools | Evaluated with tool-specific audience, deployment model, protected assets, and trust boundaries; not automatically Polaris server vulnerabilities. | + +## System Overview + +Polaris accepts requests from clients, authenticates callers through configured +authentication mechanisms, authorizes operations against Polaris roles, +privileges, policies, and catalog metadata, and persists metadata through the +configured persistence backend. + +Polaris may integrate with external systems, including identity providers, +object stores, policy decision points, catalog backends, and deployment +infrastructure. These systems can affect Polaris security when Polaris relies on +their output for authentication, authorization, routing, or persistence +decisions. + +## Actors And Roles + +- Anonymous caller: A caller without valid authentication. +- Authenticated principal: A valid user or service principal known to Polaris. +- Catalog user: An authenticated principal with privileges on one or more + catalog resources. +- Catalog administrator: A principal with administrative privileges over a + catalog. 
+- Realm administrator: A principal with administrative privileges over the + Polaris realm. +- Deployment operator: A person or system with access to deploy, configure, or + operate Polaris infrastructure. +- External identity provider: A trusted authentication authority configured by + the deployment. +- External policy decision point: A configured authorization service that may + participate in Polaris access decisions. +- Persistence backend: The configured database or storage system used by + Polaris. + +## Protected Assets + +Polaris treats the following as security-sensitive: + +- Authentication credentials, bearer tokens, client secrets, signing keys, and + refresh tokens. +- Principal, role, privilege, and policy metadata. +- Principal, principal-role, and catalog-role names when a deployment treats + identity or role names as sensitive or personal data. +- Catalog, namespace, table, and view metadata where access is restricted by + Polaris authorization. +- Storage locations, table locations, metadata locations, manifest locations, + statistics locations, and other URI-bearing metadata that define where + catalog-managed data or metadata resides. +- Temporary or delegated storage credentials, scoped storage policies, + credential access boundaries, session policies, and provider-specific policy + expressions. +- Configuration values that affect authentication, authorization, token + validation, policy evaluation, credential vending, storage boundaries, + federation, or backend connectivity. +- Audit-relevant request identity and authorization context. + +## Trust Boundaries + +Polaris assumes the following boundaries: + +- Network callers are untrusted until authenticated. +- Authenticated principals are not inherently trusted to access resources; every + protected operation must pass authorization. +- Request-provided identifiers, catalog names, namespace names, table names, + view names, role names, policy names, and principal names are untrusted input. 
+- Identity-provider claims are trusted only after token validation according to + the configured authentication mechanism. +- External policy decisions are trusted only according to the configured policy + integration and the request context Polaris supplies to that integration. +- Management-plane APIs, catalog data-plane APIs, admin tools, and client tools + may expose different metadata and require separate authorization checks. +- Catalog properties, entity properties, and configuration values may cross from + management-plane configuration into data-plane client-visible responses; they + must be classified by intended visibility before storing sensitive values. +- Credential vending crosses from Polaris authorization into object-store or + external-system authorization; delegated credentials must be scoped to the + authorized actor, operation, and storage locations. +- Storage locations and URI-bearing metadata cross from Polaris metadata into + object stores, metastores, filesystems, or external catalogs; caller-supplied + locations are untrusted until validated against the effective storage policy. +- Provider-specific storage policy expressions, IAM policies, access boundaries, + and session policies are security boundaries and must safely encode + caller-controlled identifier or path material. +- Externally configured endpoints, federation targets, identity-provider + metadata, object-store endpoints, and catalog backends are trusted only for + the configured purpose and must not silently redirect secrets or privileged + requests outside that trust relationship. +- Persistence backends are trusted to store and return data, but Polaris must not + rely on callers to enforce authorization before persistence access. +- Deployment operators are trusted with configuration and infrastructure-level + secrets. 
+ +## Security Invariants + +The following properties must hold: + +- Protected API operations require authentication unless explicitly documented as + public. +- Authorization checks must be performed before returning or mutating protected + catalog metadata. +- A principal must not be able to grant itself privileges it does not already + have authority to grant. +- Role, privilege, and policy changes must not bypass scope restrictions. +- Realm, catalog, namespace, table, and view identifiers must not allow access + across authorization boundaries. +- Token issuance, token exchange, credential reset, and credential rotation must + preserve the intended principal, realm, role scope, expiration, and revocation + semantics. +- Tokens, credentials, and secrets must not be logged, returned in API + responses, or persisted in plaintext unless explicitly required and protected + by deployment controls. +- Properties and configuration values that are returned to clients must be + treated as client-visible and must not be used as a secret store. +- Storage locations and URI-bearing metadata must be validated against the + effective catalog, namespace, table, and storage-policy boundaries before they + are persisted, used to access storage, or used to mint delegated credentials. +- Credential vending must not occur before the relevant location, authorization, + and scope checks have completed. +- Temporary storage credentials must be scoped as narrowly as the configured + storage provider and documented mode allow. Provider-specific limitations that + broaden scope must be explicit to operators. +- Reused, overlapping, or ambiguous storage locations must not allow a principal + authorized for one logical entity to access another logical entity's data or + metadata. +- Provider policy documents and expressions must escape or otherwise safely + encode caller-controlled identifiers, paths, and property values. 
+- User-controlled input must not be used to construct SQL, filesystem paths, + object-store paths, URLs, or process commands without appropriate validation or + safe APIs. +- Error messages must not disclose secrets or unauthorized metadata. +- Existence, names, and relationships of principals, roles, catalogs, + namespaces, tables, views, and policies must not be disclosed to unauthorized + callers unless that disclosure is explicitly documented and accepted. +- Configuration changes that create, reset, rotate, store, or delete credentials + or secrets must be authorized, consistently applied, and must not leave stale + credentials or orphaned secrets with unintended access. +- Internal and administrative APIs must not be exposed as public unauthenticated + APIs. +- Optional build features, extensions, and distribution-specific packaging must + not weaken Polaris authentication, authorization, credential handling, or + metadata isolation without explicit configuration and documentation. + +## Security Issues + +The following should generally be treated as potential security vulnerabilities: + +- Authentication bypass. +- Authorization bypass or privilege escalation. +- Cross-catalog or cross-realm data access caused by missing checks. +- Unauthorized mutation of catalog metadata, roles, policies, or principals. +- Exposure of credentials, tokens, secrets, or sensitive configuration. +- Exposure of secrets through client-visible properties, configuration + responses, logs, profiles, command output, browser storage, or generated + artifacts. +- Credential-vending behavior that grants broader storage access than the + caller's authorized operation or effective storage boundary. +- Storage-location validation bypasses, including unvalidated URI-bearing table + metadata, reused or overlapping locations, and provider-policy construction + issues. 
+- Token or credential lifecycle issues that allow access to continue beyond the + intended reset, rotation, revocation, realm, or scope boundary. +- Unauthorized disclosure of identity, role, policy, catalog, namespace, table, + or view existence when that existence is itself sensitive. +- SQL injection, command injection, unsafe deserialization, server-side request + forgery, path traversal, or template injection reachable through untrusted + input. +- Unsafe use of externally configured endpoints that can redirect privileged + traffic, credentials, tokens, metadata, or storage requests outside the + intended trust relationship. +- Logging or returning sensitive values to unauthorized users. +- Use of weak cryptographic validation for authentication or authorization + decisions. +- Insecure defaults that expose protected APIs or disable required security + checks. + +## Non-Issues And Deployment Responsibilities + +The following are not normally treated as Polaris vulnerabilities by themselves: + +- A deployment operator choosing weak credentials or exposing the service + publicly without required network controls. +- A compromised administrator using their legitimate administrative privileges. +- A compromised database, object store, Kubernetes cluster, host operating + system, identity provider, or external policy decision point. +- Lack of rate limiting unless it enables a concrete security impact beyond + resource exhaustion. +- Information visible to a principal that is explicitly authorized to access it. +- Storage-provider limitations that are accurately documented and do not give + Polaris callers more access than the deployment intentionally configured. +- Test-only code, local development defaults, or example configuration not used + in production paths, unless clearly reachable in production builds. 
+ +Polaris does not, by itself, provide the following security properties: + +- Protection after compromise of deployment infrastructure, persistence + backends, object stores, identity providers, external policy decision points, + Kubernetes clusters, hosts, or administrator workstations. +- Protection against authorized administrators using their legitimate privileges + in ways the deployment does not intend. +- Network isolation, TLS termination, firewalling, ingress policy, Kubernetes + policy, or cloud-account isolation beyond what the deployment provides. +- Secret management for values that operators place outside Polaris-controlled + secret storage or configuration mechanisms. +- Stronger delegated-credential isolation than the selected storage provider, + configured policy model, and documented deployment mode can support. +- Safety of arbitrary downstream packaging, module combinations, integrations, + or local patches that are not part of an official Polaris artifact or + documented supported configuration. + +Deployment operators and downstream integrators are responsible for: + +- Protecting service credentials, signing keys, client secrets, object-store + credentials, identity-provider configuration, policy-service credentials, and + local CLI or admin-tool profiles. +- Configuring TLS, network exposure, ingress, firewalls, service accounts, + Kubernetes permissions, cloud IAM, object-store policy, and persistence-backend + access according to the deployment's security requirements. +- Choosing production-appropriate authentication, authorization, persistence, + storage, and policy-service settings before exposing Polaris to untrusted + clients. +- Avoiding production use of local-development defaults, sample credentials, + test fixtures, example configuration, and benchmark-only tooling unless those + choices are explicitly intended and protected. 
+- Rotating and revoking credentials, tokens, secrets, and delegated access + according to the deployment's operational requirements. +- Protecting logs, generated configuration, local profiles, backups, database + snapshots, object-store metadata, and other artifacts that may contain + security-sensitive values. + +The following patterns often require careful triage because they can be +vulnerabilities, documentation hardening items, deployment responsibilities, or +false positives depending on the actor, configuration, and reachable path: + +- Treating client-visible properties, catalog metadata, generated configuration, + local profiles, command output, browser storage, or logs as secret stores. +- Assuming delegated storage credentials are narrower than the configured + provider policy, access-boundary mechanism, or documented deployment mode + actually supports. +- Assuming every custom build, Gradle project combination, optional dependency, + extension, or downstream package is an official supported Polaris artifact. +- Applying server threat-model assumptions unchanged to CLIs, web UIs, + migration tools, MCP servers, synchronization tools, benchmarks, or other + related tools. +- Treating test fixtures, mocked trust decisions, direct internal object + construction, or already-authorized access as proof of a production + vulnerability. +- Treating dependency advisories as Polaris vulnerabilities without showing that + the vulnerable behavior is present, reachable, and crosses a Polaris trust + boundary. + +Known non-findings are cases already covered by the non-issues, dependency, +variant, distribution, and related-tool sections above. Common examples include +authorized access, unreachable dependency advisories, test-only behavior, +compromised deployment infrastructure, and impossible variant combinations. 
+ +## Consumption And Distribution Boundaries + +Users consume Polaris in several ways, including: + +- Release tarballs containing the runnable Quarkus applications for the Polaris + server and Polaris admin tool. +- Container images for the Polaris server and Polaris admin tool. +- Helm charts for deploying the Polaris server. +- The Python-based CLI tool under `client/`. +- Custom applications or distributions built from Polaris modules such as + `polaris-core`, `polaris-runtime-service`, extension modules, or other Gradle + project combinations. + +The Polaris threat model applies directly to artifacts and modules produced by +the Apache Polaris project. It also applies to custom distributions when they use +Polaris code to make authentication, authorization, catalog metadata, policy, or +credential-handling decisions. + +Custom distributions can change the effective threat model. When evaluating a +finding, determine whether the behavior is present in: + +- An official Polaris release artifact. +- An official Polaris container image. +- An official Polaris Helm chart. +- The Polaris CLI. +- A reusable Polaris module.

Review Comment:
Does this mean every production module we push to maven? (Exclude tests + docs)
