pjanuario opened a new issue, #3685: URL: https://github.com/apache/polaris/issues/3685
### Describe the bug

### **1. Issue Summary**

We are experiencing a critical performance degradation where the `create_namespace` API operation consistently times out (>30s), causing cascading failures in our downstream workload infrastructure (Traefik returns 504 Gateway Timeouts, and pyiceberg clients exhibit unexpected behaviour).

### **2. Remediation Status**

To restore production stability, we are manually applying the following schema changes:

* **Applied:** `idx_entities_catalog_id_id` on `entities`. (Result: latency dropped from ~36s to ~17s.)
* **Applying now:** `idx_grants_realm_grantee` and `idx_grants_realm_securable` on `grant_records`, to resolve 100% CPU usage during permission checks and GC.

### **3. Root Cause Analysis (Database Layer)**

Our investigation confirmed that the default schema lacks the secondary indexes required for the application's actual query patterns.

**A. The "Grant Records" Bottleneck (High Frequency)**

The application performs thousands of permission checks per operation.

* **Observation:** 100% sequential scans.
* **Metrics:** 65,000+ executions of `SELECT ... WHERE grantee_id = ...`.
* **Impact:** Without an index starting with `grantee_id` or `realm_id`, the database reads **13.7 billion rows** to serve these requests.
* **Fix:** We are adding indexes to support lookups by `grantee_id` and `securable_id`.

**B. The "Entities" Bottleneck (Bulk Fetch)**

* **Observation:** The namespace create operation triggers a query with a massive `IN` clause containing ~7,800 tuples: `WHERE (catalog_id, id) IN ((?,?), (?,?)...)`.
* **Impact:** The query planner could not use the PK (starting with `realm_id`) efficiently, resulting in 17s+ execution time.
* **Fix:** We added `idx_entities_catalog_id_id` to optimize this lookup.

### **4. Request for Engineering: Query Optimization**

While the indexes mitigate the immediate outage, the underlying query patterns appear inefficient and likely need refactoring in the Polaris codebase.

**1. Review "N+1" Permission Checks**

We observed **65,415 executions** of the permission check query during a short window.

* *Question:* Is the application iterating through a list and checking permissions one by one instead of performing a bulk permission check? This volume suggests an "N+1" query issue that indexes can only partially band-aid.

**2. Review Bulk Entity Fetch Strategy**

The query `SELECT ... FROM entities WHERE (catalog_id, id) IN (...)` is passing **~15,000 bind parameters** (7,800 pairs).

* *Recommendation:* This pattern scales poorly. Please consider paginating this operation or refactoring the logic to avoid passing thousands of IDs in a single SQL statement (see the sketch following this section).
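For illustration, a minimal sketch of what both alternatives could look like at the SQL level is shown below. It assumes PostgreSQL, bigint identifiers, and the column names referenced above (`realm_id`, `grantee_id`, `securable_id`, `catalog_id`, `id`); the actual Polaris persistence layer may bind parameters differently, so treat this as the shape of the queries rather than a drop-in patch.

```sql
-- Sketch only: assumes PostgreSQL, bigint ids, and the column names
-- referenced above; the real Polaris persistence code may differ.

-- (a) Bulk permission check: fetch the grants for a whole batch of grantees
--     in one round trip instead of one SELECT per grantee ($2 is a bigint[]).
SELECT grantee_id, securable_id
FROM polaris_schema.grant_records
WHERE realm_id = $1
  AND grantee_id = ANY($2::bigint[]);

-- (b) Bulk entity fetch without thousands of bind parameters: pass the
--     (catalog_id, id) pairs as two parallel arrays and join via unnest(),
--     which keeps the statement text constant and stays index-friendly.
SELECT e.*
FROM polaris_schema.entities AS e
JOIN unnest($2::bigint[], $3::bigint[]) AS k(catalog_id, id)
  ON e.catalog_id = k.catalog_id
 AND e.id = k.id
WHERE e.realm_id = $1;
```

Either shape keeps the number of statements and bind parameters constant regardless of how many grantees or namespaces are involved.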
### **5. SQL for Upstream Merge**

We strongly recommend including these indexes in the next Polaris release to prevent similar degradation for other users:

```sql
-- 1. Optimizes High-Volume Permission Checks (Fixes API CPU load)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_grants_realm_grantee
    ON polaris_schema.grant_records (realm_id, grantee_id);

-- 2. Optimizes Garbage Collection (Fixes background job hangs)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_grants_realm_securable
    ON polaris_schema.grant_records (realm_id, securable_id);

-- 3. Optimizes Bulk Entity Lookups (Fixes Namespace listing latency)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_entities_catalog_id_id
    ON polaris_schema.entities (catalog_id, id);
```

### To Reproduce

1. **Environment:** Deploy Apache Polaris backed by PostgreSQL with a moderate dataset (~170k grant records, ~160k entities).
2. **Action:** Execute the `create_namespace` API operation on a catalog with ~22k namespaces.
3. **Conditions:** Ensure standard concurrency; the issue manifests even without much load because of the table scans.
4. **Observation:** Monitor API response time and database activity via `pg_stat_statements` (a sample monitoring query is sketched at the end of this report).

### Actual Behavior

* **Performance:** The operation consistently takes **33s - 36s** to complete.
* **Database Execution:** Permission checks trigger a **Sequential Scan** of the entire `grant_records` table for every single lookup. We observed queries reading **~174,000 rows per execution** (100% of the table) thousands of times per minute.
* **Client Experience:** The API request times out. Proxies (e.g., Traefik) return `504 Gateway Timeout` at the 30s mark, and client applications (crawlers) hang or fail silently.

### Expected Behavior

* **Performance:** The `create_namespace` operation should complete in **sub-second time to ~2 seconds**.
* **Database Execution:** Permission checks filtering by `grantee_id` should use an **Index Scan**, reading only the specific rows relevant to the user (typically <100 rows).
* **Client Experience:** The request should return `200 OK` well within standard proxy/client timeout windows (30s).

### Additional context

All of this information was consistent throughout our investigation. After applying the first index, `idx_entities_catalog_id_id`, we saw a consistent drop in response time from ~31s to ~16s. Once that index was applied, we reset the statistics and cleared `pg_stat_statements`, then kept the system running for roughly 24 hours; the data shown in this ticket is from the period after those resets.

### System information

OS: Linux
Polaris Catalog Version: Docker image 1.3.0 (initial symptoms were present in 1.0.1; we upgraded to check whether the version was the cause, but the problem persisted)
Object storage & setup: AWS S3
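For reference, a sketch of the kind of `pg_stat_statements` inspection behind the observations above is shown below. Column names assume PostgreSQL 13 or newer (`total_exec_time` / `mean_exec_time`); older versions expose `total_time` / `mean_time` instead, and these queries are illustrative rather than the literal ones from our environment.

```sql
-- Illustrative monitoring queries (PostgreSQL 13+ column names assumed).

-- Top statements by total execution time, with rows touched per call;
-- this is where the 65k+ permission-check executions stand out.
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 1)  AS mean_ms,
       rows / NULLIF(calls, 0)            AS rows_per_call,
       left(query, 120)                   AS query_sample
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- Sequential-scan vs index-scan activity on the affected tables.
SELECT relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE relname IN ('grant_records', 'entities');
```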
