pjanuario opened a new issue, #3685:
URL: https://github.com/apache/polaris/issues/3685

   ### Describe the bug
   
   ### **1. Issue Summary**
   
   We are experiencing a critical performance degradation where the 
`create_namespace` API operation consistently times out (>30s), causing 
cascading failures in our downstream workload infrastructure (Traefik returns 
504 Gateway Timeouts and pyiceberg clients exhibit unexpected behaviour).
   
   ### **2. Remediation Status**
   
   To restore production stability, we are manually applying the following 
schema changes:
   
   * **Applied:** `idx_entities_catalog_id_id` on `entities`. (Result: Latency 
dropped from ~36s to ~17s).
   * **Applying Now:** `idx_grants_realm_grantee` and 
`idx_grants_realm_securable` on `grant_records` to resolve 100% CPU usage 
during permission checks and GC.
   
   ### **3. Root Cause Analysis (Database Layer)**
   
   Our investigation confirmed that the default schema lacks secondary indexes 
required for the application's actual query patterns.
   
   **A. The "Grant Records" Bottleneck (High Frequency)**
   The application performs thousands of permission checks per operation.
   
   * **Observation:** 100% Sequential Scans.
   * **Metrics:** 65,000+ executions of `SELECT ... WHERE grantee_id = ...`.
   * **Impact:** Without an index starting with `grantee_id` or `realm_id`, the 
database reads **13.7 Billion rows** to serve these requests.
   * **Fix:** We are adding indexes to support lookups by `grantee_id` and 
`securable_id`.
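
   For reference, the sequential scan is easy to confirm with `EXPLAIN`; the 
statement below is only a simplified stand-in for the real permission-check 
query (the actual column list differs, the realm/grantee values are 
placeholders, and we assume `grantee_id` is a numeric id column):
   
   ```sql
   -- Simplified stand-in for the hot permission-check query; the realm name
   -- and grantee id are placeholders for values from your deployment.
   EXPLAIN (ANALYZE, BUFFERS)
   SELECT *
   FROM polaris_schema.grant_records
   WHERE realm_id = 'default-realm'
     AND grantee_id = 12345;
   -- Before the index: "Seq Scan on grant_records" touching the whole table.
   -- After idx_grants_realm_grantee: an index scan on (realm_id, grantee_id).
   ```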
   
   **B. The "Entities" Bottleneck (Bulk Fetch)**
   
   * **Observation:** The namespace create operation triggers a query with a 
massive `IN` clause containing ~7,800 tuples: `WHERE (catalog_id, id) IN 
((?,?), (?,?)...)`.
   * **Impact:** The query planner could not use the PK (starting with 
`realm_id`) efficiently, resulting in 17s+ execution time.
   * **Fix:** We added `idx_entities_catalog_id_id` to optimize this lookup.
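
   A shortened stand-in (placeholder pairs in place of the ~7,800 real ones) 
illustrates the shape of the statement we see in `pg_stat_statements`:
   
   ```sql
   -- Shortened stand-in for the bulk lookup; the real statement carries
   -- ~7,800 (catalog_id, id) pairs as bind parameters.
   EXPLAIN (ANALYZE, BUFFERS)
   SELECT *
   FROM polaris_schema.entities
   WHERE (catalog_id, id) IN ((100, 1), (100, 2), (100, 3));
   -- Without a (catalog_id, id) index, the PK (leading with realm_id) cannot
   -- serve this predicate, so the planner falls back to scanning.
   ```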
   
   ### **4. Request for Engineering: Query Optimization**
   
   While the indexes mitigate the immediate outage, the underlying query 
patterns appear inefficient and likely need refactoring in the Polaris codebase.
   
   **1. Review "N+1" Permission Checks**
   We observed **65,415 executions** of the permission check query during a 
short window.
   
   * *Question:* Is the application iterating through a list and checking 
permissions one-by-one instead of performing a bulk permission check? This 
volume suggests an "N+1" query issue that indexes can only partially band-aid.
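
   To illustrate the difference, this is the SQL shape we would expect a bulk 
check to produce (illustrative only, not the actual Polaris persistence API; 
the realm name and the id array are placeholders):
   
   ```sql
   -- Illustrative only: one batched lookup instead of one query per grantee.
   -- 'default-realm' and the id array are placeholders.
   SELECT *
   FROM polaris_schema.grant_records
   WHERE realm_id = 'default-realm'
     AND grantee_id = ANY (ARRAY[101, 102, 103]);
   ```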
   
   **2. Review Bulk Entity Fetch Strategy**
   The query `SELECT ... FROM entities WHERE (catalog_id, id) IN (...)` is 
passing **~15,000 bind parameters** (7,800 pairs).
   
   * *Recommendation:* This pattern scales poorly. Please consider paginating 
this operation or refactoring the logic to avoid passing thousands of IDs in a 
single SQL statement.
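
   If pagination at the application layer is not practical, one 
PostgreSQL-side alternative (sketched below with placeholder arrays) is to 
pass the key pairs as two parallel array parameters and join against 
`unnest`, which keeps the statement at two bind parameters regardless of 
batch size:
   
   ```sql
   -- Sketch only: replaces the ~15,000-parameter IN list with two array
   -- parameters. The array literals below are placeholders.
   SELECT e.*
   FROM polaris_schema.entities AS e
   JOIN unnest(ARRAY[100, 100, 100]::bigint[],
               ARRAY[1,   2,   3  ]::bigint[]) AS k(catalog_id, id)
     ON e.catalog_id = k.catalog_id
    AND e.id = k.id;
   ```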
   
   ### **5. SQL for Upstream Merge**
   
   We strongly recommend including these indexes in the next Polaris release to 
prevent similar degradation for other users:
   
   ```sql
   -- 1. Optimizes High-Volume Permission Checks (Fixes API CPU load)
   CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_grants_realm_grantee 
   ON polaris_schema.grant_records (realm_id, grantee_id);
   
   -- 2. Optimizes Garbage Collection (Fixes background job hangs)
   CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_grants_realm_securable 
   ON polaris_schema.grant_records (realm_id, securable_id);
   
   -- 3. Optimizes Bulk Entity Lookups (Fixes Namespace listing latency)
   CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_entities_catalog_id_id 
   ON polaris_schema.entities (catalog_id, id);
   
   ```
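
   After creating the indexes it is worth confirming they are actually being 
used by real traffic, e.g. via the standard statistics views:
   
   ```sql
   -- Sanity check that the new indexes are picked up by real traffic.
   SELECT relname, indexrelname, idx_scan, idx_tup_read
   FROM pg_stat_user_indexes
   WHERE schemaname = 'polaris_schema'
     AND indexrelname IN ('idx_grants_realm_grantee',
                          'idx_grants_realm_securable',
                          'idx_entities_catalog_id_id');
   ```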
   
   ### To Reproduce
   
   1. **Environment:** Deploy Apache Polaris backed by PostgreSQL with a 
moderate dataset (~170k grant records, ~160k entities).
   2. **Action:** Execute the `create_namespace` API operation on a catalog 
with ~22k namespaces.
   3. **Conditions:** Standard concurrency is sufficient; the issue manifests 
even under light load because the queries perform full table scans.
   4. **Observation:** Monitor API response time and database activity via 
`pg_stat_statements`.
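
   For step 4, a query along these lines surfaces the hot statements (column 
names assume PostgreSQL 13+ with the `pg_stat_statements` extension enabled; 
older versions expose `total_time`/`mean_time` instead):
   
   ```sql
   -- Top statements by cumulative execution time.
   SELECT calls,
          round(total_exec_time) AS total_ms,
          round(mean_exec_time)  AS mean_ms,
          rows,
          query
   FROM pg_stat_statements
   ORDER BY total_exec_time DESC
   LIMIT 10;
   ```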
   
   
   ### Actual Behavior
   
   * **Performance:** The operation consistently takes **33s - 36s** to 
complete.
   * **Database Execution:** Permission checks trigger a **Sequential Scan** of 
the entire `grant_records` table for every single lookup. We observed queries 
reading **~174,000 rows per execution** (100% of the table) thousands of times 
per minute.
   * **Client Experience:** The API request times out. Proxies (e.g., Traefik) 
return `504 Gateway Timeout` at the 30s mark, and client applications 
(crawlers) hang or fail silently.
   
   ### Expected Behavior
   
   * **Performance:** The `create_namespace` operation should complete in 
**sub-second to ~2-second** time.
   * **Database Execution:** Permission checks filtering by `grantee_id` should 
utilize an **Index Scan**, reading only the specific rows relevant to the user 
(typically <100 rows).
   * **Client Experience:** The request should return `200 OK` well within 
standard proxy/client timeout windows (30s).
   
   ### Additional context
   
   All of this information was consistent throughout our investigation; after 
applying the first index `idx_entities_catalog_id_id` we saw a consistent drop 
in response time from ~31s to ~16s.
   After the index was applied, the statistics were reset and 
`pg_stat_statements` was cleared (the reset is sketched below). We kept the 
system running for ~24h, and the data shown in this ticket is from that 
post-reset period.
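
   For context, the reset before the 24h observation window amounts to 
statements along these lines:
   
   ```sql
   -- Clears pg_stat_statements counters.
   SELECT pg_stat_statements_reset();
   -- Clears per-table/per-index statistics (seq_scan, idx_scan, etc.)
   -- for the current database.
   SELECT pg_stat_reset();
   ```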
   
   ### System information
   
   OS: Linux
Polaris Catalog Version: Docker image 1.3.0 (initial symptoms were present 
in 1.0.1; we upgraded to check whether the version was the cause, but the 
problem persisted)
   Object storage & setup: AWS S3

