Re: [PR] fix(trino): ignore Iceberg $partitions metadata fields in partition detection [superset]

via GitHub Mon, 15 Jun 2026 08:12:53 -0700


codeant-ai-for-open-source[bot] commented on code in PR #41055:
URL: https://github.com/apache/superset/pull/41055#discussion_r3414536056



##########
superset/db_engine_specs/trino.py:
##########
@@ -631,3 +680,34 @@ def get_indexes(
             return super().get_indexes(database, inspector, table)
         except NoSuchTableError:
             return []
+
+    @classmethod
+    def latest_partition(
+        cls,
+        database: Database,
+        table: Table,
+        show_first: bool = False,
+        indexes: list[dict[str, Any]] | None = None,
+    ) -> tuple[list[str], list[str] | None]:
+        """
+        Return the latest partition for a table.
+
+        Iceberg "$partitions" metadata fields are filtered out first, so we
+        never build a latest-partition query against them.
+
+        :param database: the database the query will be run against
+        :param table: the table instance
+        :param show_first: return the value for the first partitioning key when
+            there are several
+        :param indexes: the indexes associated with the table
+        :returns: the column names and the latest partition values
+        """
+        if indexes is None:
+            indexes = database.get_indexes(table)
+
+        return super().latest_partition(
+            database,
+            table,
+            show_first=show_first,
+            indexes=cls._filter_iceberg_partition_indexes(indexes),
+        )

Review Comment:
   **Suggestion:** This override performs `database.get_indexes(table)` before 
calling the memoized parent implementation, so callers that omit `indexes` (for 
example Jinja `latest_partition()`/`where_latest_partition()` paths) will hit 
metadata introspection on every call even when the partition result is cached. 
That regresses the cache behavior and adds unnecessary round-trips. Keep the 
override memoized (or otherwise move index lookup/filtering into the cached 
path) so repeated calls do not re-fetch indexes each time. [cache]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Trino table metadata API always re-runs index introspection.
   - ⚠️ Jinja `latest_partition` calls on Trino lose index caching.
   - ⚠️ Extra Trino metadata queries can slow dashboards using partitions.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Open a Trino-backed dataset's metadata/preview in the UI, which uses
   `database.select_star` with `latest_partition=True` as implemented in
   `superset/databases/utils.py:100-27` (the `"selectStar"` field calls
   `database.select_star(..., latest_partition=True)` for the inspected table).
   
   2. That call is routed to `BaseEngineSpec.select_star` in
   `superset/db_engine_specs/base.py:51-102`, which for `latest_partition=True` 
invokes
   `cls.where_latest_partition(database, table, qry, columns=cols)`, and for 
Trino databases
   `cls` is `TrinoEngineSpec`.
   
   3. `PrestoBaseEngineSpec.where_latest_partition` at
   `superset/db_engine_specs/presto.py:13-48` calls 
`cls.latest_partition(database, table,
   show_first=True)`; for Trino this resolves to 
`TrinoEngineSpec.latest_partition` defined
   at `superset/db_engine_specs/trino.py:685-713`, which first executes `if 
indexes is None:
   indexes = database.get_indexes(table)` (line 705-706).
   
   4. `TrinoEngineSpec.latest_partition` then delegates to the memoized parent
   `PrestoBaseEngineSpec.latest_partition` at 
`superset/db_engine_specs/presto.py:57-78`, but
   because `database.get_indexes(table)` is executed in the subclass *before* 
the memoized
   call, every UI refresh or repeated Jinja macro call (see 
`superset/jinja_context.py:37-43`
   using `latest_partition`) on the same Trino table re-triggers expensive
   `database.get_indexes(table)` introspection even when the parent 
`latest_partition` result
   is served from cache, regressing the original caching behavior where index 
lookup occurred
   inside the memoized function body.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=dd4e6541cf184a708d25315aeb4d2d6c&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=dd4e6541cf184a708d25315aeb4d2d6c&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/trino.py
   **Line:** 705:713
   **Comment:**
        *Cache: This override performs `database.get_indexes(table)` before 
calling the memoized parent implementation, so callers that omit `indexes` (for 
example Jinja `latest_partition()`/`where_latest_partition()` paths) will hit 
metadata introspection on every call even when the partition result is cached. 
That regresses the cache behavior and adds unnecessary round-trips. Keep the 
override memoized (or otherwise move index lookup/filtering into the cached 
path) so repeated calls do not re-fetch indexes each time.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41055&comment_hash=142368071144f1646cbb73b51532219a8d06e282d09638097f12830fa268b5ca&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41055&comment_hash=142368071144f1646cbb73b51532219a8d06e282d09638097f12830fa268b5ca&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix(trino): ignore Iceberg $partitions metadata fields in partition detection [superset]

Reply via email to