codeant-ai-for-open-source[bot] commented on code in PR #41055:
URL: https://github.com/apache/superset/pull/41055#discussion_r3414536056
##########
superset/db_engine_specs/trino.py:
##########
@@ -631,3 +680,34 @@ def get_indexes(
return super().get_indexes(database, inspector, table)
except NoSuchTableError:
return []
+
+ @classmethod
+ def latest_partition(
+ cls,
+ database: Database,
+ table: Table,
+ show_first: bool = False,
+ indexes: list[dict[str, Any]] | None = None,
+ ) -> tuple[list[str], list[str] | None]:
+ """
+ Return the latest partition for a table.
+
+ Iceberg "$partitions" metadata fields are filtered out first, so we
+ never build a latest-partition query against them.
+
+ :param database: the database the query will be run against
+ :param table: the table instance
+ :param show_first: return the value for the first partitioning key when
+ there are several
+ :param indexes: the indexes associated with the table
+ :returns: the column names and the latest partition values
+ """
+ if indexes is None:
+ indexes = database.get_indexes(table)
+
+ return super().latest_partition(
+ database,
+ table,
+ show_first=show_first,
+ indexes=cls._filter_iceberg_partition_indexes(indexes),
+ )
Review Comment:
**Suggestion:** This override performs `database.get_indexes(table)` before
calling the memoized parent implementation, so callers that omit `indexes` (for
example Jinja `latest_partition()`/`where_latest_partition()` paths) will hit
metadata introspection on every call even when the partition result is cached.
That regresses the cache behavior and adds unnecessary round-trips. Keep the
override memoized (or otherwise move index lookup/filtering into the cached
path) so repeated calls do not re-fetch indexes each time. [cache]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Trino table metadata API always re-runs index introspection.
- ⚠️ Jinja `latest_partition` calls on Trino lose index caching.
- ⚠️ Extra Trino metadata queries can slow dashboards using partitions.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Open a Trino-backed dataset's metadata/preview in the UI, which uses
`database.select_star` with `latest_partition=True` as implemented in
`superset/databases/utils.py:100-27` (the `"selectStar"` field calls
`database.select_star(..., latest_partition=True)` for the inspected table).
2. That call is routed to `BaseEngineSpec.select_star` in
`superset/db_engine_specs/base.py:51-102`, which for `latest_partition=True`
invokes
`cls.where_latest_partition(database, table, qry, columns=cols)`, and for
Trino databases
`cls` is `TrinoEngineSpec`.
3. `PrestoBaseEngineSpec.where_latest_partition` at
`superset/db_engine_specs/presto.py:13-48` calls
`cls.latest_partition(database, table,
show_first=True)`; for Trino this resolves to
`TrinoEngineSpec.latest_partition` defined
at `superset/db_engine_specs/trino.py:685-713`, which first executes `if
indexes is None:
indexes = database.get_indexes(table)` (line 705-706).
4. `TrinoEngineSpec.latest_partition` then delegates to the memoized parent
`PrestoBaseEngineSpec.latest_partition` at
`superset/db_engine_specs/presto.py:57-78`, but
because `database.get_indexes(table)` is executed in the subclass *before*
the memoized
call, every UI refresh or repeated Jinja macro call (see
`superset/jinja_context.py:37-43`
using `latest_partition`) on the same Trino table re-triggers expensive
`database.get_indexes(table)` introspection even when the parent
`latest_partition` result
is served from cache, regressing the original caching behavior where index
lookup occurred
inside the memoized function body.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=dd4e6541cf184a708d25315aeb4d2d6c&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=dd4e6541cf184a708d25315aeb4d2d6c&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/db_engine_specs/trino.py
**Line:** 705:713
**Comment:**
*Cache: This override performs `database.get_indexes(table)` before
calling the memoized parent implementation, so callers that omit `indexes` (for
example Jinja `latest_partition()`/`where_latest_partition()` paths) will hit
metadata introspection on every call even when the partition result is cached.
That regresses the cache behavior and adds unnecessary round-trips. Keep the
override memoized (or otherwise move index lookup/filtering into the cached
path) so repeated calls do not re-fetch indexes each time.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41055&comment_hash=142368071144f1646cbb73b51532219a8d06e282d09638097f12830fa268b5ca&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41055&comment_hash=142368071144f1646cbb73b51532219a8d06e282d09638097f12830fa268b5ca&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]