codeant-ai-for-open-source[bot] commented on code in PR #41055:
URL: https://github.com/apache/superset/pull/41055#discussion_r3454046199


##########
superset/db_engine_specs/trino.py:
##########
@@ -631,3 +680,66 @@ def get_indexes(
             return super().get_indexes(database, inspector, table)
         except NoSuchTableError:
             return []
+
+    @classmethod
+    @cache_manager.data_cache.memoize(timeout=60)
+    def latest_partition(
+        cls,
+        database: Database,
+        table: Table,
+        show_first: bool = False,
+        indexes: list[dict[str, Any]] | None = None,

Review Comment:
   **Suggestion:** This method is memoized in the global data cache even though 
Trino supports OAuth2 personal tokens, so cached latest-partition results can 
be reused across different users hitting the same database/table and expose 
user-scoped metadata incorrectly. Use a per-user cache key (or bypass shared 
memoization for OAuth2 connections) so partition metadata is isolated by user 
identity. [cache]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Trino Jinja `latest_partition` macro reuses partitions across users.
   - ❌ Trino `where_latest_partition` filters share first user's partitions.
   - ⚠️ Database `select_star` metadata can reflect another user's view.
   - ⚠️ Violates per-user caching expectations for OAuth2 Trino connections.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure a Trino database with OAuth2 personal tokens so
   `Database.is_oauth2_enabled()` is true and user-scoped tokens are used when 
creating
   engines via `Database._get_sqla_engine` (see 
`superset/models/core.py:116-123`, which
   calls `get_oauth2_access_token(...)` and passes the token into
   `db_engine_spec.impersonate_user`).
   
   2. From SQL Lab or a chart, run a query against this Trino database that 
uses the Jinja
   partition helpers, for example `{{ latest_partition('schema.table') }}`, 
which is
   implemented in `superset/jinja_context.py:996-1033` as
   `latest_partitions()`/`first_latest_partition()` and calls
   `self._database.db_engine_spec.latest_partition(database=self._database,
   table=Table(table_name, schema))`.
   
   3. For a Trino-backed database, `self._database.db_engine_spec` is 
`TrinoEngineSpec`,
   whose `latest_partition` override at 
`superset/db_engine_specs/trino.py:684-693` is
   decorated with `@cache_manager.data_cache.memoize(timeout=60)` using the 
global
   `data_cache` from `superset/utils/cache_manager.py:187-214`. The first user 
(user A) to
   hit this macro causes `TrinoEngineSpec.latest_partition` to execute, which 
delegates to
   `PrestoBaseEngineSpec.latest_partition` in 
`superset/db_engine_specs/presto.py:568-647` to
   run a SQL query via `database.get_df(...)` under user A's OAuth2 token, and 
the
   `(col_names, latest_parts)` tuple is stored in `data_cache` keyed only on 
the method
   arguments `(cls, database, table, show_first, indexes)`.
   
   4. Within the 60-second memoization window, a second user (user B) with 
different
   permissions on the same Trino database/table issues a query that triggers 
the same
   `latest_partition` call (same `database`/`table` arguments) via either the 
same Jinja
   macro (`superset/jinja_context.py:996-1033`) or `where_latest_partition` 
used by
   `select_star` (see `superset/db_engine_specs/presto.py:520-559` calling
   `cls.latest_partition` and `superset/db_engine_specs/base.py:41-97`), and 
the decorator at
   `superset/db_engine_specs/trino.py:684-691` returns the cached result from 
user A without
   re-executing `database.get_df` under user B's token; because `data_cache` 
keys do not
   incorporate any impersonation key or `g.user`, user B receives partition 
metadata computed
   under user A's OAuth2-scoped privileges, leaking user-scoped metadata across 
users.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=6b852786d6194fc6897b1965ec6c9975&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=6b852786d6194fc6897b1965ec6c9975&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/trino.py
   **Line:** 684:691
   **Comment:**
        *Cache: This method is memoized in the global data cache even though 
Trino supports OAuth2 personal tokens, so cached latest-partition results can 
be reused across different users hitting the same database/table and expose 
user-scoped metadata incorrectly. Use a per-user cache key (or bypass shared 
memoization for OAuth2 connections) so partition metadata is isolated by user 
identity.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41055&comment_hash=f7e546d13d5950c974d9cfed29563b04b1957a8397f383800e6053009163e4af&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41055&comment_hash=f7e546d13d5950c974d9cfed29563b04b1957a8397f383800e6053009163e4af&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to