Copilot commented on code in PR #40839:
URL: https://github.com/apache/superset/pull/40839#discussion_r3382571841


##########
superset/datasource/api.py:
##########
@@ -124,13 +131,59 @@ def get_column_values(
 
         row_limit = apply_max_row_limit(app.config["FILTER_SELECT_ROW_LIMIT"])
         denormalize_column = not datasource.normalize_columns
+
+        # Cache distinct column-value results so a dashboard with many filters
+        # backed by the same (often heavy) virtual dataset doesn't re-execute
+        # the wrapping query per filter (#39342).
+        #
+        # Key fields:
+        # - ``user`` — baseline per-user isolation. ``get_user_id()`` returns
+        #   ``None`` for guest/anonymous sessions, so this field alone is not
+        #   sufficient for RLS safety.
+        # - ``rls`` — full RLS fingerprint via
+        #   ``security_manager.get_rls_cache_key`` (the canonical helper used
+        #   by viz.py and query_context_processor.py). Covers both regular
+        #   RLS policy changes and guest-token RLS, so embedded sessions
+        #   with different guest tokens never share cached values even when
+        #   ``get_user_id()`` is ``None`` for both.
+        # - ``changed_on`` — auto-busts cached entries when the dataset's
+        #   underlying SQL is edited.
+        force = parse_boolean_string(request.args.get("force"))
+        cache_key = (
+            "col_values:"
+            + hashlib.sha256(
+                json.dumps(
+                    {
+                        "uid": datasource.uid,
+                        "col": column_name,
+                        "limit": row_limit,
+                        "denorm": denormalize_column,
+                        "user": get_user_id(),
+                        "rls": security_manager.get_rls_cache_key(datasource),
+                        "changed_on": str(getattr(datasource, "changed_on", 
"")),

Review Comment:
   The cache key includes both `get_user_id()` and the full RLS fingerprint. 
With `user` included, two different authenticated users will never share a 
cached entry even if their effective RLS context is identical. This contradicts 
the PR description claim that “Two users on the same dashboard with same RLS → 
second user gets cached values.” Either adjust the PR description/expectations, 
or (if safe for your RLS model) drop per-user isolation and rely on the RLS 
fingerprint for cache partitioning.



##########
tests/integration_tests/datasource/api_tests.py:
##########
@@ -23,6 +24,7 @@
 from superset import db, security_manager
 from superset.connectors.sqla.models import SqlaTable
 from superset.daos.exceptions import DatasourceTypeNotSupportedError
+from superset.extensions import cache_manager
 from superset.utils import json

Review Comment:
   Because `get_column_values` now caches responses in 
`cache_manager.data_cache` (Redis in integration tests), earlier tests in this 
module can populate the cache and cause later tests (that patch/assert 
`values_for_column` was called) to unexpectedly hit the cache and skip the 
patched method. Clearing `data_cache` in a per-test `setUp`/`tearDown` keeps 
these tests isolated and avoids order-dependent/flaky behavior.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to