Re: [PR] fix: prevent cache key mismatch [superset]

via GitHub Mon, 20 Oct 2025 08:22:37 -0700


korbit-ai[bot] commented on code in PR #35573:
URL: https://github.com/apache/superset/pull/35573#discussion_r2445340105



##########
superset/common/query_object.py:
##########
@@ -359,6 +360,104 @@ def _sanitize_filters(self) -> None:
                 except QueryClauseValidationException as ex:
                     raise QueryObjectValidationError(ex.message) from ex
 
+    def _sanitize_sql_expressions(self) -> None:
+        """
+        Sanitize SQL expressions in adhoc metrics and orderby for consistent 
cache keys.
+
+        This processes SQL expressions before cache key generation, preventing 
cache
+        mismatches due to later processing during query execution.
+        """
+        if not self.datasource or not hasattr(
+            self.datasource,
+            "_process_sql_expression",
+        ):
+            return
+
+        # Process adhoc metrics
+        if self.metrics:
+            self._sanitize_metrics_expressions()
+
+        # Process orderby - these may contain adhoc metrics
+        if self.orderby:
+            self._sanitize_orderby_expressions()
+
+    def _sanitize_metrics_expressions(self) -> None:
+        """
+        Process SQL expressions in adhoc metrics.
+        Creates new metric dictionaries to avoid mutating shared references.
+        """
+        # datasource is checked in parent method, assert for type checking
+        assert self.datasource is not None
+
+        if not self.metrics:
+            return
+
+        sanitized_metrics = []
+        for metric in self.metrics:
+            if not (is_adhoc_metric(metric) and isinstance(metric, dict)):
+                sanitized_metrics.append(metric)
+                continue
+            if sql_expr := metric.get("sqlExpression"):
+                try:
+                    processed = self.datasource._process_select_expression(
+                        expression=sql_expr,
+                        database_id=self.datasource.database_id,
+                        engine=self.datasource.database.backend,
+                        schema=self.datasource.schema,
+                        template_processor=None,
+                    )
+                    if processed and processed != sql_expr:
+                        # Create new dict to avoid mutating shared references
+                        sanitized_metrics.append({**metric, "sqlExpression": 
processed})
+                    else:
+                        sanitized_metrics.append(metric)
+                except Exception as ex:  # pylint: disable=broad-except
+                    # If processing fails, leave as-is and let execution 
handle it
+                    logger.debug("Failed to sanitize metric SQL expression: 
%s", ex)
+                    sanitized_metrics.append(metric)
+            else:
+                sanitized_metrics.append(metric)

Review Comment:
   ### Inefficient metric processing without caching <sub>![category 
Performance](https://img.shields.io/badge/Performance-4f46e5)</sub>
   
   <details>
     <summary>Tell me more</summary>
   
   ###### What is the issue?
   The method creates a new list and appends items one by one instead of using 
list comprehension, and performs expensive SQL processing operations for each 
metric without any caching mechanism.
   
   
   ###### Why this matters
   This approach is inefficient for large metric lists and may cause repeated 
expensive SQL processing operations. The lack of caching means identical SQL 
expressions will be processed multiple times across different query objects.
   
   ###### Suggested change ∙ *Feature Preview*
   Use list comprehension where possible and implement caching for processed 
SQL expressions. Consider using `functools.lru_cache` or a class-level cache to 
store processed expressions:
   
   ```python
   from functools import lru_cache
   
   @lru_cache(maxsize=128)
   def _cached_process_select_expression(self, sql_expr, database_id, engine, 
schema):
       return self.datasource._process_select_expression(
           expression=sql_expr,
           database_id=database_id,
           engine=engine,
           schema=schema,
           template_processor=None,
       )
   ```
   
   
   
   ###### Provide feedback to improve future suggestions
   [![Nice 
Catch](https://img.shields.io/badge/👍%20Nice%20Catch-71BC78)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/5c265a0d-ac46-4508-888a-907e07e56e40/upvote)
 
[![Incorrect](https://img.shields.io/badge/👎%20Incorrect-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/5c265a0d-ac46-4508-888a-907e07e56e40?what_not_true=true)
  [![Not in 
Scope](https://img.shields.io/badge/👎%20Out%20of%20PR%20scope-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/5c265a0d-ac46-4508-888a-907e07e56e40?what_out_of_scope=true)
 [![Not in coding 
standard](https://img.shields.io/badge/👎%20Not%20in%20our%20standards-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/5c265a0d-ac46-4508-888a-907e07e56e40?what_not_in_standard=true)
 
[![Other](https://img.shields.io/badge/👎%20Other-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/5c265a0d-ac46-4508-888a-907e07e56e40)
   </details>
   
   <sub>
   
   💬 Looking for more details? Reply to this comment to chat with Korbit.
   </sub>
   
   <!--- korbi internal id:e7175c76-f3ca-4ccd-b96d-53f71d8bf15a -->
   
   
   [](e7175c76-f3ca-4ccd-b96d-53f71d8bf15a)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix: prevent cache key mismatch [superset]

Reply via email to