Copilot commented on code in PR #38227:
URL: https://github.com/apache/superset/pull/38227#discussion_r2885468869
##########
tests/unit_tests/queries/query_object_test.py:
##########
@@ -386,3 +387,127 @@ def
test_cache_key_cache_impersonation_on_with_different_user_and_db_impersonati
],
any_order=True,
)
+
+
+def test_cache_key_normalization_metrics_crlf():
Review Comment:
The PR description's "Testing Instructions" section instructs running
`pytest tests/unit_tests/queries/test_query_object_cache_key_normalization.py`,
but this file does not exist. The tests were actually added to
`tests/unit_tests/queries/query_object_test.py`. This discrepancy between the
PR description and the actual code changes will cause confusion for reviewers
or contributors trying to run the specified tests.
##########
superset/common/query_object.py:
##########
@@ -420,10 +421,39 @@ def cache_key(self, **extra: Any) -> str: # noqa: C901
the use-provided inputs to bounds, which may be time-relative (as in
"5 days ago" or "now").
"""
- # Cast to dict[str, Any] for mutation operations
- cache_dict: dict[str, Any] = dict(self.to_dict())
+ # Use deepcopy to prevent mutation of original objects (nested
metrics/columns)
+ cache_dict: dict[str, Any] = copy.deepcopy(self.to_dict()) # type:
ignore[arg-type]
cache_dict.update(extra)
+ def _normalize_sql(value: Any) -> Any:
+ if isinstance(value, str):
+ return value.replace("\r\n", "\n").strip()
+ return value
+
+ # Normalize SQL expressions to ensure deterministic cache keys
+ for metric in cache_dict.get("metrics") or []:
+ if isinstance(metric, dict) and metric.get("expressionType") ==
"SQL":
+ metric["sqlExpression"] =
_normalize_sql(metric.get("sqlExpression"))
+
+ for column in cache_dict.get("columns") or []:
+ if isinstance(column, dict) and column.get("expressionType") ==
"SQL":
+ column["sqlExpression"] =
_normalize_sql(column.get("sqlExpression"))
+
+ for order in cache_dict.get("orderby") or []:
+ if (
+ isinstance(order, (list, tuple))
+ and len(order) > 0
+ and isinstance(order[0], dict)
+ and order[0].get("expressionType") == "SQL"
+ ):
+ order[0]["sqlExpression"] = _normalize_sql(
+ order[0].get("sqlExpression")
+ )
+
Review Comment:
The `series_limit_metric` field included in `to_dict()` is of type `Metric`
(`AdhocMetric | str`), and an `AdhocMetric` can have `expressionType == "SQL"`
with a `sqlExpression` containing CRLF or inconsistent whitespace. The new
normalization loop only handles `metrics`, `columns`, and `orderby`, but omits
`series_limit_metric`. This means that if a user sets `series_limit_metric` to
an ad-hoc SQL metric with CRLF line endings, the cache key mismatch issue
described in the bug report could still occur for that field.
```suggestion
series_limit_metric = cache_dict.get("series_limit_metric")
if (
isinstance(series_limit_metric, dict)
and series_limit_metric.get("expressionType") == "SQL"
):
series_limit_metric["sqlExpression"] = _normalize_sql(
series_limit_metric.get("sqlExpression")
)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]