codeant-ai-for-open-source[bot] commented on code in PR #37516:
URL: https://github.com/apache/superset/pull/37516#discussion_r3448765720
##########
superset/config.py:
##########
@@ -1206,6 +1206,16 @@ class D3TimeFormat(TypedDict, total=False):
# Cache for datasource metadata and query results
DATA_CACHE_CONFIG: CacheConfig = {"CACHE_TYPE": "NullCache"}
+# Include per-phase timing breakdown in /api/v1/chart/data responses.
+# Set to False to keep collecting stats_logger and slow-query timings while
+# omitting timing data from API responses.
+CHART_DATA_INCLUDE_TIMING: bool = True
Review Comment:
**Suggestion:** `CHART_DATA_INCLUDE_TIMING` is enabled by default, which
makes the new `timing` payload ship in every chart data response unless
operators explicitly disable it. This changes the API contract by default and
exposes internal execution-phase timing data unintentionally; set the default
to `False` so timing is opt-in as documented. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ /api/v1/chart/data now includes timing by default.
- ⚠️ Internal execution timing exposed in chart data responses.
- ⚠️ Third-party clients may break on new timing field.
- ⚠️ Async chart query result URLs also include timing payload.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Start Superset with the default configuration so that the Flask app loads
settings from
`superset/config.py`, where `CHART_DATA_INCLUDE_TIMING: bool = True` is
defined at line
1212 (see `superset/config.py:1212`).
2. Issue a chart data request to the public API endpoint used throughout the
codebase: for
example, `POST /api/v1/chart/data` as exercised in
`tests/integration_tests/charts/data/api_tests.py` via the `CHART_DATA_URI =
"api/v1/chart/data"` constant at
`tests/integration_tests/charts/data/api_tests.py:75` and
used in `test_with_valid_qc__data_is_returned` at lines 244–252.
3. The request is handled by `ChartDataRestApi.data` in
`superset/charts/data/api.py:227-124`, which builds a `QueryContext`, runs
`ChartDataCommand`, and ultimately calls `_get_data_response`, which then
calls
`_send_chart_response` (same file).
4. Inside `_send_chart_response` in `superset/charts/data/api.py`, each
query result is
passed through `finalize_timing_payload` in the loop at
`superset/charts/data/api.py:495-496`; `finalize_timing_payload` (defined in
`superset/common/chart_data_timing.py:61-99`) attaches a `timing` dict to
each query
payload whenever `current_app.config.get("CHART_DATA_INCLUDE_TIMING")` is
truthy (line
98). Because the default in `superset/config.py` is `True`, every
`/api/v1/chart/data`
response now includes a `timing` object by default, changing the API
contract and exposing
internal per-phase execution timing unless operators explicitly override the
config.
```
</details>
[](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=132b6eaa119547d3a8f9405c8cf2efea&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
[](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=132b6eaa119547d3a8f9405c8cf2efea&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/config.py
**Line:** 1212:1212
**Comment:**
*Logic Error: `CHART_DATA_INCLUDE_TIMING` is enabled by default, which
makes the new `timing` payload ship in every chart data response unless
operators explicitly disable it. This changes the API contract by default and
exposes internal execution-phase timing data unintentionally; set the default
to `False` so timing is opt-in as documented.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F37516&comment_hash=cd748b016c14348c6f01d3a55858867988b2d9212ccdc4b162a18cb5972936c1&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F37516&comment_hash=cd748b016c14348c6f01d3a55858867988b2d9212ccdc4b162a18cb5972936c1&reaction=dislike'>👎</a>
##########
tests/unit_tests/common/test_query_context_processor_timing.py:
##########
@@ -0,0 +1,470 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from typing import Any
+from unittest.mock import MagicMock, patch
+
+import pandas as pd
+import pytest
+
+from superset.commands.chart.data.get_data_command import ChartDataCommand
+from superset.commands.chart.exceptions import ChartDataQueryFailedError
+from superset.common.chart_data import ChartDataResultType
+from superset.common.chart_data_timing import (
+ finalize_timing_payload,
+ RESULT_PROCESSING_START_KEY,
+ TIMING_KEY,
+ TIMING_START_KEY,
+)
+from superset.common.db_query_status import QueryStatus
+from superset.common.query_actions import _get_full
+from superset.common.query_context_processor import QueryContextProcessor
+
+
[email protected]
+def mock_query_obj():
+ query_obj = MagicMock()
+ query_obj.columns = ["col1"]
+ query_obj.column_names = ["col1"]
+ query_obj.metrics = []
+ query_obj.metric_names = []
+ query_obj.from_dttm = None
+ query_obj.to_dttm = None
+ query_obj.annotation_layers = []
+ return query_obj
+
+
[email protected]
+def processor_with_cache():
+ """Create a processor with a mocked cache that returns a loaded result."""
+ mock_qc = MagicMock()
+ mock_qc.force = False
+
+ processor = QueryContextProcessor.__new__(QueryContextProcessor)
+ processor._query_context = mock_qc
+ processor._qc_datasource = MagicMock()
+ processor._qc_datasource.uid = "test_uid"
+ processor._qc_datasource.column_names = ["col1"]
+ return processor
+
+
+def finalize_payload(
+ payload: dict[str, Any],
+ include_timing: bool = True,
+ slow_query_threshold_ms: int | None = None,
+) -> None:
+ with patch("superset.common.chart_data_timing.current_app") as mock_app:
+ mock_app.config = {
+ "STATS_LOGGER": MagicMock(),
+ "CHART_DATA_INCLUDE_TIMING": include_timing,
+ "CHART_DATA_SLOW_QUERY_THRESHOLD_MS": slow_query_threshold_ms,
+ }
+ finalize_timing_payload(payload)
+
+
+def internal_timing() -> dict[str, Any]:
+ return {
+ TIMING_START_KEY: 1.0,
+ RESULT_PROCESSING_START_KEY: 2.0,
+ "validate_ms": 1.0,
+ "cache_lookup_ms": 2.0,
+ "db_execution_ms": None,
+ }
+
+
+@patch(
+ "superset.common.query_context_processor.QueryCacheManager",
+)
+def test_timing_present_in_payload(
+ mock_cache_cls, processor_with_cache, mock_query_obj
+):
+ """Timing dict is included in finalized query result."""
+ cache = MagicMock()
+ cache.is_loaded = True
+ cache.is_cached = True
+ cache.df = pd.DataFrame({"col1": [1, 2]})
+ cache.cache_dttm = None
+ cache.queried_dttm = None
+ cache.applied_template_filters = []
+ cache.applied_filter_columns = []
+ cache.rejected_filter_columns = []
+ cache.annotation_data = {}
+ cache.error_message = None
+ cache.query = ""
+ cache.status = "success"
+ cache.stacktrace = None
+ cache.sql_rowcount = 2
+ mock_cache_cls.get.return_value = cache
+
+ with patch.object(processor_with_cache, "query_cache_key",
return_value="key"):
+ with patch.object(processor_with_cache, "get_cache_timeout",
return_value=300):
+ result = processor_with_cache.get_df_payload(mock_query_obj)
+
+ finalize_payload(result)
+ assert "timing" in result
+ assert TIMING_KEY not in result
+ timing = result["timing"]
+ assert "validate_ms" in timing
+ assert "cache_lookup_ms" in timing
+ assert "result_processing_ms" in timing
+ assert "total_ms" in timing
+ assert "is_cached" in timing
+
+
+@patch(
+ "superset.common.query_context_processor.QueryCacheManager",
+)
+def test_timing_omitted_when_config_disabled(
+ mock_cache_cls, processor_with_cache, mock_query_obj
+):
+ """Timing dict is excluded from response when CHART_DATA_INCLUDE_TIMING is
False."""
+ cache = MagicMock()
+ cache.is_loaded = True
+ cache.is_cached = True
+ cache.df = pd.DataFrame({"col1": [1]})
+ cache.cache_dttm = None
+ cache.queried_dttm = None
+ cache.applied_template_filters = []
+ cache.applied_filter_columns = []
+ cache.rejected_filter_columns = []
+ cache.annotation_data = {}
+ cache.error_message = None
+ cache.query = ""
+ cache.status = "success"
+ cache.stacktrace = None
+ cache.sql_rowcount = 1
+ mock_cache_cls.get.return_value = cache
+
+ with patch.object(processor_with_cache, "query_cache_key",
return_value="key"):
+ with patch.object(processor_with_cache, "get_cache_timeout",
return_value=300):
+ result = processor_with_cache.get_df_payload(mock_query_obj)
+
+ finalize_payload(result, include_timing=False)
+ assert "timing" not in result
+ assert TIMING_KEY not in result
+
+
+@patch(
+ "superset.common.query_context_processor.QueryCacheManager",
+)
+def test_timing_values_are_non_negative(
+ mock_cache_cls, processor_with_cache, mock_query_obj
+):
+ """All timing values are non-negative numbers."""
+ cache = MagicMock()
+ cache.is_loaded = True
+ cache.is_cached = True
+ cache.df = pd.DataFrame({"col1": [1]})
+ cache.cache_dttm = None
+ cache.queried_dttm = None
+ cache.applied_template_filters = []
+ cache.applied_filter_columns = []
+ cache.rejected_filter_columns = []
+ cache.annotation_data = {}
+ cache.error_message = None
+ cache.query = ""
+ cache.status = "success"
+ cache.stacktrace = None
+ cache.sql_rowcount = 1
+ mock_cache_cls.get.return_value = cache
+
+ with patch.object(processor_with_cache, "query_cache_key",
return_value="key"):
+ with patch.object(processor_with_cache, "get_cache_timeout",
return_value=300):
+ result = processor_with_cache.get_df_payload(mock_query_obj)
+
+ finalize_payload(result)
+ timing = result["timing"]
+ for key, value in timing.items():
+ if isinstance(value, (int, float)) and not isinstance(value, bool):
+ assert value >= 0, f"timing[{key!r}] should be >= 0, got {value}"
+
+
+@patch(
+ "superset.common.query_context_processor.QueryCacheManager",
+)
+def test_timing_no_db_execution_on_cache_hit(
+ mock_cache_cls, processor_with_cache, mock_query_obj
+):
+ """db_execution_ms is None when the result is served from cache."""
+ cache = MagicMock()
+ cache.is_loaded = True
+ cache.is_cached = True
+ cache.df = pd.DataFrame({"col1": [1]})
+ cache.cache_dttm = None
+ cache.queried_dttm = None
+ cache.applied_template_filters = []
+ cache.applied_filter_columns = []
+ cache.rejected_filter_columns = []
+ cache.annotation_data = {}
+ cache.error_message = None
+ cache.query = ""
+ cache.status = "success"
+ cache.stacktrace = None
+ cache.sql_rowcount = 1
+ mock_cache_cls.get.return_value = cache
+
+ with patch.object(processor_with_cache, "query_cache_key",
return_value="key"):
+ with patch.object(processor_with_cache, "get_cache_timeout",
return_value=300):
+ result = processor_with_cache.get_df_payload(mock_query_obj)
+
+ finalize_payload(result)
+ assert result["timing"]["db_execution_ms"] is None
+ assert result["timing"]["is_cached"] is True
+
+
+@patch(
+ "superset.common.query_context_processor.QueryCacheManager",
+)
+def test_timing_has_db_execution_on_cache_miss(
+ mock_cache_cls, processor_with_cache, mock_query_obj
+):
+ """db_execution_ms is present when the query is executed against the
database."""
+ cache = MagicMock()
+ cache.is_loaded = False
+ cache.is_cached = None
+ cache.df = pd.DataFrame({"col1": [1]})
+ cache.cache_dttm = None
+ cache.queried_dttm = None
+ cache.applied_template_filters = []
+ cache.applied_filter_columns = []
+ cache.rejected_filter_columns = []
+ cache.annotation_data = {}
+ cache.error_message = None
+ cache.query = ""
+ cache.status = "success"
+ cache.stacktrace = None
+ cache.sql_rowcount = 1
+ mock_cache_cls.get.return_value = cache
+
+ processor_with_cache._qc_datasource.column_names = ["col1"]
+ processor_with_cache.get_query_result = MagicMock()
+ processor_with_cache.get_annotation_data = MagicMock(return_value={})
+
+ with patch.object(processor_with_cache, "query_cache_key",
return_value="key"):
+ with patch.object(processor_with_cache, "get_cache_timeout",
return_value=300):
+ result = processor_with_cache.get_df_payload(mock_query_obj)
+
+ finalize_payload(result)
+ assert "db_execution_ms" in result["timing"]
+ assert result["timing"]["db_execution_ms"] >= 0
+ assert result["is_cached"] is None
+ assert result["timing"]["is_cached"] is False
+
+
+@patch(
+ "superset.common.query_context_processor.QueryCacheManager",
+)
+@patch("superset.common.chart_data_timing.logger")
+def test_slow_query_logging(
+ mock_logger, mock_cache_cls, processor_with_cache, mock_query_obj
+):
+ """WARNING log is emitted when total_ms exceeds the configured
threshold."""
+ cache = MagicMock()
+ cache.is_loaded = True
+ cache.is_cached = True
+ cache.df = pd.DataFrame({"col1": [1]})
+ cache.cache_dttm = None
+ cache.queried_dttm = None
+ cache.applied_template_filters = []
+ cache.applied_filter_columns = []
+ cache.rejected_filter_columns = []
+ cache.annotation_data = {}
+ cache.error_message = None
+ cache.query = ""
+ cache.status = "success"
+ cache.stacktrace = None
+ cache.sql_rowcount = 1
+ mock_cache_cls.get.return_value = cache
+
+ with patch.object(processor_with_cache, "query_cache_key",
return_value="key"):
+ with patch.object(processor_with_cache, "get_cache_timeout",
return_value=300):
+ result = processor_with_cache.get_df_payload(mock_query_obj)
+
+ # Set threshold to 0 so any query triggers slow logging.
+ finalize_payload(result, slow_query_threshold_ms=0)
+
+ mock_logger.warning.assert_called_once()
+ call_args = mock_logger.warning.call_args[0]
+ assert "Slow chart query" in call_args[0]
+ # On cache hit, db_execution should be "cached" not a number
+ assert "cached" in call_args
+
+
+def results_payload() -> dict[str, Any]:
+ return {
+ "df": pd.DataFrame({"col1": [1]}),
+ "status": QueryStatus.SUCCESS,
+ "rowcount": 1,
+ "sql_rowcount": 1,
+ "applied_filter_columns": [],
+ "rejected_filter_columns": [],
+ }
+
+
+def test_results_payload_omits_timing_when_disabled():
+ query_context = MagicMock()
+ query_context.result_type = ChartDataResultType.RESULTS
+ query_context.result_format = "json"
+ query_context.get_df_payload.return_value = {
+ **results_payload(),
+ TIMING_KEY: internal_timing(),
+ }
+ query_context.get_data.return_value = [{"col1": 1}]
+
+ query_obj = MagicMock()
+ query_obj.result_type = ChartDataResultType.RESULTS
+ query_obj.applied_time_extras = {}
+
+ with (
+ patch(
+ "superset.common.query_actions.extract_dataframe_dtypes",
return_value=[0]
+ ),
+ patch(
+ "superset.common.query_actions.get_time_filter_status",
+ return_value=([], []),
+ ),
+ patch("superset.common.query_actions._detect_currency",
return_value=None),
+ ):
+ result = _get_full(query_context, query_obj)
+
+ with patch("superset.common.chart_data_timing.time.perf_counter",
return_value=3.0):
+ finalize_payload(result, include_timing=False)
+
+ assert "timing" not in result
+ assert TIMING_KEY not in result
+
+
+def test_results_payload_finalizes_timing_when_enabled():
+ query_context = MagicMock()
+ query_context.result_type = ChartDataResultType.RESULTS
+ query_context.result_format = "json"
+ query_context.get_df_payload.return_value = {
+ **results_payload(),
+ TIMING_KEY: internal_timing(),
+ "is_cached": True,
+ }
+ query_context.get_data.return_value = [{"col1": 1}]
+
+ query_obj = MagicMock()
+ query_obj.result_type = ChartDataResultType.RESULTS
+ query_obj.applied_time_extras = {}
+
+ with (
+ patch(
+ "superset.common.query_actions.extract_dataframe_dtypes",
return_value=[0]
+ ),
+ patch(
+ "superset.common.query_actions.get_time_filter_status",
+ return_value=([], []),
+ ),
+ patch("superset.common.query_actions._detect_currency",
return_value=None),
+ ):
+ result = _get_full(query_context, query_obj)
+
+ with patch("superset.common.chart_data_timing.time.perf_counter",
return_value=3.0):
+ finalize_payload(result)
+
+ assert TIMING_KEY not in result
+ assert result["timing"]["validate_ms"] == 1.0
+ assert result["timing"]["cache_lookup_ms"] == 2.0
+ assert result["timing"]["db_execution_ms"] is None
+ assert result["timing"]["result_processing_ms"] == 1000.0
+ assert result["timing"]["total_ms"] == 2000.0
+ assert result["timing"]["is_cached"] is True
Review Comment:
**Suggestion:** This assertion is incorrect for the `RESULTS` flow:
`_get_full()` does not carry `is_cached` into the reduced results payload, so
`finalize_timing_payload()` computes `timing["is_cached"]` as `False`. As
written, this test will fail even when the implementation behaves as designed.
Update the expectation (or include `is_cached` in the payload contract before
finalization) so the test matches actual data flow. [api mismatch]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ RESULTS timing unit test fails on every test run.
- ⚠️ Confusion about cache flag behavior in RESULTS payloads.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Run the unit test `test_results_payload_finalizes_timing_when_enabled` in
`tests/unit_tests/common/test_query_context_processor_timing.py:350-386`
(e.g. `pytest
tests/unit_tests/common/test_query_context_processor_timing.py -k
test_results_payload_finalizes_timing_when_enabled`).
2. Inside this test, `query_context.get_df_payload.return_value` is
constructed at
`tests/unit_tests/common/test_query_context_processor_timing.py:351-359` to
include
`TIMING_KEY: internal_timing()` and `"is_cached": True` along with the
result dataframe
metadata.
3. `_get_full()` in `superset/common/query_actions.py:154-205` is called at
`tests/unit_tests/common/test_query_context_processor_timing.py:375`, which
builds a
reduced `result` dict for `ChartDataResultType.RESULTS` that includes only
`"data"`,
`"colnames"`, `"coltypes"`, `"rowcount"`, `"sql_rowcount"`,
`"detected_currency"`, and
(optionally) `TIMING_KEY`; it does **not** copy `"is_cached"` into this
reduced result, so
the returned `result` has no `"is_cached"` key.
4. `finalize_payload(result)` wraps `finalize_timing_payload` at
`tests/unit_tests/common/test_query_context_processor_timing.py:377-378`,
which calls
`finalize_timing_payload` in `superset/common/chart_data_timing.py:61-98`;
there
`timing["is_cached"] = payload.get("is_cached") is True` at line 93 uses the
reduced
`result` (without `"is_cached"`), so `payload.get("is_cached")` is `None` and
`timing["is_cached"]` becomes `False`, causing the assertion `assert
result["timing"]["is_cached"] is True` at
`tests/unit_tests/common/test_query_context_processor_timing.py:386` to fail
even though
the production code correctly drops `"is_cached"` in the RESULTS payload
path.
```
</details>
[](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=8b7c2f42fb1840d6bcb9b75a40d0d73c&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
[](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=8b7c2f42fb1840d6bcb9b75a40d0d73c&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/unit_tests/common/test_query_context_processor_timing.py
**Line:** 386:386
**Comment:**
*Api Mismatch: This assertion is incorrect for the `RESULTS` flow:
`_get_full()` does not carry `is_cached` into the reduced results payload, so
`finalize_timing_payload()` computes `timing["is_cached"]` as `False`. As
written, this test will fail even when the implementation behaves as designed.
Update the expectation (or include `is_cached` in the payload contract before
finalization) so the test matches actual data flow.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F37516&comment_hash=07a8e4e11b5a499f3525def32fbd7ca1ac427afb8b754efa073cfc363cd94e54&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F37516&comment_hash=07a8e4e11b5a499f3525def32fbd7ca1ac427afb8b754efa073cfc363cd94e54&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]