codeant-ai-for-open-source[bot] commented on code in PR #37516: URL: https://github.com/apache/superset/pull/37516#discussion_r2735425468
##########
tests/unit_tests/common/test_query_context_processor_timing.py:
##########

@@ -0,0 +1,229 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import logging
+from unittest.mock import MagicMock, patch
+
+import pandas as pd
+import pytest
+
+from superset.common.query_context_processor import QueryContextProcessor
+
+
[email protected]
+def mock_query_obj():
+    query_obj = MagicMock()
+    query_obj.columns = ["col1"]
+    query_obj.column_names = ["col1"]
+    query_obj.metrics = []
+    query_obj.metric_names = []
+    query_obj.from_dttm = None
+    query_obj.to_dttm = None
+    query_obj.annotation_layers = []
+    return query_obj
+
+
[email protected]
+def processor_with_cache():
+    """Create a processor with a mocked cache that returns a loaded result."""
+    mock_qc = MagicMock()
+    mock_qc.force = False
+
+    processor = QueryContextProcessor.__new__(QueryContextProcessor)
+    processor._query_context = mock_qc
+    processor._qc_datasource = MagicMock()
+    processor._qc_datasource.uid = "test_uid"
+    processor._qc_datasource.column_names = ["col1"]
+    return processor
+
+
+@patch(
+    "superset.common.query_context_processor.QueryCacheManager",
+)
+def test_timing_present_in_payload(mock_cache_cls, processor_with_cache, mock_query_obj):
+    """Timing dict is included in get_df_payload() result."""
+    cache = MagicMock()
+    cache.is_loaded = True
+    cache.is_cached = True
+    cache.df = pd.DataFrame({"col1": [1, 2]})
+    cache.cache_dttm = None
+    cache.queried_dttm = None
+    cache.applied_template_filters = []
+    cache.applied_filter_columns = []
+    cache.rejected_filter_columns = []
+    cache.annotation_data = {}
+    cache.error_message = None
+    cache.query = ""
+    cache.status = "success"
+    cache.stacktrace = None
+    cache.sql_rowcount = 2
+    mock_cache_cls.get.return_value = cache
+
+    with patch.object(processor_with_cache, "query_cache_key", return_value="key"):
+        with patch.object(processor_with_cache, "get_cache_timeout", return_value=300):
+            result = processor_with_cache.get_df_payload(mock_query_obj)

Review Comment:
   **Suggestion:** The test calls `get_df_payload()` which reads `current_app.config["STATS_LOGGER"]` but this test does not patch `current_app`, causing a KeyError or a Flask "working outside of application context" RuntimeError in CI; patch `current_app` and provide a dummy `STATS_LOGGER` before calling `get_df_payload()`. [possible bug]

   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>

   ```mdx
   - ❌ Unit test suite fails under CI for this test file.
   - ⚠️ Chart data timing tests unstable without app context.
   ```
   </details>

   ```suggestion
       with patch("superset.common.query_context_processor.current_app") as mock_app:
           mock_app.config = {
               "STATS_LOGGER": MagicMock(),
               "CHART_DATA_SLOW_QUERY_THRESHOLD_MS": None,
           }
   ```

   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>

   ```mdx
   1. Run pytest on the test file tests/unit_tests/common/test_query_context_processor_timing.py.
   2. pytest reaches test_timing_present_in_payload and executes the block at lines 75-77 that calls processor_with_cache.get_df_payload(mock_query_obj).
   3. get_df_payload (superset/common/query_context_processor.py) executes and reaches the STATS_LOGGER access at line 195: stats_logger = current_app.config["STATS_LOGGER"].
   4. Because the test did not set up a Flask application context or patch current_app, accessing current_app.config raises a RuntimeError ("Working outside of application context") or KeyError, causing the unit test to fail in CI.
   ```
   </details>

   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>

   ```mdx
   This is a comment left during a code review.

   **Path:** tests/unit_tests/common/test_query_context_processor_timing.py
   **Line:** 77:77
   **Comment:**
   *Possible Bug: The test calls `get_df_payload()` which reads `current_app.config["STATS_LOGGER"]` but this test does not patch `current_app`, causing a KeyError or a Flask "working outside of application context" RuntimeError in CI; patch `current_app` and provide a dummy `STATS_LOGGER` before calling `get_df_payload()`.

   Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
   ```
   </details>

##########
tests/unit_tests/common/test_query_context_processor_timing.py:
##########

@@ -0,0 +1,229 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import logging
+from unittest.mock import MagicMock, patch
+
+import pandas as pd
+import pytest
+
+from superset.common.query_context_processor import QueryContextProcessor
+
+
[email protected]
+def mock_query_obj():
+    query_obj = MagicMock()
+    query_obj.columns = ["col1"]
+    query_obj.column_names = ["col1"]
+    query_obj.metrics = []
+    query_obj.metric_names = []
+    query_obj.from_dttm = None
+    query_obj.to_dttm = None
+    query_obj.annotation_layers = []
+    return query_obj
+
+
[email protected]
+def processor_with_cache():
+    """Create a processor with a mocked cache that returns a loaded result."""
+    mock_qc = MagicMock()
+    mock_qc.force = False
+
+    processor = QueryContextProcessor.__new__(QueryContextProcessor)
+    processor._query_context = mock_qc
+    processor._qc_datasource = MagicMock()
+    processor._qc_datasource.uid = "test_uid"
+    processor._qc_datasource.column_names = ["col1"]
+    return processor
+
+
+@patch(
+    "superset.common.query_context_processor.QueryCacheManager",
+)
+def test_timing_present_in_payload(mock_cache_cls, processor_with_cache, mock_query_obj):
+    """Timing dict is included in get_df_payload() result."""
+    cache = MagicMock()
+    cache.is_loaded = True
+    cache.is_cached = True
+    cache.df = pd.DataFrame({"col1": [1, 2]})
+    cache.cache_dttm = None
+    cache.queried_dttm = None
+    cache.applied_template_filters = []
+    cache.applied_filter_columns = []
+    cache.rejected_filter_columns = []
+    cache.annotation_data = {}
+    cache.error_message = None
+    cache.query = ""
+    cache.status = "success"
+    cache.stacktrace = None
+    cache.sql_rowcount = 2
+    mock_cache_cls.get.return_value = cache
+
+    with patch.object(processor_with_cache, "query_cache_key", return_value="key"):
+        with patch.object(processor_with_cache, "get_cache_timeout", return_value=300):
+            result = processor_with_cache.get_df_payload(mock_query_obj)
+
+    assert "timing" in result
+    timing = result["timing"]
+    assert "validate_ms" in timing
+    assert "cache_lookup_ms" in timing
+    assert "result_processing_ms" in timing
+    assert "total_ms" in timing
+    assert "is_cached" in timing
+
+
+@patch(
+    "superset.common.query_context_processor.QueryCacheManager",
+)
+def test_timing_values_are_non_negative(
+    mock_cache_cls, processor_with_cache, mock_query_obj
+):
+    """All timing values are non-negative numbers."""
+    cache = MagicMock()
+    cache.is_loaded = True
+    cache.is_cached = True
+    cache.df = pd.DataFrame({"col1": [1]})
+    cache.cache_dttm = None
+    cache.queried_dttm = None
+    cache.applied_template_filters = []
+    cache.applied_filter_columns = []
+    cache.rejected_filter_columns = []
+    cache.annotation_data = {}
+    cache.error_message = None
+    cache.query = ""
+    cache.status = "success"
+    cache.stacktrace = None
+    cache.sql_rowcount = 1
+    mock_cache_cls.get.return_value = cache
+
+    with patch.object(processor_with_cache, "query_cache_key", return_value="key"):
+        with patch.object(processor_with_cache, "get_cache_timeout", return_value=300):
+            result = processor_with_cache.get_df_payload(mock_query_obj)
+
+    timing = result["timing"]
+    for key, value in timing.items():
+        if isinstance(value, (int, float)):
+            assert value >= 0, f"timing[{key!r}] should be >= 0, got {value}"
+
+
+@patch(
+    "superset.common.query_context_processor.QueryCacheManager",
+)
+def test_timing_no_db_execution_on_cache_hit(
+    mock_cache_cls, processor_with_cache, mock_query_obj
+):
+    """db_execution_ms is absent when the result is served from cache."""
+    cache = MagicMock()
+    cache.is_loaded = True
+    cache.is_cached = True
+    cache.df = pd.DataFrame({"col1": [1]})
+    cache.cache_dttm = None
+    cache.queried_dttm = None
+    cache.applied_template_filters = []
+    cache.applied_filter_columns = []
+    cache.rejected_filter_columns = []
+    cache.annotation_data = {}
+    cache.error_message = None
+    cache.query = ""
+    cache.status = "success"
+    cache.stacktrace = None
+    cache.sql_rowcount = 1
+    mock_cache_cls.get.return_value = cache
+
+    with patch.object(processor_with_cache, "query_cache_key", return_value="key"):
+        with patch.object(processor_with_cache, "get_cache_timeout", return_value=300):
+            result = processor_with_cache.get_df_payload(mock_query_obj)

Review Comment:
   **Suggestion:** On cache-hit/no-db-execution tests the code still emits metrics via `current_app.config["STATS_LOGGER"]`; patch `current_app` with a dummy `STATS_LOGGER` in these tests to avoid KeyError or app-context errors. [possible bug]

   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>

   ```mdx
   - ❌ Test failure in cache-hit timing unit test.
   - ⚠️ Test harness must mock Flask app context.
   ```
   </details>

   ```suggestion
       with patch("superset.common.query_context_processor.current_app") as mock_app:
           mock_app.config = {
               "STATS_LOGGER": MagicMock(),
               "CHART_DATA_SLOW_QUERY_THRESHOLD_MS": None,
           }
   ```

   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>

   ```mdx
   1. Run pytest and let it execute test_timing_no_db_execution_on_cache_hit in tests/unit_tests/common/test_query_context_processor_timing.py.
   2. The test invokes processor_with_cache.get_df_payload(mock_query_obj) at lines 146-148.
   3. get_df_payload tries to emit metrics by doing stats_logger = current_app.config["STATS_LOGGER"] at superset/common/query_context_processor.py:195.
   4. The test does not set current_app, so metric emission touches Flask's current_app outside an application context (RuntimeError) or lacks STATS_LOGGER (KeyError); patching current_app.config with a dummy STATS_LOGGER avoids the failure.
   ```
   </details>

   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>

   ```mdx
   This is a comment left during a code review.

   **Path:** tests/unit_tests/common/test_query_context_processor_timing.py
   **Line:** 148:148
   **Comment:**
   *Possible Bug: On cache-hit/no-db-execution tests the code still emits metrics via `current_app.config["STATS_LOGGER"]`; patch `current_app` with a dummy `STATS_LOGGER` in these tests to avoid KeyError or app-context errors.

   Validate the correctness of the flagged issue.
   If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
   ```
   </details>

##########
superset/common/query_context_processor.py:
##########

@@ -177,6 +193,31 @@ def get_df_payload(
                 }
             )
         cache.df.columns = [unescape_separator(col) for col in cache.df.columns.values]
+        timing["result_processing_ms"] = round((time.perf_counter() - t) * 1000, 2)
+
+        timing["total_ms"] = round((time.perf_counter() - t0) * 1000, 2)
+        timing["is_cached"] = cache.is_cached
+
+        # Emit per-phase metrics via STATS_LOGGER
+        stats_logger = current_app.config["STATS_LOGGER"]
+        for phase, value in timing.items():
+            if isinstance(value, (int, float)):

Review Comment:
   **Suggestion:** Boolean `is_cached` (a bool) will be treated as an `int` by `isinstance(value, (int, float))` and will be sent to `STATS_LOGGER` as a timing (divided by 1000), producing a spurious metric (e.g. 1 -> 0.001s); also accessing `current_app.config["STATS_LOGGER"]` can raise a KeyError if the config key is absent. Restrict the emitted values to numeric non-bool types and fetch the logger via `.get()` to avoid KeyError. [logic error]

   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>

   ```mdx
   - ❌ Chart data may raise KeyError when STATS_LOGGER missing.
   - ❌ Spurious timing metric chart_data.is_cached reported.
   - ⚠️ Telemetry/metrics ingestion polluted by boolean timings.
   ```
   </details>

   ```suggestion
        stats_logger = current_app.config.get("STATS_LOGGER")
        if stats_logger and hasattr(stats_logger, "timing"):
            for phase, value in timing.items():
                # Exclude booleans (bool is a subclass of int) and only emit real numeric timings
                if isinstance(value, (int, float)) and not isinstance(value, bool):
   ```

   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>

   ```mdx
   1. Call QueryContextProcessor.get_df_payload() at superset/common/query_context_processor.py:83 (function get_df_payload). This enters the method and builds the timing dict (see timing assignments).
   2. Ensure the code reaches the metrics emission block at superset/common/query_context_processor.py:201-205 (the "Emit per-phase metrics" block). This always executes before returning the payload (lines 201-205).
   3. Two concrete ways to observe the issues:
      - Missing STATS_LOGGER KeyError:
        a. Start a Flask app context without a STATS_LOGGER config key (i.e. current_app.config does not contain "STATS_LOGGER").
        b. Call get_df_payload() (superset/common/query_context_processor.py:83).
        c. The statement current_app.config["STATS_LOGGER"] at superset/common/query_context_processor.py:202 will raise KeyError.
      - Spurious boolean timing metric:
        a. Populate the query cache so the call is a cache hit (QueryCacheManager.get returns a cache with is_cached True). See QueryCacheManager.get at superset/common/utils/query_cache_manager.py:153-211 which sets query_cache.is_cached when a cache_value is present.
        b. Call get_df_payload(); timing["is_cached"] is set from cache.is_cached at superset/common/query_context_processor.py (timing["is_cached"] = cache.is_cached).
        c. The emission loop at superset/common/query_context_processor.py:203-205 checks isinstance(value, (int, float)); bool is a subclass of int, so timing["is_cached"] (True/False) is treated as numeric and sent as a timing: stats_logger.timing("chart_data.is_cached", 1 / 1000).
   4. Observation:
      - With missing STATS_LOGGER you get a KeyError at line 202.
      - With STATS_LOGGER present and a cache hit you will see the spurious "chart_data.is_cached" timing value of 0.001s for True (or 0.0 for False).

   Note: the pattern is intentional-looking, but the bool-as-int behavior is a language footgun and the dict access can raise; the suggested change avoids both outcomes.
   ```
   </details>

   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>

   ```mdx
   This is a comment left during a code review.

   **Path:** superset/common/query_context_processor.py
   **Line:** 202:204
   **Comment:**
   *Logic Error: Boolean `is_cached` (a bool) will be treated as an `int` by `isinstance(value, (int, float))` and will be sent to `STATS_LOGGER` as a timing (divided by 1000), producing a spurious metric (e.g. 1 -> 0.001s); also accessing `current_app.config["STATS_LOGGER"]` can raise a KeyError if the config key is absent. Restrict the emitted values to numeric non-bool types and fetch the logger via `.get()` to avoid KeyError.

   Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
   ```
   </details>

##########
tests/unit_tests/common/test_query_context_processor_timing.py:
##########

@@ -0,0 +1,229 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import logging
+from unittest.mock import MagicMock, patch
+
+import pandas as pd
+import pytest
+
+from superset.common.query_context_processor import QueryContextProcessor
+
+
[email protected]
+def mock_query_obj():
+    query_obj = MagicMock()
+    query_obj.columns = ["col1"]
+    query_obj.column_names = ["col1"]
+    query_obj.metrics = []
+    query_obj.metric_names = []
+    query_obj.from_dttm = None
+    query_obj.to_dttm = None
+    query_obj.annotation_layers = []
+    return query_obj
+
+
[email protected]
+def processor_with_cache():
+    """Create a processor with a mocked cache that returns a loaded result."""
+    mock_qc = MagicMock()
+    mock_qc.force = False
+
+    processor = QueryContextProcessor.__new__(QueryContextProcessor)
+    processor._query_context = mock_qc
+    processor._qc_datasource = MagicMock()
+    processor._qc_datasource.uid = "test_uid"
+    processor._qc_datasource.column_names = ["col1"]
+    return processor
+
+
+@patch(
+    "superset.common.query_context_processor.QueryCacheManager",
+)
+def test_timing_present_in_payload(mock_cache_cls, processor_with_cache, mock_query_obj):
+    """Timing dict is included in get_df_payload() result."""
+    cache = MagicMock()
+    cache.is_loaded = True
+    cache.is_cached = True
+    cache.df = pd.DataFrame({"col1": [1, 2]})
+    cache.cache_dttm = None
+    cache.queried_dttm = None
+    cache.applied_template_filters = []
+    cache.applied_filter_columns = []
+    cache.rejected_filter_columns = []
+    cache.annotation_data = {}
+    cache.error_message = None
+    cache.query = ""
+    cache.status = "success"
+    cache.stacktrace = None
+    cache.sql_rowcount = 2
+    mock_cache_cls.get.return_value = cache
+
+    with patch.object(processor_with_cache, "query_cache_key", return_value="key"):
+        with patch.object(processor_with_cache, "get_cache_timeout", return_value=300):
+            result = processor_with_cache.get_df_payload(mock_query_obj)
+
+    assert "timing" in result
+    timing = result["timing"]
+    assert "validate_ms" in timing
+    assert "cache_lookup_ms" in timing
+    assert "result_processing_ms" in timing
+    assert "total_ms" in timing
+    assert "is_cached" in timing
+
+
+@patch(
+    "superset.common.query_context_processor.QueryCacheManager",
+)
+def test_timing_values_are_non_negative(
+    mock_cache_cls, processor_with_cache, mock_query_obj
+):
+    """All timing values are non-negative numbers."""
+    cache = MagicMock()
+    cache.is_loaded = True
+    cache.is_cached = True
+    cache.df = pd.DataFrame({"col1": [1]})
+    cache.cache_dttm = None
+    cache.queried_dttm = None
+    cache.applied_template_filters = []
+    cache.applied_filter_columns = []
+    cache.rejected_filter_columns = []
+    cache.annotation_data = {}
+    cache.error_message = None
+    cache.query = ""
+    cache.status = "success"
+    cache.stacktrace = None
+    cache.sql_rowcount = 1
+    mock_cache_cls.get.return_value = cache
+
+    with patch.object(processor_with_cache, "query_cache_key", return_value="key"):
+        with patch.object(processor_with_cache, "get_cache_timeout", return_value=300):
+            result = processor_with_cache.get_df_payload(mock_query_obj)
+
+    timing = result["timing"]
+    for key, value in timing.items():
+        if isinstance(value, (int, float)):
+            assert value >= 0, f"timing[{key!r}] should be >= 0, got {value}"
+
+
+@patch(
+    "superset.common.query_context_processor.QueryCacheManager",
+)
+def test_timing_no_db_execution_on_cache_hit(
+    mock_cache_cls, processor_with_cache, mock_query_obj
+):
+    """db_execution_ms is absent when the result is served from cache."""
+    cache = MagicMock()
+    cache.is_loaded = True
+    cache.is_cached = True
+    cache.df = pd.DataFrame({"col1": [1]})
+    cache.cache_dttm = None
+    cache.queried_dttm = None
+    cache.applied_template_filters = []
+    cache.applied_filter_columns = []
+    cache.rejected_filter_columns = []
+    cache.annotation_data = {}
+    cache.error_message = None
+    cache.query = ""
+    cache.status = "success"
+    cache.stacktrace = None
+    cache.sql_rowcount = 1
+    mock_cache_cls.get.return_value = cache
+
+    with patch.object(processor_with_cache, "query_cache_key", return_value="key"):
+        with patch.object(processor_with_cache, "get_cache_timeout", return_value=300):
+            result = processor_with_cache.get_df_payload(mock_query_obj)
+
+    assert "db_execution_ms" not in result["timing"]
+    assert result["timing"]["is_cached"] is True
+
+
+@patch(
+    "superset.common.query_context_processor.QueryCacheManager",
+)
+def test_timing_has_db_execution_on_cache_miss(
+    mock_cache_cls, processor_with_cache, mock_query_obj
+):
+    """db_execution_ms is present when the query is executed against the database."""
+    cache = MagicMock()
+    cache.is_loaded = False
+    cache.is_cached = False
+    cache.df = pd.DataFrame({"col1": [1]})
+    cache.cache_dttm = None
+    cache.queried_dttm = None
+    cache.applied_template_filters = []
+    cache.applied_filter_columns = []
+    cache.rejected_filter_columns = []
+    cache.annotation_data = {}
+    cache.error_message = None
+    cache.query = ""
+    cache.status = "success"
+    cache.stacktrace = None
+    cache.sql_rowcount = 1
+    mock_cache_cls.get.return_value = cache
+
+    processor_with_cache._qc_datasource.column_names = ["col1"]
+    processor_with_cache.get_query_result = MagicMock()
+    processor_with_cache.get_annotation_data = MagicMock(return_value={})
+
+    with patch.object(processor_with_cache, "query_cache_key", return_value="key"):
+        with patch.object(processor_with_cache, "get_cache_timeout", return_value=300):
+            result = processor_with_cache.get_df_payload(mock_query_obj)

Review Comment:
   **Suggestion:** The cache-miss test invokes `get_df_payload()` which will call `stats_logger.timing(...)`; patch `current_app` and set `STATS_LOGGER` to a `MagicMock()` before invoking `get_df_payload()` so the call to `.timing()` is safe. [possible bug]

   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>

   ```mdx
   - ❌ Cache-miss timing test fails in CI.
   - ⚠️ Timing instrumentation code requires mocked STATS_LOGGER.
   ```
   </details>

   ```suggestion
       with patch("superset.common.query_context_processor.current_app") as mock_app:
           mock_app.config = {
               "STATS_LOGGER": MagicMock(),
               "CHART_DATA_SLOW_QUERY_THRESHOLD_MS": None,
           }
   ```

   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>

   ```mdx
   1. Run pytest to execute test_timing_has_db_execution_on_cache_miss in tests/unit_tests/common/test_query_context_processor_timing.py.
   2. The test calls processor_with_cache.get_df_payload(mock_query_obj) at lines 182-184 after setting up a cache miss.
   3. get_df_payload emits per-phase metrics via current_app.config["STATS_LOGGER"] at superset/common/query_context_processor.py:195 and then calls stats_logger.timing(...).
   4. Without patching current_app.config in this test, either current_app is unavailable (RuntimeError) or STATS_LOGGER is missing (KeyError), causing the test to error when stats_logger.timing is invoked.
   ```
   </details>

   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>

   ```mdx
   This is a comment left during a code review.

   **Path:** tests/unit_tests/common/test_query_context_processor_timing.py
   **Line:** 184:184
   **Comment:**
   *Possible Bug: The cache-miss test invokes `get_df_payload()` which will call `stats_logger.timing(...)`; patch `current_app` and set `STATS_LOGGER` to a `MagicMock()` before invoking `get_df_payload()` so the call to `.timing()` is safe.

   Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
   ```
   </details>

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
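Editor's note: the bool-as-int footgun flagged in the `query_context_processor.py` review comment above can be reproduced in isolation. The sketch below uses a stand-in `timing` dict and a `MagicMock` stats logger rather than Superset's real objects; only the filtering idiom matches the review suggestion.

```python
from unittest.mock import MagicMock

# In Python, bool is a subclass of int, so a naive numeric check
# also matches boolean values such as timing["is_cached"].
timing = {"total_ms": 12.5, "is_cached": True}

naive = [k for k, v in timing.items() if isinstance(v, (int, float))]
strict = [
    k
    for k, v in timing.items()
    if isinstance(v, (int, float)) and not isinstance(v, bool)
]

# Only the strict filter keeps booleans out of the emitted timings.
stats_logger = MagicMock()
for phase in strict:
    stats_logger.timing(f"chart_data.{phase}", timing[phase] / 1000)
```

With the naive check, `is_cached` would be emitted as a spurious 0.001 s timing; the strict check emits only `total_ms`.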
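Editor's note: the `patch("…current_app")` pattern that the test-file comments suggest can also be exercised without Flask or Superset. The module name `fake_processor` and the helper `read_stats_logger` below are invented for illustration; they only mimic how a module-level `current_app` import gets patched and read.

```python
import sys
import types
from unittest.mock import MagicMock, patch

# Hypothetical stand-in module with a module-level `current_app`,
# mimicking how a processor module imports it from Flask.
mod = types.ModuleType("fake_processor")
mod.current_app = None
sys.modules["fake_processor"] = mod

def read_stats_logger():
    # Mirrors the pattern: stats_logger = current_app.config["STATS_LOGGER"]
    return sys.modules["fake_processor"].current_app.config["STATS_LOGGER"]

with patch("fake_processor.current_app") as mock_app:
    # Provide a dummy STATS_LOGGER so .timing() calls are safe in tests.
    mock_app.config = {"STATS_LOGGER": MagicMock()}
    logger = read_stats_logger()
    logger.timing("chart_data.total_ms", 0.012)
```

When the `with` block exits, `patch` restores the original `current_app`, so the dummy logger never leaks into other tests.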
