codeant-ai-for-open-source[bot] commented on code in PR #38689:
URL: https://github.com/apache/superset/pull/38689#discussion_r2945027252


##########
superset/models/helpers.py:
##########
@@ -2712,9 +2712,17 @@ def get_sqla_query(  # pylint: 
disable=too-many-arguments,too-many-locals,too-ma
         if granularity not in self.dttm_cols and granularity is not None:
             granularity = self.main_dttm_col
 
-        columns_by_name: dict[str, "TableColumn"] = {
-            col.column_name: col for col in self.columns
-        }
+        columns_by_name: dict[str, "TableColumn"] = {}
+
+        for col in self.columns:
+            # Always map real column name (source of truth)
+            columns_by_name[col.column_name] = col
+
+        # Add verbose_name ONLY if it doesn't conflict
+        for col in self.columns:
+            if col.verbose_name and col.verbose_name not in columns_by_name:
+                columns_by_name[col.verbose_name] = col

Review Comment:
   **Suggestion:** `verbose_name` aliases are added without checking metric 
names, so a verbose alias that matches a metric name will be resolved as a 
column first. This bypasses metric-filter handling and applies the filter in 
`WHERE` instead of `HAVING`, producing incorrect query results. Exclude aliases 
that collide with metric names when building `columns_by_name`. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ HAVING metric filters downgraded into WHERE predicates.
   - ⚠️ `POST /api/v1/chart/data` may return wrong aggregates.
   ```
   </details>
   
   ```suggestion
           # Add verbose_name ONLY if it doesn't conflict
           metric_names = {metric.metric_name for metric in self.metrics}
           for col in self.columns:
               if (
                   col.verbose_name
                   and col.verbose_name not in columns_by_name
                   and col.verbose_name not in metric_names
               ):
                   columns_by_name[col.verbose_name] = col
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Call chart data execution via `POST /api/v1/chart/data` in
   `superset/charts/data/api.py:199-260`, which builds `ChartDataCommand` and 
executes query
   payloads.
   
   2. Request execution flows through `ChartDataCommand.run()`
   (`superset/commands/chart/data/get_data_command.py:40-48`) to
   `QueryContextProcessor.get_payload()`
   (`superset/common/query_context_processor.py:325-361`) and then `_get_full()`
   (`superset/common/query_actions.py:153-161`) which calls
   `query_context.get_df_payload(...)`.
   
   3. `get_df_payload()` calls datasource query generation through 
`get_query_result()`
   (`superset/common/query_context_processor.py:131,235-243`) and eventually
   `ExploreMixin.get_sqla_query()` (`superset/models/helpers.py:2630+`).
   
   4. In `get_sqla_query()`, `columns_by_name` includes `verbose_name` aliases
   (`superset/models/helpers.py:2721-2724`). If a column verbose alias equals a 
metric name
   (cross-type collision is allowed; only per-table uniqueness exists in
   `superset/connectors/sqla/models.py:1` for columns and `:246` for metrics), 
then filter
   resolution checks column first (`helpers.py:3015`) and metric second
   (`helpers.py:3018-3027`), so `is_metric_filter` stays false.
   
   5. The filter is then appended to `WHERE` (`helpers.py:3046-3048`) instead 
of `HAVING`,
   producing semantically wrong SQL for metric filters (e.g., metric-name 
filter intended for
   aggregate expression applied to base column predicate).
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/models/helpers.py
   **Line:** 2721:2724
   **Comment:**
        *Logic Error: `verbose_name` aliases are added without checking metric 
names, so a verbose alias that matches a metric name will be resolved as a 
column first. This bypasses metric-filter handling and applies the filter in 
`WHERE` instead of `HAVING`, producing incorrect query results. Exclude aliases 
that collide with metric names when building `columns_by_name`.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38689&comment_hash=3e9a70be4481aec8ac2b9b4cdb993f6f524c963fd839b6364f1e8f986fb337cc&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38689&comment_hash=3e9a70be4481aec8ac2b9b4cdb993f6f524c963fd839b6364f1e8f986fb337cc&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to