codeant-ai-for-open-source[bot] commented on code in PR #41204:
URL: https://github.com/apache/superset/pull/41204#discussion_r3437349831


##########
superset/common/query_object_factory.py:
##########
@@ -66,6 +66,13 @@ def create(  # pylint: disable=too-many-arguments
         processed_extras = self._process_extras(extras)
         result_type = kwargs.setdefault("result_type", parent_result_type)
 
+        # Rename deprecated kwargs before any processing so that downstream 
code
+        # (time-range resolution, QueryObject) only ever sees the canonical 
names.
+        for field in DEPRECATED_FIELDS:
+            if field.old_name in kwargs:
+                if value := kwargs.pop(field.old_name):
+                    kwargs.setdefault(field.new_name, value)

Review Comment:
   **Suggestion:** The deprecated-field normalization can drop valid 
`groupby`/deprecated values when the canonical key is present but null-like 
(for example `columns: None`). Because `setdefault` treats an existing `None` 
key as already set, the deprecated value is discarded after `pop`, so grouping 
can be silently lost and queries can return different results. Normalize with 
explicit precedence logic that still promotes deprecated values when the 
canonical field is missing or null-like, instead of unconditional `setdefault` 
on a possibly null existing key. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ /superset/v1/query/ can ignore valid groupby columns.
   - ⚠️ QueryContextFactory users risk losing grouping semantics.
   - ⚠️ Aggregation queries may return ungrouped, inflated result sets.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Use the legacy query endpoint `POST /superset/v1/query/` which is 
implemented in
   `superset/views/api.py:8-22`. This endpoint reads 
`request.form["query_context"]` (line
   16) and calls 
`QueryContextFactory.create(**json.loads(request.form["query_context"]))`
   (lines 16-18).
   
   2. In the request body, supply a `query_context` JSON where a query object 
contains both a
   null-like canonical field and a populated deprecated field, for example:
   
      ```json
   
      {
   
        "datasource": {"id": 1, "type": "table"},
   
        "queries": [{
   
          "columns": null,
   
          "groupby": ["country"],
   
          "metrics": ["sum__value"]
   
        }],
   
        "result_type": "full",
   
        "result_format": "json"
   
      }
   
      ```
   
      When `json.loads` runs in `Api.query`, this becomes a Python dict where 
the
      single-element dict in `queries` has `{"columns": None, "groupby": 
["country"],
      "metrics": [...]}`.
   
   3. The call stack proceeds into `QueryContextFactory.create` in
   `superset/common/query_context_factory.py:38-48`, which iterates the 
`queries` list and
   for each `query_obj` calls `self._query_object_factory.create(result_type,
   datasource=datasource, server_pagination=server_pagination, **query_obj)` 
(lines 38-47).
   For our example, `kwargs` in `QueryObjectFactory.create` contains both 
`columns=None` and
   `groupby=["country"]`.
   
   4. Inside `QueryObjectFactory.create` in 
`superset/common/query_object_factory.py:52-96`,
   the deprecated-field normalization loop runs at lines 69-74:
   
      ```python
   
      for field in DEPRECATED_FIELDS:
   
          if field.old_name in kwargs:
   
              if value := kwargs.pop(field.old_name):
   
                  kwargs.setdefault(field.new_name, value)
   
      ```
   
      For the `DeprecatedField(old_name="groupby", new_name="columns")`, 
`field.old_name` is
      `"groupby"` and `field.new_name` is `"columns"`. Since `"groupby"` is in 
`kwargs` with
      value `["country"]`, `value` is truthy and `"groupby"` is popped. 
However, because
      `"columns"` already exists in `kwargs` with value `None`, 
`kwargs.setdefault("columns",
      ["country"])` does nothing (dict `setdefault` does not overwrite an 
existing key even
      if its value is `None`). The effect is:
   
      - `groupby` is removed from `kwargs`.
   
      - `columns` remains `None`.
   
      When the function later constructs a `QueryObject` (lines 89-95), it 
passes
      `columns=None`, so `QueryObject.__init__` in 
`superset/common/query_object.py:112-167`
      sets `self.columns = columns or []` (line 142), resulting in 
`self.columns == []` and
      no grouping. Prior to this PR, the same input would reach
      `QueryObject._rename_deprecated_fields` in 
`superset/common/query_object.py:219-237`,
      which would see `groupby` in `kwargs` and set `self.columns` from 
`groupby`, preserving
      the grouping. Thus, a valid `groupby` configuration is silently dropped 
when `columns`
      is present but null-like, changing query semantics and returned results.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=ced994bcbe0a441e898f0903670452d8&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=ced994bcbe0a441e898f0903670452d8&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/common/query_object_factory.py
   **Line:** 71:74
   **Comment:**
        *Logic Error: The deprecated-field normalization can drop valid 
`groupby`/deprecated values when the canonical key is present but null-like 
(for example `columns: None`). Because `setdefault` treats an existing `None` 
key as already set, the deprecated value is discarded after `pop`, so grouping 
can be silently lost and queries can return different results. Normalize with 
explicit precedence logic that still promotes deprecated values when the 
canonical field is missing or null-like, instead of unconditional `setdefault` 
on a possibly null existing key.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41204&comment_hash=f683d55198dc629fcccaa334d5294adf31cad564f887655f083dc001d6086b19&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41204&comment_hash=f683d55198dc629fcccaa334d5294adf31cad564f887655f083dc001d6086b19&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to