codeant-ai-for-open-source[bot] commented on code in PR #41204:
URL: https://github.com/apache/superset/pull/41204#discussion_r3437349831
##########
superset/common/query_object_factory.py:
##########
@@ -66,6 +66,13 @@ def create( # pylint: disable=too-many-arguments
processed_extras = self._process_extras(extras)
result_type = kwargs.setdefault("result_type", parent_result_type)
+ # Rename deprecated kwargs before any processing so that downstream
code
+ # (time-range resolution, QueryObject) only ever sees the canonical
names.
+ for field in DEPRECATED_FIELDS:
+ if field.old_name in kwargs:
+ if value := kwargs.pop(field.old_name):
+ kwargs.setdefault(field.new_name, value)
Review Comment:
**Suggestion:** The deprecated-field normalization can drop valid
`groupby`/deprecated values when the canonical key is present but null-like
(for example `columns: None`). Because `setdefault` treats an existing `None`
key as already set, the deprecated value is discarded after `pop`, so grouping
can be silently lost and queries can return different results. Normalize with
explicit precedence logic that still promotes deprecated values when the
canonical field is missing or null-like, instead of unconditional `setdefault`
on a possibly null existing key. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ /superset/v1/query/ can ignore valid groupby columns.
- ⚠️ QueryContextFactory users risk losing grouping semantics.
- ⚠️ Aggregation queries may return ungrouped, inflated result sets.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Use the legacy query endpoint `POST /superset/v1/query/` which is
implemented in
`superset/views/api.py:8-22`. This endpoint reads
`request.form["query_context"]` (line
16) and calls
`QueryContextFactory.create(**json.loads(request.form["query_context"]))`
(lines 16-18).
2. In the request body, supply a `query_context` JSON where a query object
contains both a
null-like canonical field and a populated deprecated field, for example:
```json
{
"datasource": {"id": 1, "type": "table"},
"queries": [{
"columns": null,
"groupby": ["country"],
"metrics": ["sum__value"]
}],
"result_type": "full",
"result_format": "json"
}
```
When `json.loads` runs in `Api.query`, this becomes a Python dict where
the
single-element dict in `queries` has `{"columns": None, "groupby":
["country"],
"metrics": [...]}`.
3. The call stack proceeds into `QueryContextFactory.create` in
`superset/common/query_context_factory.py:38-48`, which iterates the
`queries` list and
for each `query_obj` calls `self._query_object_factory.create(result_type,
datasource=datasource, server_pagination=server_pagination, **query_obj)`
(lines 38-47).
For our example, `kwargs` in `QueryObjectFactory.create` contains both
`columns=None` and
`groupby=["country"]`.
4. Inside `QueryObjectFactory.create` in
`superset/common/query_object_factory.py:52-96`,
the deprecated-field normalization loop runs at lines 69-74:
```python
for field in DEPRECATED_FIELDS:
if field.old_name in kwargs:
if value := kwargs.pop(field.old_name):
kwargs.setdefault(field.new_name, value)
```
For the `DeprecatedField(old_name="groupby", new_name="columns")`,
`field.old_name` is
`"groupby"` and `field.new_name` is `"columns"`. Since `"groupby"` is in
`kwargs` with
value `["country"]`, `value` is truthy and `"groupby"` is popped.
However, because
`"columns"` already exists in `kwargs` with value `None`,
`kwargs.setdefault("columns",
["country"])` does nothing (dict `setdefault` does not overwrite an
existing key even
if its value is `None`). The effect is:
- `groupby` is removed from `kwargs`.
- `columns` remains `None`.
When the function later constructs a `QueryObject` (lines 89-95), it
passes
`columns=None`, so `QueryObject.__init__` in
`superset/common/query_object.py:112-167`
sets `self.columns = columns or []` (line 142), resulting in
`self.columns == []` and
no grouping. Prior to this PR, the same input would reach
`QueryObject._rename_deprecated_fields` in
`superset/common/query_object.py:219-237`,
which would see `groupby` in `kwargs` and set `self.columns` from
`groupby`, preserving
the grouping. Thus, a valid `groupby` configuration is silently dropped
when `columns`
is present but null-like, changing query semantics and returned results.
```
</details>
[](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=ced994bcbe0a441e898f0903670452d8&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
[](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=ced994bcbe0a441e898f0903670452d8&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/common/query_object_factory.py
**Line:** 71:74
**Comment:**
*Logic Error: The deprecated-field normalization can drop valid
`groupby`/deprecated values when the canonical key is present but null-like
(for example `columns: None`). Because `setdefault` treats an existing `None`
key as already set, the deprecated value is discarded after `pop`, so grouping
can be silently lost and queries can return different results. Normalize with
explicit precedence logic that still promotes deprecated values when the
canonical field is missing or null-like, instead of unconditional `setdefault`
on a possibly null existing key.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41204&comment_hash=f683d55198dc629fcccaa334d5294adf31cad564f887655f083dc001d6086b19&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41204&comment_hash=f683d55198dc629fcccaa334d5294adf31cad564f887655f083dc001d6086b19&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]