codeant-ai-for-open-source[bot] commented on code in PR #40677:
URL: https://github.com/apache/superset/pull/40677#discussion_r3339159928
##########
superset/result_set.py:
##########
@@ -184,6 +186,23 @@ def __init__( # pylint: disable=too-many-locals # noqa:
C901
TypeError, # this is super hackey,
# https://issues.apache.org/jira/browse/ARROW-7855
):
+ col_values = array[column].tolist()
+ if
is_feature_enabled("PRESERVE_NUMERIC_COLUMNS_FOR_SPECIAL_FLOATS"):
+ col_values = [
+ None if isinstance(v, str) and v in _FLOAT_SPECIAL
else v
+ for v in col_values
+ ]
+ try:
+ pa_data.append(pa.array(col_values))
+ continue
Review Comment:
**Suggestion:** The special-float cleanup is applied globally to every
engine whenever the feature flag is enabled, but this workaround is only valid
for Druid/pydruid behavior. As written, enabling the flag for Druid can
silently rewrite legitimate `"NaN"`/`"Infinity"` string values from other
engines into nulls and change inferred Arrow types. Gate this branch by engine
(for example, only when `db_engine_spec.engine == "druid"`) so non-Druid result
sets are not mutated. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ Non-Druid queries mis-serialize "NaN"/Infinity strings as nulls.
- ⚠️ SQL Lab and dashboards see unexpected numeric column types.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Enable the global feature flag so that
`superset.result_set.is_feature_enabled("PRESERVE_NUMERIC_COLUMNS_FOR_SPECIAL_FLOATS")`
returns True; this is the condition checked in the Arrow error path at
`superset/result_set.py:189-197`.
2. Execute a SQL statement against a non-Druid database (for example, a
Postgres
`Database` whose engine spec is `PostgresEngineSpec` from
`superset/db_engine_specs/postgres.py`) through SQL Lab; this flows through
`SQLExecutor.execute()` to `execute_sql_with_cursor()` in
`superset/sql/execution/executor.py:17-25`.
3. In `execute_sql_with_cursor()`, when the cursor has a description, the
code at
`superset/sql/execution/executor.py:79-87` constructs a
`SupersetResultSet(rows,
description, database.db_engine_spec)`, passing the non-Druid
`db_engine_spec` into
`SupersetResultSet.__init__` in `superset/result_set.py:135-143`.
4. If one of the result columns has mixed Python values like `["NaN", 1.5,
"Infinity",
2.3, "-Infinity", None]` (the same pattern used in
`tests/unit_tests/result_set_test.py:45-58`),
`pa.array(array[column].tolist())` at
`superset/result_set.py:178-181` raises `ArrowInvalid` and falls into the
exception block
where `col_values` is set and, because the flag is enabled, the branch at
`superset/result_set.py:189-197` replaces any string `"NaN"`, `"Infinity"`,
or
`"-Infinity"` with `None` for this non-Druid engine before re-running
`pa.array(col_values)`, silently converting legitimate string values to
nulls and changing
the inferred Arrow type for that column.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=0e2b232739c34c809da0cc1b673c6ce6&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=0e2b232739c34c809da0cc1b673c6ce6&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/result_set.py
**Line:** 190:197
**Comment:**
*Logic Error: The special-float cleanup is applied globally to every
engine whenever the feature flag is enabled, but this workaround is only valid
for Druid/pydruid behavior. As written, enabling the flag for Druid can
silently rewrite legitimate `"NaN"`/`"Infinity"` string values from other
engines into nulls and change inferred Arrow types. Gate this branch by engine
(for example, only when `db_engine_spec.engine == "druid"`) so non-Druid result
sets are not mutated.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40677&comment_hash=c69f351253e32e792e0a870b3116fcb7201dc89e18fed4cd0eeef50a9589065a&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40677&comment_hash=c69f351253e32e792e0a870b3116fcb7201dc89e18fed4cd0eeef50a9589065a&reaction=dislike'>👎</a>
##########
superset/result_set.py:
##########
@@ -277,13 +296,26 @@ def type_generic(
def data_type(self, col_name: str, pa_dtype: pa.DataType) -> Optional[str]:
"""Given a pyarrow data type, Returns a generic database type"""
- if set_type := self._type_dict.get(col_name):
+ set_type = self._type_dict.get(col_name)
+ pa_mapped = self.convert_pa_dtype(pa_dtype)
+
+ # pydruid infers column types from the first row value, so a None or
+ # special-float-string first value causes the column to be labelled
+ # STRING even when the actual data is numeric. When the feature flag
+ # is enabled, prefer PyArrow's inferred type over a cursor-description
+ # STRING so that numeric columns are not misreported.
+ if (
+ is_feature_enabled("PRESERVE_NUMERIC_COLUMNS_FOR_SPECIAL_FLOATS")
+ and set_type == "STRING"
+ and pa_mapped is not None
+ and pa_mapped != "STRING"
+ ):
+ return pa_mapped
+
Review Comment:
**Suggestion:** The metadata override in `data_type()` also runs for all
engines, so any backend that reports `STRING` in cursor metadata but returns
typed Python values can be reclassified to numeric/datetime unexpectedly when
the flag is enabled. This breaks type-contract consistency outside Druid.
Restrict this override to Druid result sets (or another explicit pydruid-only
condition) before preferring PyArrow type inference. [api mismatch]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ Non-Druid engines have metadata STRING columns retyped as numeric.
- ⚠️ SQL Lab column metadata inconsistent with database cursor description.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. With the feature flag enabled so that
`is_feature_enabled("PRESERVE_NUMERIC_COLUMNS_FOR_SPECIAL_FLOATS")` is True,
construct a
`SupersetResultSet` as done in
`tests/unit_tests/result_set_test.py:test_druid_none_first_value_reports_numeric_type_with_flag_enabled`
(lines 87-104): pass `data = [(None,), (1.5,), (2.3,), (None,), (4.7,)]`, a
cursor
`description = [("metric", "STRING", None, None, None, None, None)]`, and a
non-Druid
engine spec such as `PostgresEngineSpec` from
`superset/db_engine_specs/postgres.py` into
`SupersetResultSet(data, description, PostgresEngineSpec)`.
2. Inside `SupersetResultSet.__init__` (`superset/result_set.py:135-251`),
`_type_dict` is
populated at `superset/result_set.py:243-250` using
`db_engine_spec.get_datatype(deduped_cursor_desc[i][1])`, so for this column
`set_type` in
`data_type()` becomes `"STRING"` based solely on the cursor metadata from
the non-Druid
engine.
3. When SQL Lab or the async executor needs column metadata,
`SupersetResultSet.columns`
at `superset/result_set.py:73-88` iterates over the Arrow schema and calls
`self.data_type(col.name, col.type)`; `data_type()` (lines 38-59) computes
`pa_mapped =
self.convert_pa_dtype(pa_dtype)`, which returns `"FLOAT"` for the `float64`
Arrow type
inferred from the actual data.
4. Because the override condition at `superset/result_set.py:48-54`
currently checks only
the feature flag and `set_type == "STRING"` (without verifying that
`self.db_engine_spec.engine` is `"druid"`), the branch returns `pa_mapped`
(`"FLOAT"`) for
this non-Druid engine, so `_serialize_result_set()` in
`superset/sql/execution/celery_task.py:111-122` returns `columns =
result_set.columns`
with a `"FLOAT"` type even though the engine's cursor description declared
`"STRING"`,
causing type metadata for non-Druid backends to be reclassified unexpectedly
whenever this
flag is on.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=d654c945036941bd97cb24caefd2f5a7&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=d654c945036941bd97cb24caefd2f5a7&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/result_set.py
**Line:** 308:314
**Comment:**
*Api Mismatch: The metadata override in `data_type()` also runs for all
engines, so any backend that reports `STRING` in cursor metadata but returns
typed Python values can be reclassified to numeric/datetime unexpectedly when
the flag is enabled. This breaks type-contract consistency outside Druid.
Restrict this override to Druid result sets (or another explicit pydruid-only
condition) before preferring PyArrow type inference.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40677&comment_hash=96303c073287ecf142a80b2093d0e10558a4d47dc845f5f4af39b3ead73b956e&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40677&comment_hash=96303c073287ecf142a80b2093d0e10558a4d47dc845f5f4af39b3ead73b956e&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]