Copilot commented on code in PR #38986:
URL: https://github.com/apache/superset/pull/38986#discussion_r3023531811
##########
tests/unit_tests/result_set_test.py:
##########
@@ -185,3 +185,43 @@ def
test_get_column_description_from_empty_data_using_cursor_description(
)
assert any(col.get("column_name") == "__time" for col in
result_set.columns)
logger.exception.assert_not_called()
+
+
+def test_empty_column_names_get_synthetic_names() -> None:
Review Comment:
The new tests cover empty names and multiple empty names, but they don’t
cover the collision case where a synthetic name matches an explicit column name
(e.g. empty name at position 0 produces `_col_0` while a later column is
explicitly named `_col_0`). Adding a regression test for that scenario would
lock in the expected behavior (ideally: keep the explicit name unchanged and
choose a different synthetic name for the unnamed column).
##########
tests/unit_tests/result_set_test.py:
##########
@@ -185,3 +185,43 @@ def
test_get_column_description_from_empty_data_using_cursor_description(
)
assert any(col.get("column_name") == "__time" for col in
result_set.columns)
logger.exception.assert_not_called()
+
+
+def test_empty_column_names_get_synthetic_names() -> None:
+ """
+ SQL Server returns an empty-string column name in cursor.description for
+ any un-aliased expression (e.g. ``SELECT COUNT(*) FROM t``). An empty
+ field name is illegal in NumPy structured arrays and PyArrow tables.
+
+ SupersetResultSet must replace empty column names with synthetic names
+ so queries like ``SELECT COUNT(*) FROM t`` succeed on MSSQL.
+
+ Regression test for https://github.com/apache/superset/issues/23848
+ """
+ data = [(42,)]
+ description = [("", 3, None, None, None, None, None)]
+ result_set = SupersetResultSet(data, description, BaseEngineSpec) # type:
ignore
+
+ assert result_set.columns[0]["column_name"] == "_col_0"
+ df = result_set.to_pandas_df()
+ assert list(df.columns) == ["_col_0"]
+ assert df["_col_0"].iloc[0] == 42
+
+
+def test_multiple_empty_column_names_get_unique_synthetic_names() -> None:
Review Comment:
The new tests cover empty names and multiple empty names, but they don’t
cover the collision case where a synthetic name matches an explicit column name
(e.g. empty name at position 0 produces `_col_0` while a later column is
explicitly named `_col_0`). Adding a regression test for that scenario would
lock in the expected behavior (ideally: keep the explicit name unchanged and
choose a different synthetic name for the unnamed column).
##########
superset/result_set.py:
##########
@@ -116,8 +116,16 @@ def __init__( # pylint: disable=too-many-locals # noqa:
C901
if cursor_description:
# get deduped list of column names
+ # Some databases (e.g. SQL Server) return an empty string as the
+ # column name for un-aliased expressions like SELECT COUNT(*).
+ # An empty field name is illegal in NumPy structured arrays and in
+ # PyArrow tables, so we substitute a synthetic name when needed.
+ # See https://github.com/apache/superset/issues/23848
column_names = dedup(
- [convert_to_string(col[0]) for col in cursor_description]
+ [
+ convert_to_string(col[0]) or f"_col_{i}"
+ for i, col in enumerate(cursor_description)
+ ]
Review Comment:
Generating `_col_<i>` names before `dedup(...)` can unintentionally rename a
*real* user column if it collides with the synthetic name and appears later in
the select-list (e.g. `SELECT COUNT(*), _col_0 FROM t` could end up renaming
the real `_col_0` to `_col_0__1`). Consider generating synthetic names that are
guaranteed not to collide with any non-empty column names (e.g., pre-collect
non-empty names into a set, then for each empty name pick the first available
`_col_<n>` not in that set; update the set as you assign names). This preserves
existing explicit column names while still ensuring uniqueness.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]