codeant-ai-for-open-source[bot] commented on code in PR #35962:
URL: https://github.com/apache/superset/pull/35962#discussion_r2586268297


##########
tests/unit_tests/result_set_test.py:
##########
@@ -185,3 +185,48 @@ def test_get_column_description_from_empty_data_using_cursor_description(
     )
     assert any(col.get("column_name") == "__time" for col in result_set.columns)
     logger.exception.assert_not_called()
+
+
+def test_empty_result_set_preserves_column_metadata() -> None:
+    """
+    Test that column metadata is preserved when query returns zero rows.
+
+    When a query returns no data but has a valid cursor description, the
+    column names and types from cursor_description should be preserved
+    in the result set. This allows downstream consumers (like the UI)
+    to display column headers even for empty result sets.
+    """
+    data: DbapiResult = []
+    description = [
+        ("id", "int", None, None, None, None, True),
+        ("name", "varchar", None, None, None, None, True),
+        ("created_at", "timestamp", None, None, None, None, True),
+    ]
+
+    result_set = SupersetResultSet(
+        data,
+        description,  # type: ignore
+        BaseEngineSpec,
+    )
+
+    # Verify column count
+    assert len(result_set.columns) == 3
+
+    # Verify column names are preserved
+    column_names = [col["column_name"] for col in result_set.columns]
+    assert column_names == ["id", "name", "created_at"]
+
+    # Verify types from cursor_description are used
+    assert result_set.columns[0]["type"] == "INT"
+    assert result_set.columns[1]["type"] == "VARCHAR"
+    assert result_set.columns[2]["type"] == "TIMESTAMP"
+
+    # Verify the PyArrow table has the correct schema
+    assert result_set.table.num_rows == 0
+    assert result_set.table.num_columns == 3
+    assert result_set.table.column_names == ["id", "name", "created_at"]

Review Comment:
   **Suggestion:** The test compares `result_set.table.num_columns` and `result_set.table.column_names` directly to Python primitives; depending on the pyarrow version, `column_names` may not be exactly a `list` (and `num_columns` access semantics may vary). Use `len(result_set.table.column_names)` and cast `column_names` to `list(...)` to make the assertions robust and avoid spurious failures. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
       assert len(result_set.table.column_names) == 3
       assert list(result_set.table.column_names) == ["id", "name", "created_at"]
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Using `len(...)` and casting `column_names` to `list(...)` is more tolerant of pyarrow API differences (some versions expose sequence-like objects). The change improves test robustness without altering semantics.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/unit_tests/result_set_test.py
   **Line:** 226:227
   **Comment:**
        *Possible Bug: The test compares `result_set.table.num_columns` and `result_set.table.column_names` directly to Python primitives; depending on the pyarrow version, `column_names` may not be exactly a `list` (and `num_columns` access semantics may vary). Use `len(result_set.table.column_names)` and cast `column_names` to `list(...)` to make the assertions robust and avoid spurious failures.
   
   Validate the correctness of the flagged issue. If it is correct, how can I resolve it? If you propose a fix, implement it and keep it concise.
   ```
   </details>
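   
   As a self-contained illustration (a minimal sketch assuming only that pyarrow is installed; it builds a throwaway empty table rather than a `SupersetResultSet`), the tolerant pattern relies only on `column_names` being an iterable of strings:
   
   ```python
   # Hedged sketch: assertions that hold whether column_names is a list or any
   # other sequence-like object returned by the installed pyarrow version.
   import pyarrow as pa

   table = pa.table(
       {
           "id": pa.array([], type=pa.int64()),
           "name": pa.array([], type=pa.string()),
           "created_at": pa.array([], type=pa.timestamp("us")),
       }
   )

   assert table.num_rows == 0
   assert len(table.column_names) == 3
   assert list(table.column_names) == ["id", "name", "created_at"]
   ```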



##########
tests/unit_tests/result_set_test.py:
##########
@@ -185,3 +185,48 @@ def test_get_column_description_from_empty_data_using_cursor_description(
     )
     assert any(col.get("column_name") == "__time" for col in result_set.columns)
     logger.exception.assert_not_called()
+
+
+def test_empty_result_set_preserves_column_metadata() -> None:
+    """
+    Test that column metadata is preserved when query returns zero rows.
+
+    When a query returns no data but has a valid cursor description, the
+    column names and types from cursor_description should be preserved
+    in the result set. This allows downstream consumers (like the UI)
+    to display column headers even for empty result sets.
+    """
+    data: DbapiResult = []
+    description = [
+        ("id", "int", None, None, None, None, True),
+        ("name", "varchar", None, None, None, None, True),
+        ("created_at", "timestamp", None, None, None, None, True),
+    ]
+
+    result_set = SupersetResultSet(
+        data,
+        description,  # type: ignore
+        BaseEngineSpec,
+    )
+
+    # Verify column count
+    assert len(result_set.columns) == 3
+
+    # Verify column names are preserved
+    column_names = [col["column_name"] for col in result_set.columns]
+    assert column_names == ["id", "name", "created_at"]
+
+    # Verify types from cursor_description are used
+    assert result_set.columns[0]["type"] == "INT"
+    assert result_set.columns[1]["type"] == "VARCHAR"
+    assert result_set.columns[2]["type"] == "TIMESTAMP"

Review Comment:
   **Suggestion:** The test asserts exact uppercase SQL type strings ("INT", "VARCHAR", "TIMESTAMP"), but the actual mapping comes from the engine spec via `get_datatype`; hardcoding the expected strings makes the test brittle and prone to failure if `BaseEngineSpec.get_datatype` returns a different canonical representation. Derive the expected values by calling `BaseEngineSpec.get_datatype` on the description's type entries. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
       # Verify types from cursor_description are used (derive expected values from the engine spec)
       assert result_set.columns[0]["type"] == BaseEngineSpec.get_datatype(description[0][1])
       assert result_set.columns[1]["type"] == BaseEngineSpec.get_datatype(description[1][1])
       assert result_set.columns[2]["type"] == BaseEngineSpec.get_datatype(description[2][1])
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The suggestion reduces brittleness: the test currently assumes `BaseEngineSpec.get_datatype` will always return those exact uppercased strings. Deriving the expected values from the same engine spec implementation ties the assertion to the actual mapping and prevents false failures if the canonical representation changes. It's a pragmatic, low-risk improvement.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/unit_tests/result_set_test.py
   **Line:** 219:222
   **Comment:**
        *Possible Bug: The test asserts exact uppercase SQL type strings ("INT", "VARCHAR", "TIMESTAMP"), but the actual mapping comes from the engine spec via `get_datatype`; hardcoding the expected strings makes the test brittle and prone to failure if `BaseEngineSpec.get_datatype` returns a different canonical representation. Derive the expected values by calling `BaseEngineSpec.get_datatype` on the description's type entries.
   
   Validate the correctness of the flagged issue. If it is correct, how can I resolve it? If you propose a fix, implement it and keep it concise.
   ```
   </details>
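   
   A hedged variant of the same idea (assuming, as the suggestion above does, that `get_datatype` accepts the raw type string from the cursor description) derives the whole expected list in one pass instead of repeating the call per column:
   
   ```python
   # Hedged sketch: build the expected type strings from the same engine spec the
   # result set used, so the test tracks whatever canonical form it returns.
   expected_types = [BaseEngineSpec.get_datatype(col[1]) for col in description]
   actual_types = [col["type"] for col in result_set.columns]
   assert actual_types == expected_types
   ```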



##########
tests/unit_tests/result_set_test.py:
##########
@@ -185,3 +185,48 @@ def test_get_column_description_from_empty_data_using_cursor_description(
     )
     assert any(col.get("column_name") == "__time" for col in result_set.columns)
     logger.exception.assert_not_called()
+
+
+def test_empty_result_set_preserves_column_metadata() -> None:
+    """
+    Test that column metadata is preserved when query returns zero rows.
+
+    When a query returns no data but has a valid cursor description, the
+    column names and types from cursor_description should be preserved
+    in the result set. This allows downstream consumers (like the UI)
+    to display column headers even for empty result sets.
+    """
+    data: DbapiResult = []
+    description = [
+        ("id", "int", None, None, None, None, True),
+        ("name", "varchar", None, None, None, None, True),
+        ("created_at", "timestamp", None, None, None, None, True),
+    ]
+
+    result_set = SupersetResultSet(
+        data,
+        description,  # type: ignore
+        BaseEngineSpec,
+    )
+
+    # Verify column count
+    assert len(result_set.columns) == 3
+
+    # Verify column names are preserved
+    column_names = [col["column_name"] for col in result_set.columns]
+    assert column_names == ["id", "name", "created_at"]
+
+    # Verify types from cursor_description are used
+    assert result_set.columns[0]["type"] == "INT"
+    assert result_set.columns[1]["type"] == "VARCHAR"
+    assert result_set.columns[2]["type"] == "TIMESTAMP"
+
+    # Verify the PyArrow table has the correct schema
+    assert result_set.table.num_rows == 0
+    assert result_set.table.num_columns == 3
+    assert result_set.table.column_names == ["id", "name", "created_at"]
+
+    # Verify DataFrame conversion works
+    df = result_set.to_pandas_df()
+    assert len(df) == 0
+    assert list(df.columns) == ["id", "name", "created_at"]

Review Comment:
   **Suggestion:** The DataFrame column names produced by the pyarrow/pandas conversion can sometimes be bytes or other non-string types; asserting exact list equality can then fail. Normalize the column names to strings before comparing to avoid flaky failures. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
       assert list(map(str, df.columns)) == ["id", "name", "created_at"]
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Normalizing column names to strings (e.g., `list(map(str, df.columns))`) prevents flakiness when column names come through as bytes or other non-str types. It's a small defensive tweak that reduces sporadic test failures.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/unit_tests/result_set_test.py
   **Line:** 232:232
   **Comment:**
        *Possible Bug: The DataFrame column names produced by the pyarrow/pandas conversion can sometimes be bytes or other non-string types; asserting exact list equality can then fail. Normalize the column names to strings before comparing to avoid flaky failures.
   
   Validate the correctness of the flagged issue. If it is correct, how can I resolve it? If you propose a fix, implement it and keep it concise.
   ```
   </details>
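   
   Equivalently, a hedged sketch of the normalization written as a comprehension (reusing the `result_set` built in the test above):
   
   ```python
   # Hedged sketch: coerce every column label to str before comparing, so the
   # assertion holds even if a backend surfaces bytes or other label types.
   df = result_set.to_pandas_df()
   assert len(df) == 0
   assert [str(name) for name in df.columns] == ["id", "name", "created_at"]
   ```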



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

