Re: [PR] fix(dataset): preserve numeric column types when pydruid infers STRING from first-row value [superset]

via GitHub Wed, 03 Jun 2026 16:02:51 -0700


madhushreeag commented on code in PR #40677:
URL: https://github.com/apache/superset/pull/40677#discussion_r3352412654



##########
superset/db_engine_specs/base.py:
##########
@@ -1304,6 +1304,38 @@ def get_datatype(cls, type_code: Any) -> str | None:
             return type_code.upper()
         return None
 
+    @classmethod
+    def normalize_column_values(cls, col_values: list[Any]) -> list[Any]:
+        """
+        Engine-specific hook to normalize column values before PyArrow conversion.
+
+        Called when the initial pa.array() conversion raises an exception, giving
+        the engine a chance to clean up values (e.g. replace sentinel strings with
+        None) before a second conversion attempt.
+
+        :param col_values: Raw Python values for one column
+        :return: Normalized values; return the input list unchanged by default
+        """
+        return col_values
+
+    @classmethod
+    def resolve_column_type(
+        cls, cursor_type: str | None, pa_mapped: str | None
+    ) -> str | None:
+        """
+        Choose the reported column type from the cursor description type and the
+        type inferred by PyArrow.
+
+        The default prefers the cursor description when available.  Override in
+        engine specs where the cursor description is unreliable (e.g. pydruid
+        infers STRING from a None or special-float first row value).
+
+        :param cursor_type: Type string from the cursor description, or None
+        :param pa_mapped: Type string inferred by PyArrow, or None
+        :return: The type string to report for this column
+        """
+        return cursor_type or pa_mapped

Review Comment:
   The methods are already tested indirectly via test_base_spec_ieee_special_floats_stringified and test_base_spec_none_first_value_reports_string_type in test_druid.py, but we have also added direct unit tests for both methods to address this.
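For illustration, here is a minimal sketch of what direct unit tests for the two default hooks could look like. `BaseEngineSpecStandIn` and `DruidLikeSpec` are hypothetical stand-ins, not Superset's `BaseEngineSpec` or the Druid spec from this PR, so the snippet runs without a Superset install. The `NUMERIC_TYPES` set and the "cursor says STRING, PyArrow says numeric" override rule are assumptions chosen only to show how an engine spec might override `resolve_column_type`.

```python
from __future__ import annotations

from typing import Any


class BaseEngineSpecStandIn:
    """Hypothetical stand-in that copies the two default hooks from the diff."""

    @classmethod
    def normalize_column_values(cls, col_values: list[Any]) -> list[Any]:
        # Default: return the input list unchanged.
        return col_values

    @classmethod
    def resolve_column_type(
        cls, cursor_type: str | None, pa_mapped: str | None
    ) -> str | None:
        # Default: prefer the cursor description, fall back to PyArrow's type.
        return cursor_type or pa_mapped


class DruidLikeSpec(BaseEngineSpecStandIn):
    """Hypothetical override; the PR's actual Druid spec may differ."""

    # Assumption: illustrative set of numeric type strings.
    NUMERIC_TYPES = {"BIGINT", "INTEGER", "FLOAT", "DOUBLE", "DECIMAL"}

    @classmethod
    def resolve_column_type(
        cls, cursor_type: str | None, pa_mapped: str | None
    ) -> str | None:
        # Assumed rule: a STRING from the cursor paired with a numeric PyArrow
        # type is treated as a mis-inference from the first-row value.
        if cursor_type == "STRING" and pa_mapped in cls.NUMERIC_TYPES:
            return pa_mapped
        return super().resolve_column_type(cursor_type, pa_mapped)


def test_normalize_column_values_default_is_identity() -> None:
    values = [1, None, "x"]
    # The default hook hands back the very same list object.
    assert BaseEngineSpecStandIn.normalize_column_values(values) is values


def test_resolve_column_type_default_prefers_cursor() -> None:
    assert BaseEngineSpecStandIn.resolve_column_type("STRING", "DOUBLE") == "STRING"
    assert BaseEngineSpecStandIn.resolve_column_type(None, "DOUBLE") == "DOUBLE"
    assert BaseEngineSpecStandIn.resolve_column_type(None, None) is None


def test_druid_like_override_prefers_numeric_pyarrow_type() -> None:
    assert DruidLikeSpec.resolve_column_type("STRING", "DOUBLE") == "DOUBLE"
    # Non-numeric PyArrow types fall through to the base default.
    assert DruidLikeSpec.resolve_column_type("STRING", "STRING") == "STRING"
```

The override only changes the STRING-versus-numeric case and otherwise defers to the base default, so engines that do not override the hook keep the cursor-first behavior.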



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
