madhushreeag opened a new pull request, #40677:
URL: https://github.com/apache/superset/pull/40677

   ### SUMMARY
   pydruid infers column types from the first row value of each column in a 
result set. This creates two related problems for numeric columns:
   
   1. IEEE special floats: Druid cannot represent NaN, Infinity, or -Infinity 
in JSON, so pydruid emits them as the strings "NaN", "Infinity", and 
"-Infinity". When one of these strings appears in an otherwise numeric column, 
pa.array() raises ArrowInvalid on the mixed str/float list, and the entire 
column falls back to string serialisation. That loses numeric type information 
and breaks aggregations.
   2. None-first-value columns: When the first row value is null, 
pydruid.get_type(None) returns Type.STRING, labelling the column as STRING in 
the cursor description. PyArrow correctly infers float64 from the remaining 
rows (no exception is raised), but SupersetResultSet.data_type() still 
returned "STRING" because the cursor description always took precedence, so 
Superset treated legitimately numeric columns as strings.
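
The first-row behaviour described above can be reproduced without Druid. This is a minimal sketch, not pydruid's code: `first_row_type` is a hypothetical stand-in for the first-row inference, and the sample rows are made up for illustration.

```python
from typing import Any

def first_row_type(rows: list[tuple[Any, ...]], col: int) -> str:
    """Hypothetical stand-in: label a column using only its first row value."""
    value = rows[0][col]
    if value is None or isinstance(value, str):
        return "STRING"
    if isinstance(value, bool):  # check bool before int, since bool subclasses int
        return "BOOLEAN"
    if isinstance(value, (int, float)):
        return "NUMBER"
    raise TypeError(f"Value of unknown type: {value!r}")

# Illustrative rows: column 0 starts with a null, column 1 contains a
# special-float string in the second row.
rows = [(None, 1.5), (2.0, "NaN"), (3.5, 4.0)]

null_first_label = first_row_type(rows, 0)   # numeric column labelled STRING
special_label = first_row_type(rows, 1)      # labelled NUMBER from the first row
mixed_python_types = {type(row[1]).__name__ for row in rows}
```

Column 0 is labelled STRING purely because its first value is null, while column 1 is labelled as numeric yet holds a mixed str/float list, which is the shape of input the summary says pa.array() rejects.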
   
   ### FIX
   Both changes are gated behind a new 
PRESERVE_NUMERIC_COLUMNS_FOR_SPECIAL_FLOATS feature flag (default False):
   1. In the pa.array() exception path, special-float strings are replaced with 
None before retrying PyArrow array construction, preserving the column as 
numeric with nulls in place of the special values.
   2. In data_type(), when the cursor description says STRING but PyArrow 
inferred a more specific type (INT, FLOAT, DATETIME), PyArrow's inference wins. 
This fixes the metadata case where no exception is raised.
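
The two steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in this PR: `replace_special_floats` and `resolve_type` are hypothetical helper names, the flag is passed in as a plain boolean, and the PyArrow retry itself is left out so the sketch stays dependency-free.

```python
from typing import Any, Optional

# The three strings the summary says are emitted for IEEE special floats.
SPECIAL_FLOAT_STRINGS = frozenset({"NaN", "Infinity", "-Infinity"})

# Types treated as "more specific" than STRING, per the description above.
SPECIFIC_TYPES = frozenset({"INT", "FLOAT", "DATETIME"})

def replace_special_floats(values: list[Any]) -> list[Any]:
    """Step 1 (hypothetical helper): swap special-float strings for None.

    The cleaned list would then be handed back to array construction.
    """
    return [
        None if isinstance(v, str) and v in SPECIAL_FLOAT_STRINGS else v
        for v in values
    ]

def resolve_type(
    cursor_type: Optional[str],
    inferred_type: Optional[str],
    flag_enabled: bool,
) -> Optional[str]:
    """Step 2 (hypothetical helper): let the inferred type override STRING."""
    if flag_enabled and cursor_type == "STRING" and inferred_type in SPECIFIC_TYPES:
        return inferred_type
    # Assumed fallback: the cursor description keeps precedence otherwise.
    return cursor_type or inferred_type

cleaned = replace_special_floats([1.5, "NaN", "-Infinity", None, 2.0])
overridden = resolve_type("STRING", "FLOAT", flag_enabled=True)
unchanged = resolve_type("STRING", "FLOAT", flag_enabled=False)
```

Only the three exact special-float strings are replaced, so a column that holds other text still fails the retry and keeps its string fallback.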
   
   The flag is off by default to avoid unintended type changes on existing 
deployments. Operators running Druid with nullable or special-float-heavy 
columns should enable it.
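
As a sketch of how an operator might enable the flag, assuming it is read from the standard `FEATURE_FLAGS` dictionary in `superset_config.py` like other Superset feature flags:

```python
# superset_config.py (assumed location for deployment-level overrides)
FEATURE_FLAGS = {
    # Off by default; opt in to keep numeric columns numeric.
    "PRESERVE_NUMERIC_COLUMNS_FOR_SPECIAL_FLOATS": True,
}
```

If the deployment already defines `FEATURE_FLAGS`, the key should be added to the existing dictionary rather than replacing it.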
   
   
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in 
[SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

