mbutrovich opened a new issue, #22602:
URL: https://github.com/apache/datafusion/issues/22602

   ### Describe the bug
   
   `SparkWidthBucket::return_type` returns `Int32`, but Spark's 
`WidthBucket.dataType` is `LongType`:
   
   
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala#L1882
   
   ```rust
   // datafusion/spark/src/function/math/width_bucket.rs
   fn return_type(&self, _arg_types: &[DataType]) -> Result<DataType> {
       Ok(Int32)
   }
   ```
   
   The `n_bucket` input was aligned to `i64` to match Spark in #20330, but the 
return type was left as `Int32`. The kernel still builds `Int32Array`.
   
   This produces wrong results in any consumer that plans against Spark's 
declared output type (`Int64`) but receives an `Int32Array` at runtime. With 
two rows per batch, the consumer reads 16 bytes as two `Int64` values from an 
8-byte `Int32` buffer: the first read packs both int32 values into a single 
int64, and the second read picks up uninitialized bytes.
   
   Concretely, for `width_bucket(value, 0.0, 10.0, 5)` over `Range(0, 10)` 
split into 5 partitions of 2 rows each:
   
   | value | expected (Int64) | observed |
   |---|---|---|
   | 0 | 1 | 4294967297 (= 0x1_00000001) |
   | 1 | 1 | 0 |
   | 2 | 2 | 8589934594 (= 0x2_00000002) |
   | 3 | 2 | 0 |
   | ... | ... | ... |
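
The observed values in the first and third rows can be reproduced with a short standalone sketch. It assumes a little-endian byte layout and uses a hypothetical helper, `misread_i32_as_i64`, that is not part of DataFusion or Comet. It only models the reinterpretation of the bytes that are actually in the buffer; the read of uninitialized bytes (the `0` rows) is not modeled.

```rust
/// Hypothetical helper: lay out `i32` values as raw little-endian bytes and
/// read them back as `i64`, the way a consumer expecting `Int64` would.
fn misread_i32_as_i64(values: &[i32]) -> Vec<i64> {
    // Serialize the i32 values into one contiguous byte buffer.
    let mut bytes: Vec<u8> = Vec::with_capacity(values.len() * 4);
    for v in values {
        bytes.extend_from_slice(&v.to_le_bytes());
    }

    // Reinterpret every 8 bytes as one i64; a trailing partial chunk is ignored.
    bytes
        .chunks_exact(8)
        .map(|chunk| {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(chunk);
            i64::from_le_bytes(buf)
        })
        .collect()
}

fn main() {
    // Batch holding the buckets for values 0 and 1 (both bucket 1).
    println!("{:?}", misread_i32_as_i64(&[1, 1])); // [4294967297]
    // Batch holding the buckets for values 2 and 3 (both bucket 2).
    println!("{:?}", misread_i32_as_i64(&[2, 2])); // [8589934594]
}
```

Each two-row `Int32` batch holds exactly 8 bytes, so only one full `Int64` can be read from real data, which matches the pattern of one packed value followed by garbage in the table.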
   
   ### To Reproduce
   
   Run any consumer that respects Spark's declared `LongType` for `WidthBucket` 
against `SparkWidthBucket`. Reproduces in DataFusion Comet on the `width_bucket 
- with range data` test in `CometMathExpressionSuite` 
(https://github.com/apache/datafusion-comet/issues/4347).
   
   ### Expected behavior
   
   `SparkWidthBucket::return_type` returns `Int64` and the kernel builds 
`Int64Array`, matching Spark.
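
As a rough illustration of the expected output type, the following standalone sketch computes buckets as `i64`. It is not the DataFusion kernel: `width_bucket_i64` is a hypothetical stand-in, it handles only the ascending case (`min < max`), and its null handling is an assumption made for the example.

```rust
/// Hypothetical, simplified width_bucket that returns `i64` buckets.
/// Only the ascending case (`min < max`) is handled; other inputs yield `None`.
fn width_bucket_i64(value: f64, min: f64, max: f64, n_bucket: i64) -> Option<i64> {
    // Assumed null conditions for this sketch: bad bucket count, NaN, or a
    // range that is not strictly ascending.
    if n_bucket <= 0 || value.is_nan() || !(min < max) {
        return None;
    }
    if value < min {
        return Some(0); // underflow bucket
    }
    if value >= max {
        // Overflow bucket; `checked_add` yields `None` if n_bucket is i64::MAX.
        return n_bucket.checked_add(1);
    }
    // Equal-width buckets numbered from 1.
    let idx = (n_bucket as f64) * (value - min) / (max - min);
    Some(idx as i64 + 1)
}

fn main() {
    // Same inputs as the table above: values 0..10, range [0.0, 10.0), 5 buckets.
    let buckets: Vec<Option<i64>> = (0..10)
        .map(|v| width_bucket_i64(v as f64, 0.0, 10.0, 5))
        .collect();
    println!("{:?}", buckets);
}
```

For the inputs in the table, this yields buckets 1, 1, 2, 2, and so on, as 64-bit values, which is what a consumer planning against `LongType` expects to find in the output buffer.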
   
   ### Additional context
   
   Related: #20330 (input parameter alignment).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

