mbutrovich opened a new issue, #22602: URL: https://github.com/apache/datafusion/issues/22602
### Describe the bug

`SparkWidthBucket::return_type` returns `Int32`, but Spark's `WidthBucket.dataType` is `LongType`: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala#L1882

```rust
// datafusion/spark/src/function/math/width_bucket.rs
fn return_type(&self, _arg_types: &[DataType]) -> Result<DataType> {
    Ok(Int32)
}
```

The `n_bucket` input was aligned to `i64` to match Spark in #20330, but the return type was left as `Int32`. The kernel still builds `Int32Array`.

This produces wrong results in any consumer that plans against Spark's declared output type (`Int64`) but receives an `Int32Array` at runtime: with two rows per batch, the consumer reads 16 bytes of `Int64` from an 8-byte `Int32` buffer, packing two int32 values into a single int64 and reading uninitialized bytes for the rest.

Concretely, for `width_bucket(value, 0.0, 10.0, 5)` over `Range(0, 10)` split into 5 partitions of 2 rows each:

| value | expected (Int64) | observed |
|---|---|---|
| 0 | 1 | 4294967297 (= 0x1_00000001) |
| 1 | 1 | 0 |
| 2 | 2 | 8589934594 (= 0x2_00000002) |
| 3 | 2 | 0 |
| ... | ... | ... |

### To Reproduce

Run any consumer that respects Spark's declared `LongType` for `WidthBucket` against `SparkWidthBucket`. Reproduces in DataFusion Comet on the `width_bucket - with range data` test in `CometMathExpressionSuite` (https://github.com/apache/datafusion-comet/issues/4347).

### Expected behavior

`SparkWidthBucket::return_type` returns `Int64` and the kernel builds `Int64Array`, matching Spark.

### Additional context

Related: #20330 (input parameter alignment).
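
The packed values in the table can be reproduced without DataFusion. The following is a minimal standalone sketch, not code from either project: `misread_i32_as_i64` is a hypothetical helper that assumes a little-endian value buffer and models only the first row of each batch, where two `i32` bucket values are read back as one `i64`. The second row, which reads past the end of the buffer, is not modelled.

```rust
use std::convert::TryInto;

/// Hypothetical helper: reinterpret a buffer of `i32` values as `i64` values,
/// as a consumer would if it trusted a declared Int64 type but received
/// Int32 data. Assumes a little-endian layout.
fn misread_i32_as_i64(values: &[i32]) -> Vec<i64> {
    // Lay the i32 values out as raw little-endian bytes (4 bytes each).
    let bytes: Vec<u8> = values.iter().flat_map(|v| v.to_le_bytes()).collect();
    // Read the same bytes back in 8-byte chunks, each decoded as one i64.
    bytes
        .chunks_exact(8)
        .map(|chunk| i64::from_le_bytes(chunk.try_into().unwrap()))
        .collect()
}

fn main() {
    // Batch for values 0 and 1: both fall in bucket 1.
    let first_batch = [1i32, 1];
    // Batch for values 2 and 3: both fall in bucket 2.
    let second_batch = [2i32, 2];

    // Two i32 values collapse into a single i64 per batch.
    println!("{:?}", misread_i32_as_i64(&first_batch)); // [4294967297]
    println!("{:?}", misread_i32_as_i64(&second_batch)); // [8589934594]
}
```

The low 32 bits of each result hold the first row's bucket and the high 32 bits hold the second row's, which matches the `0x1_00000001` and `0x2_00000002` patterns in the observed column.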
