pepijnve commented on PR #18322:
URL: https://github.com/apache/datafusion/pull/18322#issuecomment-3557011660
Sorry for reviving this one. We had a need for `bit_count` in our codebase
as well, so I was looking at the Spark implementation. This test seems really
bizarre:
```
#[test]
fn test_bit_count_int32() {
    // Test bit_count on Int32Array
    let result =
        spark_bit_count(&[Arc::new(Int32Array::from(vec![0i32, 1, 255, 1023, -1]))])
            .unwrap();
    let arr = result.as_primitive::<Int32Type>();
    ...
    assert_eq!(arr.value(4), 64); // -1 in two's complement = all 32 bits set
}
```
Counting the set bits of a 32-bit value and getting back 64 is not what you
would expect at all. Is that really how Spark behaves? The behavior is a bit
under-specified (or at least ambiguous) in the [Spark
documentation](https://spark.apache.org/docs/latest/api/sql/index.html#bit_count).
Just for comparison, here's DuckDB's answer to the same question:
```
D select bit_count(cast(-1 as int));
┌────────────────────────────────┐
│ bit_count(CAST(-1 AS INTEGER)) │
│ int8 │
├────────────────────────────────┤
│ 32 │
└────────────────────────────────┘
```
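For what it's worth, the 32-vs-64 discrepancy comes down to whether the signed
32-bit input is sign-extended to 64 bits before the set bits are counted. A
minimal standalone Rust sketch (not the PR's code, just the two interpretations
side by side):
```
fn main() {
    let v: i32 = -1;

    // Interpretation 1: count set bits within the 32-bit width
    // (this matches the DuckDB output above: 32).
    assert_eq!(v.count_ones(), 32);

    // Interpretation 2: sign-extend to 64 bits first, then count
    // (this reproduces the 64 asserted in the quoted test).
    assert_eq!((v as i64).count_ones(), 64);

    // Non-negative values are unaffected by the choice.
    assert_eq!(255i32.count_ones(), (255i32 as i64).count_ones());
}
```
So the question is really which of these two semantics `spark_bit_count` is
meant to implement for negative 32-bit inputs.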