pepijnve commented on PR #18322:
URL: https://github.com/apache/datafusion/pull/18322#issuecomment-3557011660

   Sorry for reviving this one. We had a need for `bit_count` in our codebase 
as well, so I was looking at the Spark implementation. This test seems really 
bizarre:
   
   ```
   #[test]
   fn test_bit_count_int32() {
       // Test bit_count on Int32Array
       let result =
           spark_bit_count(&[Arc::new(Int32Array::from(vec![0i32, 1, 255, 1023, -1]))])
               .unwrap();
   
       let arr = result.as_primitive::<Int32Type>();
   ...
       assert_eq!(arr.value(4), 64); // -1 in two's complement = all 32 bits set
   }
   ```
   
   Counting the bits of a 32-bit value and getting back 64 is not what you
   would expect at all. Is that really how Spark works? The behavior is
   underspecified (or at least ambiguous) in the [Spark
   documentation](https://spark.apache.org/docs/latest/api/sql/index.html#bit_count).
   
   Just for comparison, here's DuckDB's answer to the same question:
   ```
   D select bit_count(cast(-1 as int));
   ┌────────────────────────────────┐
   │ bit_count(CAST(-1 AS INTEGER)) │
   │              int8              │
   ├────────────────────────────────┤
   │               32               │
   └────────────────────────────────┘
   ```

