harshitsaini17 opened a new pull request, #19195:
URL: https://github.com/apache/datafusion/pull/19195

   ## Which issue does this PR close?
   
   Closes #19146
   
   Part of #19144 (EPIC: fix nullability report for spark expression)
   
   ## Rationale for this change
   
   The `bitmap_count` UDF was using the default `return_type` implementation 
which does not preserve nullability information. This causes:
   
   1. **Incorrect schema inference** - non-nullable binary inputs are 
incorrectly marked as producing nullable Int64 outputs
   2. **Missed optimization opportunities** - the query optimizer cannot apply 
nullability-based optimizations when metadata is incorrect
   3. **Inconsistent behavior** - other similar functions preserve nullability, 
leading to unexpected differences
   
   The `bitmap_count` function counts set bits in a binary input and returns an 
Int64. The operation itself doesn't introduce nullability - if the input is 
non-nullable, the output will always be non-nullable. Therefore, the output 
nullability should match the input.
   
   ## What changes are included in this PR?
   
   1. **Implemented `return_field_from_args`**: Creates a field with Int64 type 
and the same nullability as the input field
   2. **Updated `return_type`**: Now returns an error directing users to use 
`return_field_from_args` instead (following DataFusion best practices)
   3. **Added `FieldRef` import** to support returning field metadata
   4. **Added nullability test**: Verifies that both nullable and non-nullable 
inputs are handled correctly
   
   ## Are these changes tested?
   
   Yes, this PR includes a new test `test_bitmap_count_nullability` that 
verifies:
   - Non-nullable Binary input produces non-nullable Int64 output
   - Nullable Binary input produces nullable Int64 output
   - Data type is correctly set to Int64 in both cases
   
   Test results:
   ```
   running 1 test
   test function::bitmap::bitmap_count::tests::test_bitmap_count_nullability 
... ok
   
   test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured
   ```
   
   Additionally, all existing `bitmap_count` tests continue to pass, ensuring 
backward compatibility.
   
   ## Are there any user-facing changes?
   
   **Yes - Schema metadata improvement:**
   
   Users will now see correct nullability information in the schema:
   
   **Before (Bug):**
   - Non-nullable binary input → nullable Int64 output  (incorrect)
   
   **After (Fixed):**
   - Non-nullable binary input → non-nullable Int64 output  (correct)
   - Nullable binary input → nullable Int64 output  (correct)
   
   This is a **bug fix** that corrects schema metadata only - it does not 
change the actual computation or introduce any breaking changes to the API.
   
   **Impact:**
   - Query optimizers can now make better decisions based on accurate 
nullability information
   - Schema validation will be more accurate
   - No changes to function behavior or output values
   
   ---
   
   ## Code Changes Summary
   
   ### Modified File: `datafusion/spark/src/function/bitmap/bitmap_count.rs`
   
   #### 1. Added Import
   ```rust
   use arrow::datatypes::{DataType, FieldRef, Int16Type, Int32Type, Int64Type, 
Int8Type};
   ```
   
   #### 2. Updated return_type Method
   ```rust
   fn return_type(&self, _arg_types: &[DataType]) -> Result<DataType> {
       internal_err!("return_field_from_args should be used instead")
   }
   ```
   
   #### 3. Added return_field_from_args Implementation
   ```rust
   fn return_field_from_args(
       &self,
       args: datafusion_expr::ReturnFieldArgs,
   ) -> Result<FieldRef> {
       use arrow::datatypes::Field;
       // bitmap_count returns Int64 with the same nullability as the input
       Ok(Arc::new(Field::new(
           args.arg_fields[0].name(),
           DataType::Int64,
           args.arg_fields[0].is_nullable(),
       )))
   }
   ```
   
   #### 4. Added Test
   ```rust
   #[test]
   fn test_bitmap_count_nullability() -> Result<()> {
       use datafusion_expr::ReturnFieldArgs;
   
       let bitmap_count = BitmapCount::new();
   
       // Test with non-nullable binary field
       let non_nullable_field = Arc::new(Field::new("bin", DataType::Binary, 
false));
   
       let result = bitmap_count.return_field_from_args(ReturnFieldArgs {
           arg_fields: &[Arc::clone(&non_nullable_field)],
           scalar_arguments: &[None],
       })?;
   
       // The result should not be nullable (same as input)
       assert!(!result.is_nullable());
       assert_eq!(result.data_type(), &Int64);
   
       // Test with nullable binary field
       let nullable_field = Arc::new(Field::new("bin", DataType::Binary, true));
   
       let result = bitmap_count.return_field_from_args(ReturnFieldArgs {
           arg_fields: &[Arc::clone(&nullable_field)],
           scalar_arguments: &[None],
       })?;
   
       // The result should be nullable (same as input)
       assert!(result.is_nullable());
       assert_eq!(result.data_type(), &Int64);
   
       Ok(())
   }
   ```
   
   ---
   
   ## Verification Steps
   
   1. **Run the new test:**
      ```bash
      cargo test -p datafusion-spark test_bitmap_count_nullability --lib
      ```
   
   2. **Run all bitmap_count tests:**
      ```bash
      cargo test -p datafusion-spark bitmap_count --lib
      ```
   
   3. **Run clippy checks:**
      ```bash
      cargo clippy -p datafusion-spark --all-targets -- -D warnings
      ```
   
   All checks pass successfully! 
   
   ---
   
   ## Related Issues
   
   - Closes: #19146
   - Part of EPIC: #19144 (fix nullability report for spark expression)
   - Similar fix: #19145 (shuffle function nullability)
   
   ---
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to