andygrove opened a new issue, #3175:
URL: https://github.com/apache/datafusion-comet/issues/3175

   ## Summary
   
   `arrays_overlap` is marked as `Incompatible` in Comet, but the specific incompatibility is not documented. This issue tracks documenting and potentially fixing the behavior difference.
   
   ## Spark Specification
   
   According to Spark's documented semantics for `arrays_overlap`:
   - Returns `true` if at least one non-null element exists in both arrays
   - Returns `false` if no common elements are found AND either no null elements exist or one of the arrays is empty
   - Returns `null` if no common elements are found and both arrays are non-empty, BUT a null element exists in either array (three-valued logic)
   
   Examples:
   ```sql
   SELECT arrays_overlap(array(1, 2, 3), array(3, 4, 5));
   -- Spark returns: true
   
   SELECT arrays_overlap(array(1, 2), array(3, 4));  
   -- Spark returns: false
   
   SELECT arrays_overlap(array(1, null, 3), array(4, 5));
   -- Spark returns: null (no common element, but a null element exists, so the result is indeterminate)
   
   SELECT arrays_overlap(array(1, null, 3), array(1, 4));
   -- Spark returns: true (found common element 1)
   ```
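   To pin down the expected results before comparing engines, the rules above can be captured in a small reference model. This is a hypothetical sketch, not Spark's implementation: `sparkArraysOverlap` is a made-up name, arrays are modeled as `Seq[Option[A]]` (`None` is a null element), the result is `Option[Boolean]` (`None` is SQL `NULL`), and null input arrays are not modeled. It follows Spark's documented rule that `null` is only returned when both arrays are non-empty. It is written as a Scala script (for example, runnable with scala-cli).

   ```scala
   // Hypothetical reference model of Spark's documented arrays_overlap semantics.
   // Arrays are Seq[Option[A]] (None = null element); the result is
   // Option[Boolean] (None = SQL NULL). Null input arrays are not modeled.
   def sparkArraysOverlap[A](left: Seq[Option[A]], right: Seq[Option[A]]): Option[Boolean] = {
     // Only non-null elements can match: in SQL, null never equals anything.
     val rightValues: Set[A] = right.flatten.toSet
     val hasCommon = left.flatten.exists(rightValues.contains)
     val hasNull = left.contains(None) || right.contains(None)
     if (hasCommon) Some(true) // a real match wins regardless of nulls
     else if (left.nonEmpty && right.nonEmpty && hasNull) None // answer unknown
     else Some(false)
   }

   // The four examples above, plus an empty-array case.
   println(sparkArraysOverlap(Seq(Some(1), Some(2), Some(3)), Seq(Some(3), Some(4), Some(5)))) // Some(true)
   println(sparkArraysOverlap(Seq(Some(1), Some(2)), Seq(Some(3), Some(4))))                   // Some(false)
   println(sparkArraysOverlap(Seq(Some(1), None, Some(3)), Seq(Some(4), Some(5))))             // None
   println(sparkArraysOverlap(Seq(Some(1), None, Some(3)), Seq(Some(1), Some(4))))             // Some(true)
   println(sparkArraysOverlap(Seq.empty[Option[Int]], Seq(None)))                              // Some(false)
   ```

   The empty-array case is the easiest one to miss: an empty array cannot overlap with anything, so the result is `false` even though the other array contains a null.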
   
   ## Current Comet Behavior
   
   Comet uses DataFusion's `array_has_any` function, whose null handling has not been verified against Spark's and may differ:
   - DataFusion may return `false` instead of `null` when no overlap is found but a null element exists in either array
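   One way to see which inputs the suspected difference would affect is to run a model of Spark's rules side by side with a model of the suspected behavior. Both functions below are hypothetical simplifications written for this comparison (neither is Spark or DataFusion source code), and `suspectedTwoValued` assumes, as this issue suspects, that null elements are simply ignored. Of the four SQL examples in the specification, only the one with a null element and no common element should be reported.

   ```scala
   // Hypothetical comparison of Spark's three-valued arrays_overlap against the
   // two-valued behavior suspected on the native path. Both are simplified models.
   // Arrays are Seq[Option[A]] (None = null element); a None result = SQL NULL.
   def sparkModel[A](l: Seq[Option[A]], r: Seq[Option[A]]): Option[Boolean] = {
     val rs: Set[A] = r.flatten.toSet
     if (l.flatten.exists(rs.contains)) Some(true)
     else if (l.nonEmpty && r.nonEmpty && (l.contains(None) || r.contains(None))) None
     else Some(false)
   }

   // Suspected behavior (an assumption): nulls are ignored, so "no overlap" is always false.
   def suspectedTwoValued[A](l: Seq[Option[A]], r: Seq[Option[A]]): Option[Boolean] = {
     val rs: Set[A] = r.flatten.toSet
     Some(l.flatten.exists(rs.contains))
   }

   // The four examples from the Spark specification section.
   val cases: Seq[(Seq[Option[Int]], Seq[Option[Int]])] = Seq(
     (Seq(Some(1), Some(2), Some(3)), Seq(Some(3), Some(4), Some(5))),
     (Seq(Some(1), Some(2)), Seq(Some(3), Some(4))),
     (Seq(Some(1), None, Some(3)), Seq(Some(4), Some(5))),
     (Seq(Some(1), None, Some(3)), Seq(Some(1), Some(4)))
   )

   // Keep only the inputs on which the two models disagree.
   val divergent = cases.filter { case (l, r) => sparkModel(l, r) != suspectedTwoValued(l, r) }
   divergent.foreach { case (l, r) =>
     println(s"$l vs $r: Spark=${sparkModel(l, r)}, suspected=${suspectedTwoValued(l, r)}")
   }
   ```

   If the real native behavior turns out to differ from this assumption (for example, if two null elements are treated as a match), `suspectedTwoValued` is the only piece that needs to change.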
   
   ## Current Tests
   
   Looking at `CometArrayExpressionSuite.scala`:
   ```scala
   // No common non-null element, and both arrays contain a null: Spark returns null
   checkSparkAnswerAndOperator(sql(
     "SELECT arrays_overlap(array('a', null), array('b', null)) from t1 where _1 is not null"))
   ```
   
   Tests exist, but because the expression is marked `Incompatible`, Comet only runs it when `allow_incompatible=true` is set.
   
   ## Possible Solutions
   
   1. **Verify the actual behavior difference** - run targeted test cases comparing Spark and Comet results
   2. **Custom Rust implementation** - if DataFusion doesn't match Spark's three-valued null logic
   3. **Post-processing** - wrap the result to check for null elements and convert `false` to `null`
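   Option 3 can be sketched as a small wrapper that upgrades a two-valued result to Spark's three-valued one. This is a hypothetical model, not Comet code: `postProcess` and its `rawOverlap` parameter (standing in for the native result) are illustrative names, and it assumes the native function reports `true` exactly when a common non-null element exists. It also keeps Spark's documented rule that an empty array yields `false`.

   ```scala
   // Hypothetical sketch of option 3: upgrade a two-valued overlap result to
   // Spark's three-valued result. Arrays are Seq[Option[A]] (None = null
   // element); the returned None stands for SQL NULL.
   // Assumes rawOverlap is true exactly when a common non-null element exists.
   def postProcess[A](rawOverlap: Boolean, left: Seq[Option[A]], right: Seq[Option[A]]): Option[Boolean] = {
     val hasNull = left.contains(None) || right.contains(None)
     if (rawOverlap) Some(true) // a real match wins regardless of nulls
     else if (left.nonEmpty && right.nonEmpty && hasNull) None // false -> null
     else Some(false) // no nulls, or an empty array: false stays false
   }

   // Native path returned false, but the left array contains a null.
   println(postProcess(false, Seq(Some(1), None, Some(3)), Seq(Some(4), Some(5)))) // None
   ```

   The wrapper only needs a null check and two emptiness checks per row, which keeps it cheap. If the native function ever reports a match based on null elements alone, though, post-processing cannot undo that and option 2 would be needed.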
   
   ---
   
   > **Note:** This issue was generated with AI assistance.

