andygrove opened a new issue, #3175:
URL: https://github.com/apache/datafusion-comet/issues/3175
## Summary
`arrays_overlap` is marked as `Incompatible` in Comet, but the specific
incompatibility is not documented. This issue tracks documenting and
potentially fixing the behavior difference.
## Spark Specification
Spark's `arrays_overlap(a1, a2)` behaves as follows:
- Returns `true` if at least one non-null element exists in both arrays
- Returns `null` if no common element is found, both arrays are non-empty, and either array contains a null element (three-valued logic)
- Returns `false` otherwise, including when either array is empty
- Returns `null` if either input array is itself `null`
Examples:
```sql
SELECT arrays_overlap(array(1, 2, 3), array(3, 4, 5));
-- Spark returns: true
SELECT arrays_overlap(array(1, 2), array(3, 4));
-- Spark returns: false
SELECT arrays_overlap(array(1, null, 3), array(4, 5));
-- Spark returns: null (because null element exists, result is indeterminate)
SELECT arrays_overlap(array(1, null, 3), array(1, 4));
-- Spark returns: true (found common element 1)
```
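The rules above can be captured in a small reference model. This could be useful for generating expected values when comparing Spark and Comet. This is a sketch, not Spark's source: `ArraysOverlapModel` and `arraysOverlap` are hypothetical names, and SQL `NULL` is modeled as `None` for both a null array and a null element.

```scala
object ArraysOverlapModel {
  // Hypothetical reference model of Spark's arrays_overlap semantics.
  // None represents SQL NULL, both for a whole array and for an element.
  def arraysOverlap[T](a1: Option[Seq[Option[T]]], a2: Option[Seq[Option[T]]]): Option[Boolean] =
    (a1, a2) match {
      case (Some(left), Some(right)) =>
        val rightValues = right.flatten.toSet
        if (left.flatten.exists(rightValues.contains)) {
          Some(true) // a common non-null element always wins
        } else if (left.nonEmpty && right.nonEmpty && (left.contains(None) || right.contains(None))) {
          None // no match, but a null element makes the answer unknown
        } else {
          Some(false) // no match and no nulls, or at least one array is empty
        }
      case _ => None // a NULL array input yields NULL
    }

  def main(args: Array[String]): Unit = {
    println(arraysOverlap(Some(Seq(Some(1), None, Some(3))), Some(Seq(Some(4), Some(5))))) // prints None
    println(arraysOverlap(Some(Seq(Some(1), None, Some(3))), Some(Seq(Some(1), Some(4))))) // prints Some(true)
    println(arraysOverlap(Some(Seq(Some(1), None)), Some(Seq.empty[Option[Int]])))        // prints Some(false)
  }
}
```

The empty-array case matters for any fix. For example, `arrays_overlap(array(1, null), array())` is `false` in Spark, not `null`.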
## Current Comet Behavior
Comet uses DataFusion's `array_has_any` function, whose null handling may differ from Spark's:
- DataFusion may return `false` instead of `null` when no common element is found but a null element exists
## Current Tests
Looking at `CometArrayExpressionSuite.scala`:
```scala
checkSparkAnswerAndOperator(
  sql("SELECT arrays_overlap(array('a', null), array('b', null)) from t1 where _1 is not null"))
```
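Tests exist, but because the expression is marked as `Incompatible` they only run with `allow_incompatible=true`. The query above hits the ambiguous case (no common element, null elements in both arrays), for which Spark returns `null`.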
## Possible Solutions
1. **Verify the actual behavior difference** - run targeted test cases comparing
Spark and Comet
2. **Custom Rust implementation** if DataFusion doesn't match Spark's
three-valued null logic
3. **Post-processing** - wrap the result so that `false` becomes `null` when
both arrays are non-empty and either one contains a null element
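Option 3 can be sketched as follows. The sketch assumes, without confirmation, that `array_has_any` behaves like a null-unaware membership test that returns `false` rather than `null`. `PostProcessSketch`, `nullUnawareOverlap`, and `toSparkSemantics` are hypothetical names, and null elements are modeled as `None`.

```scala
object PostProcessSketch {
  // ASSUMPTION: models the suspected (unconfirmed) behavior of array_has_any,
  // a membership test that ignores null elements and never returns NULL.
  def nullUnawareOverlap[T](a1: Seq[Option[T]], a2: Seq[Option[T]]): Boolean = {
    val right = a2.flatten.toSet
    a1.flatten.exists(right.contains)
  }

  // Hypothetical wrapper: turns `false` into NULL (None) exactly where Spark
  // would, i.e. both arrays are non-empty and at least one holds a null element.
  // NULL array inputs are assumed to be handled earlier by normal null propagation.
  def toSparkSemantics[T](base: Boolean, a1: Seq[Option[T]], a2: Seq[Option[T]]): Option[Boolean] =
    if (base) Some(true)
    else if (a1.nonEmpty && a2.nonEmpty && (a1.contains(None) || a2.contains(None))) None
    else Some(false)

  def main(args: Array[String]): Unit = {
    val a1 = Seq(Some(1), None, Some(3))
    val a2 = Seq(Some(4), Some(5))
    println(toSparkSemantics(nullUnawareOverlap(a1, a2), a1, a2)) // prints None
  }
}
```

The wrapper needs an extra pass over both arrays to look for null elements. That cost is one possible argument for a native implementation (option 2) instead.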
---
> **Note:** This issue was generated with AI assistance.