andygrove opened a new issue, #4681: URL: https://github.com/apache/datafusion-comet/issues/4681
### Describe the bug

The array expression audit (#4483) noted that `array_union` result ordering versus DataFusion is unverified. Spark's `array_union` preserves left-array elements first, followed by elements from the right array that are not already present.

`CometArrayUnion` (`spark/src/main/scala/org/apache/comet/serde/arrays.scala`) is currently registered as `Compatible`, so if DataFusion's `array_union` does not preserve that ordering, Comet returns arrays whose elements are in a different order than Spark's.

This is the ordering analog of the `array_intersect` ordering caveat already documented, and is separate from the NaN canonicalization divergence tracked in #4481.

### Steps to reproduce

Compare `array_union(a, b)` element ordering between Comet and Spark for inputs where the union introduces new elements from the right array, and where the left array contains duplicates.

### Expected behavior

Verify whether DataFusion's `array_union` preserves Spark's left-first-then-new-right-elements ordering. If it does, no change is needed and this can be closed. If it does not, change `CometArrayUnion`'s support level to `Incompatible(Some(...))` with a documented reason, matching the `array_intersect` treatment.

### Additional context

Split out from #4503 (item 5), surfaced by the `audit-comet-expression` skill run in #4483. Distinct from the closed `array_union` correctness issue #3644.
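
For illustration only, the ordering described under "Describe the bug" can be written down as a small reference model in plain Python. This is a sketch, not Spark or Comet code: `spark_array_union_order`, `CASES`, and `first_mismatch` are hypothetical names, the model assumes the result is deduplicated with each value kept at its first occurrence, and it deliberately ignores NaN handling (tracked separately in #4481). The cases mirror the reproduction inputs: new elements contributed by the right array, and duplicates in the left array.

```python
def spark_array_union_order(left, right):
    """Reference model (assumption): first occurrences from `left`,
    then elements of `right` not already present, in their original order."""
    result = []
    seen = set()
    for value in list(left) + list(right):
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


# Hypothetical probe inputs: right array adds new elements, left array has duplicates.
CASES = [
    ([1, 2, 3], [3, 4, 5]),
    ([3, 1, 3, 2], [2, 5, 1, 4]),
    ([], [2, 1, 2]),
]


def first_mismatch(engine_union):
    """Return (left, right, expected, actual) for the first case where
    `engine_union` disagrees with the reference ordering, or None if all match."""
    for left, right in CASES:
        expected = spark_array_union_order(left, right)
        actual = engine_union(left, right)
        if actual != expected:
            return (left, right, expected, actual)
    return None
```

Passing a callable that wraps the engine under test to `first_mismatch` gives a quick signal: `None` means every probe matched the left-first ordering, while a tuple shows the first diverging input together with the expected and actual results. Note that already-sorted inputs such as the first case cannot distinguish a sorting implementation from an order-preserving one, which is why the second case uses unsorted arrays.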
