andygrove commented on issue #4500: URL: https://github.com/apache/datafusion-comet/issues/4500#issuecomment-4835443276
**Item #3 (hex/unhex collation): investigated, no change needed.**

`Hex` output is restricted to the fixed ASCII alphabet `0-9A-F`, so its values are collation-invariant: identical hex strings compare equal and group identically under every collation, and the small uppercase alphabet is not reordered by realistic locale collations. `Unhex` returns `BinaryType`, which carries no collation at all.

The only way collation could matter is a collation-aware operation *consuming* the hex result (e.g. `WHERE hex(x) = 'ab'`, `ORDER BY hex(x)`). Comet already declines to evaluate collation-aware string comparisons/sorts natively (`QueryPlanSerde.scala:1051`), so those fall back to Spark and produce correct results. The hex projection still runs natively.

Net: Comet cannot produce a collation-related wrong answer for `hex`/`unhex` today, so they correctly stay `Compatible`. The original audit flag was a blanket-policy concern, not a defect. (Contrast `concat`/`reverse`, which are gated for the same theoretical reason and are arguably over-gated.)
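The invariance argument can be checked with a small standalone sketch. It does not use Spark or Comet: `hex_upper` is a hypothetical stand-in for `hex`, and `lcase_key` only approximates a case-insensitive collation by lowercasing, which is an assumption and not Spark's implementation. It suggests that ordering and grouping of hex outputs agree between binary and case-insensitive comparison, while a comparison against a lowercase literal is the one place the two differ.

```python
import binascii
import itertools

HEX_ALPHABET = set("0123456789ABCDEF")


def hex_upper(data: bytes) -> str:
    """Hypothetical stand-in for hex(): uppercase encoding over 0-9A-F."""
    return binascii.hexlify(data).decode("ascii").upper()


def lcase_key(s: str) -> str:
    """Assumed model of a case-insensitive collation key (plain lowercasing)."""
    return s.lower()


# 256 two-byte inputs covering every hex digit in every position.
samples = [bytes(pair) for pair in itertools.product(range(0, 256, 17), repeat=2)]
encoded = [hex_upper(b) for b in samples]

# Output never leaves the fixed alphabet.
alphabet_ok = all(set(h) <= HEX_ALPHABET for h in encoded)

# Sorting and grouping agree under binary and case-insensitive comparison.
same_order = sorted(encoded) == sorted(encoded, key=lcase_key)
same_groups = len(set(encoded)) == len({lcase_key(h) for h in encoded})

# Only a collation-aware consumer sees a difference, e.g. hex(x) = 'ab'.
literal_binary = hex_upper(b"\xab") == "ab"
literal_lcase = lcase_key(hex_upper(b"\xab")) == lcase_key("ab")

# The unhex direction yields raw bytes, which have no collation to apply.
roundtrip_ok = all(binascii.unhexlify(h) == b for h, b in zip(encoded, samples))
```

In this model `literal_binary` and `literal_lcase` disagree, which is the case the comment describes as falling back to Spark, while the projection itself is unaffected.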
