parthchandra opened a new issue, #4573:
URL: https://github.com/apache/datafusion-comet/issues/4573
### Describe the bug
## Expressions Missing Collation Guards
Tracked in
[#2190](https://github.com/apache/datafusion-comet/issues/2190). These code paths are
currently unreachable because the scan-level guard (`CometScanRule.scala:731`) forces
a full fallback to Spark for any collated column. They become reachable once that scan
guard is relaxed.
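Relaxing the scan guard would require per-expression guards in the serde layer. The sketch below shows only the decision logic: fall back whenever any child is, or contains, a string with a non-default collation. It uses a simplified, hypothetical type model. `StringType(collation)`, `ArrayType`, `ExprInfo` and `CollationGuard` are illustrative names, not Comet's or Spark's API. Only the collation name `UTF8_BINARY` (Spark's default) comes from Spark.

```scala
// Hypothetical, simplified model of a type tree; these are NOT the real
// Spark or Comet classes, just enough structure to show the shape of a guard.
sealed trait DataType
final case class StringType(collation: String) extends DataType
final case class ArrayType(elementType: DataType) extends DataType
case object IntegerType extends DataType

// A serde-time view of an expression: its name and the types of its children.
final case class ExprInfo(name: String, childTypes: Seq[DataType])

object CollationGuard {
  // Spark's default string collation.
  val DefaultCollation = "UTF8_BINARY"

  // True if the type is, or contains (e.g. as an array element), a string
  // with a non-default collation.
  def hasNonDefaultCollation(dt: DataType): Boolean = dt match {
    case StringType(c)   => c != DefaultCollation
    case ArrayType(elem) => hasNonDefaultCollation(elem)
    case IntegerType     => false
  }

  // Some(reason) means "do not convert, fall back to Spark"; None means the
  // expression may be handed to the native implementation.
  def fallbackReason(e: ExprInfo): Option[String] =
    if (e.childTypes.exists(hasNonDefaultCollation))
      Some(s"${e.name} is not supported on collated strings")
    else None
}
```

Recursing into `ArrayType` matters for entries such as `ArrayJoin`, where the collated string sits inside the array type rather than being a direct child.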
### Collation-sensitive (need guards)
- [ ] `Upper` / `Lower` — `strings.scala:54-76` (partially mitigated by the
`COMET_CASE_CONVERSION_ENABLED` gate)
- [ ] `InitCap` — `strings.scala:87`
- [ ] `Like` — `strings.scala:246`
- [ ] `RLike` — `strings.scala:264`
- [ ] `ConcatWs` — `strings.scala:225`
- [ ] `Contains` — `QueryPlanSerde.scala:165` (generic
`CometScalarFunction`)
- [ ] `StartsWith` — `QueryPlanSerde.scala:176` (generic
`CometScalarFunction`)
- [ ] `EndsWith` — `QueryPlanSerde.scala:166` (generic
`CometScalarFunction`)
- [ ] `StringInstr` — `QueryPlanSerde.scala:177` (generic
`CometScalarFunction`)
- [ ] `StringReplace` — `QueryPlanSerde.scala:179` (generic
`CometScalarFunction`)
- [ ] `ArrayJoin` — `arrays.scala:377`
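For the matching entries above (`Like`, `RLike`, `Contains`, `StartsWith`, `EndsWith`, `StringInstr`, `StringReplace`), the result depends on whether two substrings count as equal, and a collation redefines that. The illustration below models a case-insensitive collation, in the spirit of Spark's `UTF8_LCASE`, as "lowercase both sides, then compare". Spark's actual collation rules are more involved. The `CollationDemo` object and its methods are hypothetical.

```scala
import java.util.Locale

// Illustration only: a case-insensitive collation is modelled as
// "lowercase both sides, then compare". This is not Spark's implementation.
object CollationDemo {
  private def lower(s: String): String = s.toLowerCase(Locale.ROOT)

  // What a binary (byte/char-wise) native kernel effectively computes.
  def containsBinary(s: String, sub: String): Boolean = s.contains(sub)
  def startsWithBinary(s: String, prefix: String): Boolean = s.startsWith(prefix)

  // What a case-insensitive collation expects.
  def containsLcase(s: String, sub: String): Boolean = lower(s).contains(lower(sub))
  def startsWithLcase(s: String, prefix: String): Boolean = lower(s).startsWith(lower(prefix))
}
```

For inputs such as `("Spark", "SPA")` or `("Comet", "co")`, a binary kernel returns `false` where the case-insensitive collation expects `true`. A native kernel that ignores collation would therefore be silently wrong, so these expressions need a fallback rather than a pass-through.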
### Likely safe (verify before closing)
- [ ] `StringRepeat` — `strings.scala:34` (repeating bytes preserves
collation)
- [ ] `Substring` / `Left` / `Right` — `strings.scala:106,132,166` (byte
slicing)
- [ ] `StringLPad` / `StringRPad` — `strings.scala:296,325` (padding only;
comparison semantics unaffected)
- [ ] `RegExpReplace` — `strings.scala:353` (regex on raw bytes)
- [ ] `StringSplit` — `strings.scala:402` (splitting on raw bytes)
- [ ] `GetJsonObject` — `strings.scala:428` (JSON extraction)
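The positional entries above (`StringRepeat`, `Substring`, `Left`, `Right`, padding) never compare characters. The matching entries (`RegExpReplace`, `StringSplit`) do compare characters. For them, "raw bytes" is only correct if Spark itself ignores the collation for that expression in the Spark versions Comet supports, and that is what the verification should establish. The sketch below shows how far the two readings can diverge on one input. It assumes a simplified case-insensitive model and a literal delimiter, whereas Spark's `split` takes a regex. `SplitCheck` is a hypothetical helper.

```scala
import java.util.regex.Pattern

// Illustration only: contrasts a split that matches the delimiter exactly
// with one that matches it case-insensitively (a simplified stand-in for a
// case-insensitive collation). The delimiter is quoted, i.e. treated as a
// literal rather than a regex, to keep the example small.
object SplitCheck {
  def splitBinary(s: String, delim: String): Seq[String] =
    Pattern.compile(Pattern.quote(delim)).split(s, -1).toSeq

  def splitCaseInsensitive(s: String, delim: String): Seq[String] =
    Pattern
      .compile(Pattern.quote(delim), Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
      .split(s, -1)
      .toSeq
}
```

Splitting `"aXbxc"` on `"x"` gives `aXb`, `c` under exact matching but `a`, `b`, `c` under case-insensitive matching. A test along these lines, run against Spark with a collated column, is one way to decide whether an entry can move to the safe list.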
### Safe (no guard needed)
- [x] `Length` — `strings.scala:78` (char count independent of collation)
- [x] `Ascii` — `QueryPlanSerde.scala` (first byte value)
- [x] `BitLength` / `OctetLength` — `QueryPlanSerde.scala` (byte-level
metric)
- [x] `Chr` — `QueryPlanSerde.scala` (int → char, no comparison)
### Already guarded
- [x] `Concat` — `strings.scala`
([#4567](https://github.com/apache/datafusion-comet/pull/4567))
- [x] `Reverse` — `collectionOperations.scala`
([#4567](https://github.com/apache/datafusion-comet/pull/4567))
- [x] `ArrayIntersect` — `arrays.scala:208`