andygrove opened a new issue, #4462:
URL: https://github.com/apache/datafusion-comet/issues/4462
## Describe the bug
`CometStringRepeat` delegates to DataFusion `repeat`. DataFusion's `repeat`
returns an error on negative `n`, while Spark's `UTF8String.repeat` returns the
empty string for `n <= 0`. Comet currently reports `Compatible` for this
expression (with a `getCompatibleNotes` caveat), so a query that calls
`repeat(s, n)` with a negative `n`, such as `repeat(s, -1)`, fails with a
runtime exception under Comet instead of producing the empty string as Spark
does.
Surfaced by the string-expressions audit in apache/datafusion-comet#4461.
## Steps to reproduce
```sql
SELECT repeat('abc', -1);
```
- Spark: returns `''`.
- Comet: throws `ArrowError("Invalid argument error: repeat requires a non-negative number of repetitions")` at execution.
## Expected behavior
Either match Spark by returning `''` for negative `n`, or change the support
level of `CometStringRepeat` to `Incompatible(Some(...))` so the expression
falls back to Spark unless explicitly enabled via
`spark.comet.expression.StringRepeat.allowIncompatible=true`.
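
For illustration only, the first option could look roughly like the Rust sketch below: clamp the count to zero before handing it to a repeat routine that rejects negative values. Both `strict_repeat` and `spark_repeat` are hypothetical names invented for this sketch. `strict_repeat` merely stands in for a kernel with the behavior described above and is not DataFusion's actual function or signature, and null handling is left out.

```rust
/// Hypothetical stand-in for a repeat kernel that rejects negative counts.
/// This is NOT DataFusion's real API; it only mimics the reported behavior.
fn strict_repeat(s: &str, n: i64) -> Result<String, String> {
    if n < 0 {
        return Err("repeat requires a non-negative number of repetitions".to_string());
    }
    // n is known to be non-negative here, so the cast cannot wrap.
    Ok(s.repeat(n as usize))
}

/// Hypothetical Spark-style wrapper: any n <= 0 yields the empty string.
fn spark_repeat(s: &str, n: i64) -> Result<String, String> {
    // Clamp before delegating so the strict kernel never sees a negative count.
    strict_repeat(s, n.max(0))
}
```

With this wrapper, `spark_repeat("abc", -1)` returns `Ok("")`, whereas `strict_repeat("abc", -1)` returns an `Err`. Whether such a clamp belongs in the native kernel or in the serde layer is a design choice this sketch does not settle.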
## Additional context
- Comet serde: `spark/src/main/scala/org/apache/comet/serde/strings.scala`
  (`CometStringRepeat`)
- Spark reference: `UTF8String.repeat(n)` short-circuits for `n <= 0`
- The current `getCompatibleNotes` text mentions the divergence but the
  support level is still `Compatible`, so the path is taken silently.