andygrove opened a new issue, #3176:
URL: https://github.com/apache/datafusion-comet/issues/3176
## Summary
`array_repeat` is marked as `Incompatible` in Comet, but the specific
incompatibility is not documented. This issue tracks documenting and
potentially fixing the behavior difference.
## Spark Specification
Spark's `array_repeat` behaves as follows:
- Returns an array with the element repeated `count` times
- **Negative counts are treated as 0**, returning an empty array
- Returns null if count is null
Examples:
```sql
SELECT array_repeat('hello', 3);
-- Spark returns: ["hello", "hello", "hello"]
SELECT array_repeat('test', 0);
-- Spark returns: []
SELECT array_repeat('item', -1);
-- Spark returns: [] (negative count treated as 0)
SELECT array_repeat('test', null);
-- Spark returns: null
```
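The rules above can be written down as a small reference model. This is only a sketch for reasoning about expected results, not Spark's actual implementation: the object and function names are hypothetical, and `Option[Int]` is an assumed stand-in for a nullable SQL integer.

```scala
// Hypothetical reference model of the semantics listed above.
// Not taken from Spark or Comet; for illustration only.
object ArrayRepeatModel {
  // None models a SQL NULL count; Some(n) models a non-null count.
  def arrayRepeat[T](element: T, count: Option[Int]): Option[Seq[T]] =
    // Null count -> null result; negative count is clamped to 0 -> empty array.
    count.map(n => Seq.fill(math.max(n, 0))(element))
}
```

Under this model, a count of `Some(-1)` and a count of `Some(0)` both produce an empty sequence, while `None` produces `None`.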
## Current Comet Behavior
Comet uses DataFusion's `array_repeat` function. Its behavior for negative
counts has not been verified and may differ from Spark's:
- DataFusion may return an error for negative counts
- Or DataFusion may return a result other than an empty array
## Tests
The test suite includes:
```scala
checkSparkAnswerAndOperator(sql("SELECT array_repeat(_4, 0) from t1"))
```
However, the current test file does not appear to include a negative-count case.
## Possible Solutions
1. **Verify actual behavior** - test `array_repeat(x, -1)` in both Spark and
Comet
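2. **Pre-processing** - wrap the count with `GREATEST(count, 0)` to treat
negative counts as 0 (Spark's `GREATEST` skips null arguments, so a null count
would need separate handling to keep returning null)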
3. **Custom Rust implementation** that handles negative counts like Spark
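One detail worth checking for the pre-processing option: Spark's `GREATEST` skips null arguments, so `GREATEST(NULL, 0)` evaluates to 0 rather than NULL. The sketch below models that difference in plain Scala. Both helpers are hypothetical and are not Comet or Spark APIs; `Option[Int]` is again an assumed stand-in for a nullable count.

```scala
// Hypothetical helpers for illustration; not part of Comet or Spark.
object CountClamp {
  // Model of a null-skipping GREATEST over nullable integers:
  // nulls are ignored, and the result is null only if every argument is null.
  def greatestSkippingNulls(values: Option[Int]*): Option[Int] = {
    val present = values.flatten
    if (present.isEmpty) None else Some(present.max)
  }

  // Clamp that preserves null: negative counts become 0, a null count stays null.
  def clampPreservingNull(count: Option[Int]): Option[Int] =
    count.map(n => math.max(n, 0))
}
```

With a null count, `greatestSkippingNulls(None, Some(0))` yields `Some(0)`, which would make `array_repeat` return an empty array, whereas `clampPreservingNull(None)` yields `None` and keeps the null result that Spark specifies. For non-null counts the two agree.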
---
> **Note:** This issue was generated with AI assistance.