andygrove commented on PR #4726:
URL:
https://github.com/apache/datafusion-comet/pull/4726#issuecomment-4810139987
Nice fix. The short-circuit logic matches Spark's `ArrayInsert.eval`
exactly, and evaluating `src` before `pos` also lines up error precedence with
Spark when more than one argument can throw.
One suggestion on test coverage. The new `test_array_insert_short_circuit`
case nicely exercises skipping `pos` (and `item`) when `src` is `NULL`. There's
a second, distinct path that the new `&& pos_value.is_valid(row)` guard on
`evaluate_item` protects: `src` non-null, `pos` `NULL`, and an `item` that
would throw if evaluated. Spark short-circuits `item` there too (the inner `if
(value2 != null)` check), so it returns `NULL` rather than raising. It might be
worth pinning that down with something like:
```sql
statement
INSERT INTO test_array_insert_short_circuit VALUES
(NULL, 0),
(array(1), 1),
(array(1), 2)
query
SELECT array_insert(arr, CAST(NULL AS INT), element_at(array(1), idx))
FROM test_array_insert_short_circuit
```
Here `pos` is `NULL` for every row, so `item = element_at(array(1), idx)`
should never be evaluated, including the `idx = 0` and `idx = 2` rows that
would otherwise throw. Without the `pos` validity check on `evaluate_item` this
would error, so it guards that branch directly.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]