Sean-Kenneth-Doherty opened a new pull request, #22295: URL: https://github.com/apache/datafusion/pull/22295
## Which issue does this PR close?

- Closes #22219.

## Rationale for this change

`array_repeat` can panic for list inputs when it precomputes the repeated inner value count with `len * count`. A very large count can overflow that multiplication before DataFusion has a chance to return a normal execution error.

## What changes are included in this PR?

- Adds checked arithmetic while precomputing the list-path output sizes in `array_repeat`.
- Validates the computed outer and inner offsets against the output offset type before allocating builders.
- Uses fallible vector reservation for list-path capacity hints so capacity overflow becomes an execution error instead of a panic.
- Adds a Rust unit regression and an SQL logic regression for `array_repeat([1, 2, 3], 9223372036854775807)`.

Scope note: this intentionally targets the list-input path from #22219. The scalar element path is separate from this issue.

## Are these changes tested?

Yes.

- `cargo test -p datafusion-functions-nested list_repeat_rejects_inner_count_overflow`
- `cargo test -p datafusion-sqllogictest --test sqllogictests -- array/array_repeat.slt`
- `cargo test -p datafusion-functions-nested`
- `cargo fmt --check`
- `cargo clippy -p datafusion-functions-nested --all-targets -- -D warnings`
- `git diff --check`

## Are there any user-facing changes?

Yes. A malformed/oversized `array_repeat` query now returns a DataFusion execution error instead of panicking the process.
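
For readers who want a concrete picture of the checked arithmetic and fallible reservation described above, here is a minimal standalone sketch. It is not the code from this PR: `ExecError`, `repeated_inner_len`, and `reserve_values` are hypothetical names, the error type is a stand-in for DataFusion's execution error, and the output offset type is assumed to be `i32`. Only standard-library calls (`checked_mul`, `try_from`, `Vec::try_reserve`) are used.

```rust
/// Hypothetical stand-in for a DataFusion execution error.
#[derive(Debug, PartialEq)]
struct ExecError(String);

/// Computes how many inner values result from repeating a list of `len`
/// elements `count` times, returning an error instead of overflowing.
/// The result is also checked against an assumed `i32` offset type.
fn repeated_inner_len(len: usize, count: u64) -> Result<usize, ExecError> {
    // The count must first fit in the platform's usize.
    let count = usize::try_from(count)
        .map_err(|_| ExecError(format!("array_repeat: count {count} does not fit in usize")))?;

    // Checked multiplication replaces the plain `len * count`.
    let total = len
        .checked_mul(count)
        .ok_or_else(|| ExecError(format!("array_repeat: {len} * {count} overflows")))?;

    // The final offset must be representable in the output offset type.
    i32::try_from(total)
        .map_err(|_| ExecError(format!("array_repeat: offset {total} exceeds i32 range")))?;

    Ok(total)
}

/// Reserves room for `capacity` values, turning capacity overflow or
/// allocation failure into an error rather than a panic.
fn reserve_values(capacity: usize) -> Result<Vec<i64>, ExecError> {
    let mut values: Vec<i64> = Vec::new();
    values
        .try_reserve(capacity)
        .map_err(|e| ExecError(format!("array_repeat: cannot reserve {capacity} values: {e}")))?;
    Ok(values)
}
```

With these helpers, a call shaped like the regression case, `repeated_inner_len(3, i64::MAX as u64)`, yields an `Err` instead of a panic, while a small input such as `repeated_inner_len(3, 4)` succeeds.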
