samueleresca opened a new issue, #17857: URL: https://github.com/apache/datafusion/issues/17857
### Describe the bug

It is possible to cause a panic in DataFusion on 64-bit machines. DataFusion does not handle the panic raised by the underlying `append_value` method of `GenericByteViewBuilder` (see the [affected line](https://github.com/apache/arrow-rs/blob/b9c2bf73e792e7cb849f0bd453059ceef45b0b74/arrow-array/src/builder/generic_bytes_view_builder.rs#L310)). See the To Reproduce section for the recursive concat query.

A few notes/thoughts:

- The panic is reproducible in the latest version of DataFusion, but it was also present in previous versions.
- The panic originates in `arrow-rs`: on 64-bit systems `usize` can be up to `u64::MAX`, but the builder assumes every length fits within `u32::MAX`. The panic is already documented on the `append_value` method, but the consumer (DataFusion) never handles it. Is there a reason for that?
- I understand the panic happens in `arrow-rs`, but should DataFusion handle panics coming from Arrow (e.g. around the `append_value` call in `concat_elements_utf8view`) so that the query fails gracefully instead of crashing?
- Should DataFusion somehow limit the `append_value` calls to prevent the panic from happening?

cc @comphead

### To Reproduce

Sample repository: https://github.com/samueleresca/datafusion-byte-view-builder-issue

1. Run on a 64-bit machine.
2. Include some dummy data (I attached an example):
   ```rust
   ctx.register_parquet("users", "./data/users_shorten.parquet", ParquetReadOptions::default()).await?;
   ```
3. Run a recursive string concatenation query (see the query in `main.rs`).
4. Observe the panic.

### Expected behavior

- Should DataFusion handle the panic?
- (maybe) restrictions on DataFusion's calls into the view builder
- (maybe) `arrow-rs` handling this more gracefully

### Additional context

Truncated panic trace from my local machine:

```
thread 'tokio-runtime-worker' (38043518) panicked at /Users/samuele/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-array-56.2.0/src/builder/generic_bytes_view_builder.rs:310:46:
called `Result::unwrap()` on an `Err` value: TryFromIntError(())
stack backtrace:
   0: __rustc::rust_begin_unwind
             at /rustc/a454fccb02df9d361f1201b747c01257f58a8b37/library/std/src/panicking.rs:698:5
   1: core::panicking::panic_fmt
             at /rustc/a454fccb02df9d361f1201b747c01257f58a8b37/library/core/src/panicking.rs:75:14
   2: core::result::unwrap_failed
             at /rustc/a454fccb02df9d361f1201b747c01257f58a8b37/library/core/src/result.rs:1855:5
   3: core::result::Result<T,E>::unwrap
             at /Users/samuele/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/result.rs:1226:23
   4: arrow_array::builder::generic_bytes_view_builder::GenericByteViewBuilder<T>::append_value
             at /Users/samuele/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-array-56.2.0/src/builder/generic_bytes_view_builder.rs:310:46
   5: datafusion_physical_expr::expressions::binary::kernels::concat_elements_utf8view
             at /Users/samuele/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-expr-50.0.0/src/expressions/binary/kernels.rs:159:20
   6: datafusion_physical_expr::expressions::binary::concat_elements
             at /Users/samuele/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-expr-50.0.0/src/expressions/binary.rs:1078:40
   7: datafusion_physical_expr::expressions::binary::BinaryExpr::evaluate_with_resolved_args
             at /Users/samuele/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-expr-50.0.0/src/expressions/binary.rs:848:29
   8: <datafusion_physical_expr::expressions::binary::BinaryExpr as datafusion_physical_expr_common::physical_expr::PhysicalExpr>::evaluate
             at /Users/samuele/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-expr-50.0.0/src/expressions/binary.rs:479:14
```
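On the question of whether DataFusion should limit the `append_value` calls: one option is to compute each output value's length before appending it and return an error when it exceeds what the builder accepts. Below is a minimal, self-contained sketch of that idea, assuming the per-value limit is `u32::MAX` (the fallible `u32` conversion that is unwrapped in the trace above). It works on plain `&str` values rather than Arrow arrays so it compiles without dependencies; `ValueTooLarge`, `checked_concat_len`, and `concat_checked` are hypothetical names, not existing DataFusion or arrow-rs APIs.

```rust
use std::fmt;

/// Hypothetical error: the concatenated value would not fit in a view array.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ValueTooLarge {
    pub lhs_len: usize,
    pub rhs_len: usize,
}

impl fmt::Display for ValueTooLarge {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "concatenating {} and {} bytes exceeds the assumed u32::MAX per-value limit",
            self.lhs_len, self.rhs_len
        )
    }
}

impl std::error::Error for ValueTooLarge {}

/// Hypothetical helper: returns the length of `lhs || rhs` as a `u32`, or an
/// error if the sum overflows `usize` or does not fit in a `u32`.
pub fn checked_concat_len(lhs_len: usize, rhs_len: usize) -> Result<u32, ValueTooLarge> {
    let err = ValueTooLarge { lhs_len, rhs_len };
    let total = lhs_len.checked_add(rhs_len).ok_or(err)?;
    u32::try_from(total).map_err(|_| err)
}

/// Hypothetical per-row kernel: validates the output length *before* building
/// the value, so an oversized result becomes an `Err` instead of a panic.
pub fn concat_checked(lhs: &str, rhs: &str) -> Result<String, ValueTooLarge> {
    let total = checked_concat_len(lhs.len(), rhs.len())?;
    let mut out = String::with_capacity(total as usize);
    out.push_str(lhs);
    out.push_str(rhs);
    Ok(out)
}
```

In a real kernel such as `concat_elements_utf8view`, a check like this would presumably run before each `append_value` call, with the error mapped to a query execution error rather than a panic. The `checked_add` additionally guards against `usize` overflow, which only matters on 32-bit targets.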
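On the question of whether DataFusion should handle the panic itself: Rust's `std::panic::catch_unwind` can turn an unwinding panic into a `Result`. The sketch below is hypothetical: `PanickyBuilder` only mimics the `u32` length conversion that panics in the trace above, and `append_or_error` shows how a caller could wrap such a call. It assumes a 64-bit target and the default `panic = "unwind"` strategy; with `panic = "abort"` nothing is caught.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical stand-in for a builder that stores each value's length as a
/// `u32` and unwraps the conversion, mirroring the failure in the trace.
#[derive(Default)]
pub struct PanickyBuilder {
    lengths: Vec<u32>,
}

impl PanickyBuilder {
    /// Panics with `TryFromIntError` when `len` exceeds `u32::MAX`.
    pub fn append_len(&mut self, len: usize) {
        let len: u32 = len.try_into().unwrap();
        self.lengths.push(len);
    }

    /// Number of values appended so far.
    pub fn count(&self) -> usize {
        self.lengths.len()
    }
}

/// Runs `append_len` and converts a panic into an `Err` carrying its message.
pub fn append_or_error(builder: &mut PanickyBuilder, len: usize) -> Result<(), String> {
    panic::catch_unwind(AssertUnwindSafe(|| builder.append_len(len))).map_err(|payload| {
        // Panic payloads are typically `&'static str` or `String`.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            msg.to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "unknown panic".to_string()
        }
    })
}
```

The default panic hook still prints the message to stderr even when the panic is caught. `AssertUnwindSafe` is needed because the closure borrows the builder mutably; asserting unwind safety means accepting that a panic partway through an append could leave the builder in an inconsistent state. The standard library documentation also advises against using `catch_unwind` as a general try/catch mechanism, which is one argument for validating lengths up front instead.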
