samueleresca opened a new issue, #17857:
URL: https://github.com/apache/datafusion/issues/17857

   ### Describe the bug
   
   It is possible to cause a panic in DataFusion on 64-bit machines. DataFusion 
does not handle the panic raised by the underlying `append_value` method of 
`GenericByteViewBuilder`. (See the [affected 
line](https://github.com/apache/arrow-rs/blob/b9c2bf73e792e7cb849f0bd453059ceef45b0b74/arrow-array/src/builder/generic_bytes_view_builder.rs#L310).)
   
   See the To Reproduce section for the recursive concat query.
   
   A few notes/thoughts:
   - The panic is reproducible in the latest version of DataFusion, and it was 
also present in previous versions.
   - The panic originates in `arrow-rs`: on 64-bit systems a `usize` can be as 
large as `u64::MAX`, but the builder assumes the value fits in a `u32` (at most 
`u32::MAX`). The panic is already documented on the `append_value` method, but 
it is never handled by the consumer (DataFusion). Is there a reason for that?
   - I understand the panic happens in `arrow-rs`, but should DataFusion handle 
the panic coming from Arrow (e.g. in the `append_value` call of 
`concat_elements_utf8view`) instead of letting it propagate?
   - Should DataFusion somehow limit the `append_value` calls to prevent the 
panic from happening?
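
To make the failure mode in the notes above concrete, here is a minimal sketch using only the standard library. `view_len` is a hypothetical helper, not a function in `arrow-rs`; it only illustrates that a `usize` larger than `u32::MAX` cannot be converted to `u32`, and that unwrapping the failed conversion yields the `TryFromIntError(())` seen in the trace.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Hypothetical helper, not part of arrow-rs: converts a byte length into a
/// `u32`, reporting failure as a `Result` instead of panicking.
fn view_len(len: usize) -> Result<u32, TryFromIntError> {
    u32::try_from(len)
}

fn main() {
    // Lengths up to u32::MAX convert cleanly.
    assert_eq!(view_len(1024).unwrap(), 1024);
    assert!(view_len(u32::MAX as usize).is_ok());

    // One byte more is only representable where usize is wider than u32.
    #[cfg(target_pointer_width = "64")]
    {
        let too_long = u32::MAX as usize + 1;
        // Unwrapping this `Err` would panic with `TryFromIntError(())`.
        assert!(view_len(too_long).is_err());
    }
}
```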
   
   cc @comphead 
   
   ### To Reproduce
   
   Sample repository: 
https://github.com/samueleresca/datafusion-byte-view-builder-issue
   
   1. Run on a 64-bit machine.
   2. Include some dummy data (I attached an example):
   ```rust
   // Register the sample Parquet file as the "users" table.
   ctx.register_parquet("users", "./data/users_shorten.parquet", ParquetReadOptions::default()).await?;
   ```
   3. Run a recursive string concatenation query (see query in `main.rs`)
   4. Observe the panic
   
   
   
   ### Expected behavior
   
   - Should DataFusion handle the panic?
   - (maybe) DataFusion restricts its calls to the view builder
   - (maybe) arrow-rs handles this more gracefully
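
One possible shape for the "restrict the calls" option is a length check before the value is appended, so the caller gets an error rather than a panic. The sketch below is an assumption for illustration only: `checked_concat_len` and `concat_checked` are hypothetical names, not existing DataFusion or arrow-rs APIs, and plain `String` stands in for the view builder.

```rust
use std::convert::TryFrom;

/// Hypothetical guard: checks that concatenating two values of the given byte
/// lengths still fits in a `u32`, returning an error message otherwise.
fn checked_concat_len(left_len: usize, right_len: usize) -> Result<u32, String> {
    let total = left_len
        .checked_add(right_len)
        .ok_or_else(|| String::from("concatenated length overflows usize"))?;
    u32::try_from(total)
        .map_err(|_| format!("concatenated value of {} bytes exceeds u32::MAX", total))
}

/// Concatenates two strings only when the result passes the length guard.
fn concat_checked(left: &str, right: &str) -> Result<String, String> {
    checked_concat_len(left.len(), right.len())?;
    let mut out = String::with_capacity(left.len() + right.len());
    out.push_str(left);
    out.push_str(right);
    Ok(out)
}

fn main() {
    assert_eq!(concat_checked("data", "fusion"), Ok(String::from("datafusion")));
    // The lengths alone exercise the failure path; no 4 GiB allocation is needed.
    assert!(checked_concat_len(u32::MAX as usize, 1).is_err());
}
```

In a real kernel the error would presumably be mapped to the engine's own error type; that mapping is left out here because it depends on where the check would live.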
   
   ### Additional context
   
   Truncated panic trace from my local machine:
   ```
   thread 'tokio-runtime-worker' (38043518) panicked at /Users/samuele/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-array-56.2.0/src/builder/generic_bytes_view_builder.rs:310:46:
   called `Result::unwrap()` on an `Err` value: TryFromIntError(())
   stack backtrace:
      0: __rustc::rust_begin_unwind
                at /rustc/a454fccb02df9d361f1201b747c01257f58a8b37/library/std/src/panicking.rs:698:5
      1: core::panicking::panic_fmt
                at /rustc/a454fccb02df9d361f1201b747c01257f58a8b37/library/core/src/panicking.rs:75:14
      2: core::result::unwrap_failed
                at /rustc/a454fccb02df9d361f1201b747c01257f58a8b37/library/core/src/result.rs:1855:5
      3: core::result::Result<T,E>::unwrap
                at /Users/samuele/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/result.rs:1226:23
      4: arrow_array::builder::generic_bytes_view_builder::GenericByteViewBuilder<T>::append_value
                at /Users/samuele/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-array-56.2.0/src/builder/generic_bytes_view_builder.rs:310:46
      5: datafusion_physical_expr::expressions::binary::kernels::concat_elements_utf8view
                at /Users/samuele/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-expr-50.0.0/src/expressions/binary/kernels.rs:159:20
      6: datafusion_physical_expr::expressions::binary::concat_elements
                at /Users/samuele/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-expr-50.0.0/src/expressions/binary.rs:1078:40
      7: datafusion_physical_expr::expressions::binary::BinaryExpr::evaluate_with_resolved_args
                at /Users/samuele/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-expr-50.0.0/src/expressions/binary.rs:848:29
      8: <datafusion_physical_expr::expressions::binary::BinaryExpr as datafusion_physical_expr_common::physical_expr::PhysicalExpr>::evaluate
                at /Users/samuele/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-expr-50.0.0/src/expressions/binary.rs:479:14
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

