DDtKey commented on issue #7931:
URL: https://github.com/apache/arrow-datafusion/issues/7931#issuecomment-1871601997
Another MRE (a bit artificial, since it intentionally generates a Cartesian product):
```
❯ CREATE EXTERNAL TABLE test STORED AS CSV LOCATION
'file:///tmp/test-21mb.csv';
0 rows in set. Query took 0.105 seconds.
❯ SELECT repeat(lf1.column_2, 15) FROM test lf1 join test lf2 on
lf1.column_1 = lf2.column_1 join test lf3 on lf2.column_1 = lf3.column_1 join
test lf4 on lf3.column_1 = lf4.column_1
;
thread 'tokio-runtime-worker' panicked at
/Users/brew/Library/Caches/Homebrew/cargo_cache/registry/src/index.crates.io-6f17d22bba15001f/arrow-select-49.0.0/src/take.rs:423:41:
offset overflow
```
<details>
<summary>stack backtrace</summary>

```
0: _rust_begin_unwind
1: core::panicking::panic_fmt
2: core::option::expect_failed
3: arrow_select::take::take_bytes
4: arrow_select::take::take_impl
5: arrow_select::take::take
6: datafusion_physical_plan::joins::utils::build_batch_from_indices
7: <datafusion_physical_plan::joins::hash_join::HashJoinStream as
futures_core::stream::Stream>::poll_next
8: <datafusion_physical_plan::coalesce_batches::CoalesceBatchesStream as
futures_core::stream::Stream>::poll_next
9: <datafusion_physical_plan::projection::ProjectionStream as
futures_core::stream::Stream>::poll_next
10:
datafusion_physical_plan::stream::RecordBatchReceiverStreamBuilder::run_input::{{closure}}
11: tokio::runtime::task::core::Core<T,S>::poll
12: tokio::runtime::task::harness::Harness<T,S>::poll
13: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
14: tokio::runtime::scheduler::multi_thread::worker::Context::run
15: tokio::runtime::context::runtime::enter_runtime
16: tokio::runtime::scheduler::multi_thread::worker::run
17: tokio::runtime::task::core::Core<T,S>::poll
18: tokio::runtime::task::harness::Harness<T,S>::poll
19: tokio::runtime::blocking::pool::Inner::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose
backtrace.
```
</details>
> query output tries to make a column with more than 2GB

In fact, that's not the case for this example: the query result itself does not produce such large data, but the intermediate results can be really big (I mean a single row, not a column).

Not sure if it helps, though.
test file:
[test-21mb.csv](https://github.com/apache/arrow-datafusion/files/13789894/test-21mb.csv)
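To illustrate why this panics even though the final result is not huge: a back-of-the-envelope sketch with assumed numbers (row count, key fanout, and value length are all hypothetical, not measured from the attached CSV). The point is that three self-joins multiply matching rows, `repeat(..., 15)` multiplies bytes per value, and a `Utf8` array stores cumulative byte offsets as `i32`, so a single output batch whose string data exceeds `i32::MAX` trips the "offset overflow" check in `take_bytes`.

```rust
// Sketch of the overflow arithmetic; all input numbers are assumptions.
fn main() {
    // Hypothetical: ~100k rows in the CSV, and column_1 is low-cardinality,
    // so each additional self-join multiplies the matching rows.
    let base_rows: i64 = 100_000;
    let fanout: i64 = 10; // assumed average matches per key per join
    let joined_rows = base_rows * fanout * fanout * fanout; // 3 extra joins

    // repeat(column_2, 15) makes every string value 15x longer.
    let avg_value_len: i64 = 50; // assumed average bytes per column_2 value
    let total_bytes = joined_rows * avg_value_len * 15;

    // A Utf8 array uses i32 offsets, so the concatenated string data of one
    // array must stay below i32::MAX (~2 GiB); otherwise take() panics.
    match i32::try_from(total_bytes) {
        Ok(_) => println!("{total_bytes} bytes fits in a Utf8 array"),
        Err(_) => println!("offset overflow: {total_bytes} bytes > i32::MAX"),
    }
}
```

Under these assumptions the intermediate batch would need ~75 GB of string data, far past the `i32` offset limit, which matches the panic site in `arrow-select`'s `take.rs`. (Using `LargeUtf8`, with `i64` offsets, would sidestep this particular limit.)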
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]