DDtKey commented on issue #7931:
URL: https://github.com/apache/arrow-datafusion/issues/7931#issuecomment-1871601997
Another MRE (a bit artificial, since it intentionally generates a Cartesian product):
```
❯ CREATE EXTERNAL TABLE test STORED AS CSV LOCATION
'file:///tmp/test-21mb.csv';
0 rows in set. Query took 0.105 seconds.
❯ SELECT repeat(lf1.column_2, 15) FROM test lf1 join test lf2 on
lf1.column_1 = lf2.column_1 join test lf3 on lf2.column_1 = lf3.column_1 join
test lf4 on lf3.column_1 = lf4.column_1
;
thread 'tokio-runtime-worker' panicked at
/Users/brew/Library/Caches/Homebrew/cargo_cache/registry/src/index.crates.io-6f17d22bba15001f/arrow-select-49.0.0/src/take.rs:423:41:
offset overflow
```
<details>
<summary>stack backtrace</summary>

```
0: _rust_begin_unwind
1: core::panicking::panic_fmt
2: core::option::expect_failed
3: arrow_select::take::take_bytes
4: arrow_select::take::take_impl
5: arrow_select::take::take
6: datafusion_physical_plan::joins::utils::build_batch_from_indices
7: <datafusion_physical_plan::joins::hash_join::HashJoinStream as
futures_core::stream::Stream>::poll_next
8: <datafusion_physical_plan::coalesce_batches::CoalesceBatchesStream as
futures_core::stream::Stream>::poll_next
9: <datafusion_physical_plan::projection::ProjectionStream as
futures_core::stream::Stream>::poll_next
10:
datafusion_physical_plan::stream::RecordBatchReceiverStreamBuilder::run_input::{{closure}}
11: tokio::runtime::task::core::Core<T,S>::poll
12: tokio::runtime::task::harness::Harness<T,S>::poll
13: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
14: tokio::runtime::scheduler::multi_thread::worker::Context::run
15: tokio::runtime::context::runtime::enter_runtime
16: tokio::runtime::scheduler::multi_thread::worker::run
17: tokio::runtime::task::core::Core<T,S>::poll
18: tokio::runtime::task::harness::Harness<T,S>::poll
19: tokio::runtime::blocking::pool::Inner::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose
backtrace.
```
</details>
> query output tries to make a column with more than 2GB

In fact, that's not the case for this example: the query result itself does not produce such large data, but the intermediate results can be really big (I mean a single row, not a column).

Not sure if it helps, though.
test file:
[test-21mb.csv](https://github.com/apache/arrow-datafusion/files/13789894/test-21mb.csv)
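To illustrate why this panics even though the final result is not huge: a back-of-the-envelope sketch with assumed numbers (row count, key fanout, and value length are all hypothetical, not measured from the attached CSV). The point is that three self-joins multiply matching rows, `repeat(..., 15)` multiplies bytes per value, and a `Utf8` array stores cumulative byte offsets as `i32`, so a single output batch whose string data exceeds `i32::MAX` trips the "offset overflow" check in `take_bytes`.

```rust
// Sketch of the overflow arithmetic; all input numbers are assumptions.
fn main() {
    // Hypothetical: ~100k rows in the CSV, and column_1 is low-cardinality,
    // so each additional self-join multiplies the matching rows.
    let base_rows: i64 = 100_000;
    let fanout: i64 = 10; // assumed average matches per key per join
    let joined_rows = base_rows * fanout * fanout * fanout; // 3 extra joins

    // repeat(column_2, 15) makes every string value 15x longer.
    let avg_value_len: i64 = 50; // assumed average bytes per column_2 value
    let total_bytes = joined_rows * avg_value_len * 15;

    // A Utf8 array uses i32 offsets, so the concatenated string data of one
    // array must stay below i32::MAX (~2 GiB); otherwise take() panics.
    match i32::try_from(total_bytes) {
        Ok(_) => println!("{total_bytes} bytes fits in a Utf8 array"),
        Err(_) => println!("offset overflow: {total_bytes} bytes > i32::MAX"),
    }
}
```

Under these assumptions the intermediate batch would need ~75 GB of string data, far past the `i32` offset limit, which matches the panic site in `arrow-select`'s `take.rs`. (Using `LargeUtf8`, with `i64` offsets, would sidestep this particular limit.)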
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]