alamb commented on issue #12136: URL: https://github.com/apache/datafusion/issues/12136#issuecomment-2656430639
> I have also encountered the same problem with string views.
>
> DataFusion uses the `interleave` function to produce merged batches, and `interleave` tends to produce batches with a super large size due to [apache/arrow-rs#6779](https://github.com/apache/arrow-rs/pull/6779). The result only references the data buffers of the interleaved arrays, so it does not actually take extra memory space. However, it makes the result of `get_record_batch_memory_size(batch)` or `batch.get_array_memory_size()` super large, which increases the chance of memory reservation failures.
>
> When spilling happens, these interleaved arrays are serialized using Arrow IPC and produce very large binaries. When we read them back in the spill-read phase, we have to allocate super large buffers for these arrays, which makes things much worse.

I think the fix for https://github.com/apache/arrow-rs/pull/6779 is in DataFusion 45. Does this still happen?
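The accounting problem described in the quote (a merged column that references, rather than copies, its inputs' data buffers, so per-array size accounting overstates real memory use) can be illustrated with a simplified, std-only Rust sketch. `ViewColumn`, `interleave_views`, and `deduplicated_size` are hypothetical names for this illustration only; they are not arrow-rs or DataFusion APIs, and the model ignores the view and null buffers a real string-view array also carries.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical, simplified string-view column (NOT the arrow-rs type):
/// each row is a view (buffer index, offset, length) into a shared data buffer.
struct ViewColumn {
    buffers: Vec<Arc<Vec<u8>>>,
    views: Vec<(usize, usize, usize)>,
}

impl ViewColumn {
    /// Naive accounting: every referenced data buffer counts in full.
    fn reported_size(&self) -> usize {
        self.buffers.iter().map(|b| b.len()).sum()
    }

    /// Bytes the rows actually point at.
    fn referenced_bytes(&self) -> usize {
        self.views.iter().map(|&(_, _, len)| len).sum()
    }
}

/// Interleave-like selection: picks (input, row) pairs and, instead of copying
/// string bytes, keeps Arc references to ALL buffers of every input.
fn interleave_views(inputs: &[&ViewColumn], indices: &[(usize, usize)]) -> ViewColumn {
    let mut buffers: Vec<Arc<Vec<u8>>> = Vec::new();
    let mut base: Vec<usize> = Vec::new(); // first buffer index of each input
    for input in inputs {
        base.push(buffers.len());
        buffers.extend(input.buffers.iter().cloned());
    }
    let views = indices
        .iter()
        .map(|&(i, row)| {
            let (b, off, len) = inputs[i].views[row];
            (base[i] + b, off, len)
        })
        .collect();
    ViewColumn { buffers, views }
}

/// Counts each distinct shared buffer once (by pointer identity) across columns.
fn deduplicated_size(cols: &[&ViewColumn]) -> usize {
    let mut seen = HashSet::new();
    let mut total = 0;
    for col in cols {
        for buf in &col.buffers {
            if seen.insert(Arc::as_ptr(buf)) {
                total += buf.len();
            }
        }
    }
    total
}

fn main() {
    // Two inputs, each backed by a 1 MiB data buffer.
    let a = ViewColumn { buffers: vec![Arc::new(vec![0u8; 1 << 20])], views: vec![(0, 0, 20), (0, 20, 20)] };
    let b = ViewColumn { buffers: vec![Arc::new(vec![1u8; 1 << 20])], views: vec![(0, 0, 20)] };

    // Take one 20-byte row from each input.
    let merged = interleave_views(&[&a, &b], &[(0, 1), (1, 0)]);

    println!("referenced bytes:       {}", merged.referenced_bytes()); // 40
    println!("merged reported size:   {}", merged.reported_size()); // 2097152
    let naive = a.reported_size() + b.reported_size() + merged.reported_size();
    println!("naive sum of all three: {}", naive); // 4194304
    println!("deduplicated total:     {}", deduplicated_size(&[&a, &b, &merged])); // 2097152
}
```

In this model the merged column holds only 40 bytes of string data yet reports 2 MiB, and summing per-column sizes counts the same buffers twice. Counting shared buffers once by pointer identity is just one way to show the gap; it is not a claim about how DataFusion's memory reservation works.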