yjshen commented on issue #9417:
URL:
https://github.com/apache/arrow-datafusion/issues/9417#issuecomment-1974948936
I think the main cause of the resource exhaustion is incorrect memory
accounting for the record batches stored in TopK's RecordBatchStore (as noted
in the issue description, it's ~220MB per batch). After adding some prints to
the memory size calculation, I saw:
```
Getting mem size of batch in topk::insert with batch size: 8192
Column 0 mem: 37561184
Column 1 mem: 37561184
Column 2 mem: 78416312
Column 3 mem: 72507488
Inserting batch with mem size: 226046168
```
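As a quick sanity check on those numbers, here is a small sketch that sums the per-column sizes from the log and divides by the batch size. The constant and function names are made up for illustration; only the numbers come from the output above.

```rust
/// Per-column sizes in bytes, copied from the log output above.
const COLUMN_MEM: [u64; 4] = [37_561_184, 37_561_184, 78_416_312, 72_507_488];
/// Number of rows in the batch, as printed in the log.
const BATCH_ROWS: u64 = 8192;

/// Sum of the reported per-column sizes.
fn total_bytes(cols: &[u64]) -> u64 {
    cols.iter().sum()
}

/// Reported bytes per row (integer division).
fn bytes_per_row(total: u64, rows: u64) -> u64 {
    total / rows
}

fn main() {
    let total = total_bytes(&COLUMN_MEM);
    println!("total: {total} bytes");
    println!("per row: {} bytes", bytes_per_row(total, BATCH_ROWS));
}
```

The columns add up to the reported 226046168 bytes, which works out to about 27 KB per row for a four-column batch of 8192 rows. That is much larger than one would normally expect a row to be, which is why the accounting looks wrong rather than the data being genuinely that big.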
If we correct the calculation, spilling to disk for TopK would be less of a
concern.
As for option 3, there is `maybe_compact` in TopK, which serves a similar
purpose but still keeps the relevant records in a record batch.
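To illustrate how this kind of over-counting can arise in general, here is a toy model. It assumes (this is my guess, not something confirmed above) that the stored batches are small windows into a larger shared buffer and that the size calculation counts the whole buffer for each window. `SliceView` and `make_views` are hypothetical and are not arrow-rs or DataFusion types.

```rust
use std::rc::Rc;

/// Toy stand-in for an array slice: a window into a shared buffer.
/// Hypothetical type for illustration only.
struct SliceView {
    buffer: Rc<Vec<u8>>,
    offset: usize,
    len: usize,
}

impl SliceView {
    /// The bytes this window actually covers.
    fn bytes(&self) -> &[u8] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Naive accounting: counts the whole backing buffer,
    /// no matter how small the window is.
    fn allocated_size(&self) -> usize {
        self.buffer.len()
    }

    /// Window-only accounting: counts just the covered bytes.
    fn logical_size(&self) -> usize {
        self.bytes().len()
    }
}

/// Build `n` non-overlapping windows of `window` bytes over one shared buffer.
fn make_views(backing: &Rc<Vec<u8>>, n: usize, window: usize) -> Vec<SliceView> {
    (0..n)
        .map(|i| SliceView {
            buffer: Rc::clone(backing),
            offset: i * window,
            len: window,
        })
        .collect()
}

fn main() {
    let backing = Rc::new(vec![0u8; 1 << 20]); // one 1 MiB allocation
    let views = make_views(&backing, 4, 1024);

    let naive: usize = views.iter().map(SliceView::allocated_size).sum();
    let logical: usize = views.iter().map(SliceView::logical_size).sum();

    println!("naive accounting:   {naive} bytes");
    println!("logical accounting: {logical} bytes");
}
```

In this model the naive sum charges the same 1 MiB allocation four times, while only 4 KiB is actually referenced. Whether the real TopK calculation is off for this reason would need to be checked against the code.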