alamb commented on issue #10073: URL: https://github.com/apache/arrow-datafusion/issues/10073#issuecomment-2056571501
It sounds like the issue, at a (really) high level, is that additional buffer space is required to actually implement the spill. Since the plan is already under memory pressure when it spills, getting this additional memory can and does fail.

Some strategies I can think of are:

1. Simply turn off the memory accounting of intermediate results (the String batches in your example above) during the spilling process (pro: simpler to implement, I think; con: overshoots limits)
2. Reserve additional buffer space up front to be used during spill (e.g. set aside 50MB) (pro: won't overshoot; cons: not clear how much is "enough", and it will reduce the amount of memory that can be reserved)
3. Reduce the memory required for intermediate spilling (e.g. maybe use a batch size 1/2 the size)

Maybe we can do 1 in the short term while figuring out a more sophisticated strategy for 2 or 3.
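To make strategy 2 concrete, here is a rough sketch of the idea using only the standard library. Everything in it (`ToyPool`, `SpillingOperator`, `spill_headroom`, `spill_scratch`) is a hypothetical name made up for illustration; it is not DataFusion's actual memory pool API, and the actual write to disk is omitted. The point is only that the scratch space needed during a spill comes out of memory that was reserved before the pool filled up, so the spill itself never has to ask the pool for more.

```rust
/// Toy memory pool with a hard limit (hypothetical, not DataFusion's pool).
struct ToyPool {
    limit: usize,
    used: usize,
}

impl ToyPool {
    fn new(limit: usize) -> Self {
        Self { limit, used: 0 }
    }

    /// Reserve `bytes`, failing if that would exceed the limit.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        if self.used + bytes > self.limit {
            return Err(format!(
                "cannot reserve {} bytes: {} of {} already in use",
                bytes, self.used, self.limit
            ));
        }
        self.used += bytes;
        Ok(())
    }

    /// Return `bytes` to the pool.
    fn shrink(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }
}

/// Operator that sets aside spill headroom when it is created (strategy 2).
struct SpillingOperator {
    spill_headroom: usize, // reserved up front, only used while spilling
    buffered: usize,       // bytes of input currently held in memory
}

impl SpillingOperator {
    fn try_new(pool: &mut ToyPool, spill_headroom: usize) -> Result<Self, String> {
        // Take the headroom now, while the pool still has room.
        pool.try_grow(spill_headroom)?;
        Ok(Self { spill_headroom, buffered: 0 })
    }

    /// Buffer `bytes` of input. Returns Ok(true) if a spill was needed first.
    /// `spill_scratch` is the assumed intermediate memory a spill requires.
    fn insert(
        &mut self,
        pool: &mut ToyPool,
        bytes: usize,
        spill_scratch: usize,
    ) -> Result<bool, String> {
        if pool.try_grow(bytes).is_ok() {
            self.buffered += bytes;
            return Ok(false);
        }
        // Under memory pressure: spill, then retry the reservation.
        self.spill(pool, spill_scratch)?;
        pool.try_grow(bytes)?;
        self.buffered += bytes;
        Ok(true)
    }

    fn spill(&mut self, pool: &mut ToyPool, spill_scratch: usize) -> Result<(), String> {
        // The scratch space is covered by the headroom, so no new pool
        // request is made here. It still fails if the headroom was too small.
        if spill_scratch > self.spill_headroom {
            return Err(format!(
                "spill needs {} bytes but only {} were set aside",
                spill_scratch, self.spill_headroom
            ));
        }
        // (Writing the buffered data to disk is omitted.)
        pool.shrink(self.buffered);
        self.buffered = 0;
        Ok(())
    }
}
```

With a 100-byte pool and 20 bytes of headroom, inserting 70 bytes and then 30 bytes forces one spill that succeeds without any new reservation. A later spill that needs 30 bytes of scratch fails, which is the "not clear how much is enough" con in miniature, and the 20 bytes of headroom are never available for buffering input.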
