alamb commented on issue #10073:
URL: 
https://github.com/apache/arrow-datafusion/issues/10073#issuecomment-2056571501

   It sounds like the issue, at a (really) high level, is "additional buffer 
space is required to actually implement the spill".
   
   And since the plan is under memory pressure during a spill, getting this 
additional memory can and does fail.
   
   Some strategies I can think of are:
   1. Simply turn off the memory accounting of intermediate results (String 
batches in your example above) during the spilling process (pro: simpler to 
implement, I think; con: overshoots limits)
   2. Reserve additional buffer space up front to be used during spill (e.g. 
set aside 50MB) (pro: won't overshoot; con: not clear how much is "enough", 
and it will reduce the amount of memory that can be reserved)
   3. Reduce the memory required for intermediate spilling (e.g. maybe use a 
batch size 1/2 the size)
   
   Maybe we can do 1 in the short term while figuring out a more sophisticated 
strategy for 2 or 3.
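
   To make strategy 2 concrete, here is a minimal, std-only sketch of 
reserving spill headroom up front. `ToyPool` and `SpillHeadroom` are 
hypothetical names made up for illustration; this is not DataFusion's 
memory pool API, just the accounting idea under the assumption of a single 
pool with a hard byte limit.

```rust
/// Toy memory pool with a hard byte limit.
/// Hypothetical type for illustration, not DataFusion's memory pool.
#[derive(Debug)]
struct ToyPool {
    limit: usize,
    used: usize,
}

impl ToyPool {
    fn new(limit: usize) -> Self {
        Self { limit, used: 0 }
    }

    /// Account for `bytes` more; fail if that would exceed the limit.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        match self.used.checked_add(bytes) {
            Some(total) if total <= self.limit => {
                self.used = total;
                Ok(())
            }
            _ => Err(format!(
                "cannot reserve {} bytes: {} of {} already in use",
                bytes, self.used, self.limit
            )),
        }
    }

    /// Return `bytes` to the pool.
    fn shrink(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }
}

/// Strategy 2: headroom taken from the pool once, before any memory
/// pressure exists, and used only for spill scratch buffers.
/// Hypothetical type for illustration.
struct SpillHeadroom {
    bytes: usize,
}

impl SpillHeadroom {
    /// Reserve the headroom up front; this is the only call that touches the pool.
    fn reserve(pool: &mut ToyPool, bytes: usize) -> Result<Self, String> {
        pool.try_grow(bytes)?;
        Ok(Self { bytes })
    }

    /// Hand out scratch space from the headroom. No pool call is made,
    /// so this cannot fail merely because the pool is full.
    fn take(&mut self, needed: usize) -> Result<usize, String> {
        if needed > self.bytes {
            return Err(format!(
                "spill needs {} bytes but only {} were set aside",
                needed, self.bytes
            ));
        }
        self.bytes -= needed;
        Ok(needed)
    }

    /// Return scratch space once the spill has been written out.
    fn give_back(&mut self, bytes: usize) {
        self.bytes += bytes;
    }
}
```

   The sketch also shows the con noted for strategy 2: if a spill needs 
more than was set aside, `take` still fails, and the headroom is memory the 
rest of the plan can never use.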
   

