westonpace commented on PR #41335:
URL: https://github.com/apache/arrow/pull/41335#issuecomment-2088673206

   The PR mentions that the max value for this variable should be 64MB.  
   
   > Sorry I'm still confused. From the discussion in the email thread, it 
seems like the intention at that time was making the temp stack thread-local 
and inter-node, i.e., sharing the same temp stack instance among all the nodes 
being executed in a specific thread (sometimes referred to as a "pipeline"). 
The "unfinished refactoring" back then might have been making more kinds of 
nodes use the shared temp stack rather than their own.
   
   This is correct.  At the time, our (potentially flawed) belief was that 
temp stack usage was fairly well bounded, so there didn't seem to be much 
advantage in having every node reinvent the wheel.  If the knowledge of how 
much temp stack space is needed is local to specific nodes (and can vary widely, 
so there is no reasonable default max), then per-node is correct.
   
   I probably missed this in the earlier fixes, but do we actually know why the 
temp stack is overflowing?  Is there some reason to think that a swiss join 
node can predict how much temp stack space it needs?  Otherwise I'm not sure 
why moving the temp stack to a per-node basis fixes the root cause.
   
   > It's somewhat odd that temp vectors are associated with a QueryContext
   
   I believe the intention was to create something similar to malloc/new but 
faster (by taking advantage of pre-allocated thread local space).
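   To illustrate the idea (a hypothetical sketch, not Arrow's actual 
`TempVectorStack` API): a pre-allocated, thread-local buffer with bump-pointer 
alloc/release avoids both synchronization and per-allocation bookkeeping, which 
is what makes it cheaper than `malloc`/`new`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative sketch of a thread-local temp stack.  One buffer is
// pre-allocated per thread; allocations bump a top pointer and must be
// released in LIFO order, like stack frames.  Class and method names
// here are hypothetical.
class TempStack {
 public:
  explicit TempStack(size_t capacity) : buffer_(capacity), top_(0) {}

  // Returns nullptr when the request does not fit in the fixed buffer --
  // the overflow case being discussed in this PR.
  uint8_t* Alloc(size_t num_bytes) {
    if (top_ + num_bytes > buffer_.size()) return nullptr;
    uint8_t* out = buffer_.data() + top_;
    top_ += num_bytes;
    return out;
  }

  // LIFO release: the caller hands back exactly what it allocated, in
  // reverse order, so freeing is a single pointer decrement.
  void Release(size_t num_bytes) {
    assert(num_bytes <= top_);
    top_ -= num_bytes;
  }

  size_t bytes_used() const { return top_; }

 private:
  std::vector<uint8_t> buffer_;
  size_t top_;
};
```

   Because every node executing on the same thread draws from the same buffer, 
nested nodes consume it the same way nested function calls consume the native 
call stack.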
   
   > this means that [join nodes within a 
query](https://github.com/stenlarsson/arrow-test/blob/main/hang.py) are sharing 
their statically allocated space
   
   This may be true since node A calls node B calls node C, etc.  However, note 
that the stack itself would behave in the same way in this situation (stack 
frame A is followed by stack frame B followed by stack frame C).
   
   If this nesting is somehow unbounded (potentially through async recursion) 
then you will get a stack overflow at some point, regardless of the size of the 
temp local stack.
   
   I think Antoine's suggestion works too (if I understand it correctly), which 
is just to allocate a new slab if there isn't space available in the initial 
slab.  The extra slab can be deallocated when the stack shrinks enough that it 
is no longer needed.
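   A rough sketch of that fallback, under my reading of the suggestion (all 
names are illustrative; this assumes strictly LIFO alloc/release, so each 
allocation lives entirely within one slab):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical temp stack that, instead of failing on overflow, serves
// the allocation from an extra heap slab.  The extra slab is handed
// back to the heap as soon as the stack shrinks out of it.
class GrowableTempStack {
 public:
  explicit GrowableTempStack(size_t slab_size) : slab_size_(slab_size) {
    PushSlab(slab_size_);
  }

  uint8_t* Alloc(size_t num_bytes) {
    if (slabs_.back().top + num_bytes > slabs_.back().capacity) {
      // Overflow: allocate an extra slab large enough for this request.
      PushSlab(std::max(num_bytes, slab_size_));
    }
    Slab& slab = slabs_.back();
    uint8_t* out = slab.data.get() + slab.top;
    slab.top += num_bytes;
    return out;
  }

  // LIFO release; once an overflow slab empties, it is deallocated.
  void Release(size_t num_bytes) {
    Slab& slab = slabs_.back();
    assert(num_bytes <= slab.top);
    slab.top -= num_bytes;
    if (slab.top == 0 && slabs_.size() > 1) {
      slabs_.pop_back();  // Give the extra slab back to the heap.
    }
  }

  size_t num_slabs() const { return slabs_.size(); }

 private:
  struct Slab {
    std::unique_ptr<uint8_t[]> data;
    size_t top;
    size_t capacity;
  };

  void PushSlab(size_t capacity) {
    slabs_.push_back(Slab{std::make_unique<uint8_t[]>(capacity), 0, capacity});
  }

  size_t slab_size_;
  std::vector<Slab> slabs_;
};
```

   The common case stays as cheap as the fixed-size version (a bounds check 
plus a pointer bump); the heap is only touched on overflow and on the matching 
shrink.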


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
