zanmato1984 commented on PR #41335: URL: https://github.com/apache/arrow/pull/41335#issuecomment-2088780641
> From the PR it mentions the max value for this variable should be 64MB. That seems confusing to me. Why wouldn't we just always use 64MB. Is there concern that 64MB per plan is too much RAM overhead? Yes, this is my assumption. And the overhead is actually 64MB per plan * **per thread**. > I probably missed this in the earlier fixes but do we actually know why the temp stack is overflowing? Is there some reason to think that a swiss join node can predict how much temp stack space it needs? Otherwise I'm not sure why moving the temp-stack to a per-node basis fixes the root cause. > ... > This may be true since node A calls node B calls node C, etc. However, note that the stack itself would behave in the same way in this situation (stack frame A is followed by stack frame B followed by stack frame C). > > If this nesting is somehow unbounded (potentially through async recursion) then you will get a stack overflow at some point, regardless of the size of the temp local stack, because you will eventually overflow the real stack (the C program stack) > IIUC, it is the nesting of (bounded) multiple (swiss) joins that is overflowing the temp stack - but not necessarily along with the C program stack overflowing. Maybe the code could help to explain: These temp vectors https://github.com/apache/arrow/blob/281122c018df86601ca675f3941751ddc3a89b3d/cpp/src/arrow/acero/swiss_join.cc#L2310-L2321 won't be popped from the temp stack until the enclosing function returns. And the recursion to this function could happen when the it accumulates enough rows to output: https://github.com/apache/arrow/blob/281122c018df86601ca675f3941751ddc3a89b3d/cpp/src/arrow/acero/swiss_join.cc#L2424-L2428 which executes the nested join node (synchronously, via regular function call), which again allocates temp vectors on the temp stack subsequently. Per-node (meanwhile per-thread, presumably) temp stack fixes this type of overflow by effectively being able to implicitly expand the overall temp stack size by a multiple of the number of nodes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
