korowa commented on issue #14238:
URL: https://github.com/apache/datafusion/issues/14238#issuecomment-2611636777

   I'd suggest renaming the "splitting" part of the problem to "restricting" -- 
if a join is able to produce a batch that needs to be split (even if this batch 
exists only internally), then that may already be an issue, which may hurt in 
some specific cases. I also think that `BatchSplitter` in its current 
implementation (when it already has a batch to split) is not solving the 
problem, but just covering it up (in addition, if the batches to be split are 
large enough to start causing memory issues, `BatchSplitter` doesn't seem to 
be able to help).
   
   In this case (for splitting / restricting), I think what @berkaysynnada 
suggests:
   
   > to make all join operators capable of performing both coalescing and 
splitting in a built-in manner
   
   is a better fit -- each join operator should be able to limit / restrict its 
internally created record batches to prevent excessive accumulation of data in 
memory (or at least, if it's required, to track them via memory reservations).
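
   To illustrate the difference between "restricting" and "splitting", here is 
a minimal sketch of a join that never builds an oversized batch in the first 
place. It is not DataFusion code: `RestrictedJoin`, its fields, and the use of 
plain `i64` slices instead of record batches are all assumptions made for 
illustration. The idea is that the operator keeps a cursor into its work and 
stops producing rows as soon as the output reaches `batch_size`, resuming from 
the cursor on the next poll.

```rust
/// Hypothetical stand-in for a join operator's output stream (not a DataFusion
/// type): emits matching (left_index, right_index) pairs in chunks of at most
/// `batch_size` rows, without ever materializing the full join result.
struct RestrictedJoin<'a> {
    left: &'a [i64],
    right: &'a [i64],
    batch_size: usize,
    // Cursor into the nested loop, so work resumes where the last batch stopped.
    left_pos: usize,
    right_pos: usize,
}

impl<'a> RestrictedJoin<'a> {
    fn new(left: &'a [i64], right: &'a [i64], batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        Self { left, right, batch_size, left_pos: 0, right_pos: 0 }
    }
}

impl<'a> Iterator for RestrictedJoin<'a> {
    type Item = Vec<(usize, usize)>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut out = Vec::with_capacity(self.batch_size);
        while self.left_pos < self.left.len() {
            while self.right_pos < self.right.len() {
                if self.left[self.left_pos] == self.right[self.right_pos] {
                    out.push((self.left_pos, self.right_pos));
                }
                self.right_pos += 1;
                // Stop as soon as the limit is hit; the cursor keeps our place.
                if out.len() == self.batch_size {
                    return Some(out);
                }
            }
            self.right_pos = 0;
            self.left_pos += 1;
        }
        // Emit the final partial batch, or signal the end of the stream.
        if out.is_empty() { None } else { Some(out) }
    }
}
```

   With this shape the peak size of any internally created batch is bounded by 
`batch_size` regardless of how many rows match, whereas a splitter applied 
after the fact still needs the whole oversized batch in memory before it can 
cut it up.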
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
