maxburke opened a new pull request, #23032:
URL: https://github.com/apache/datafusion/pull/23032

   ## Which issue does this PR close?
   
   Closes issue #23031 
   
   ## Rationale for this change
   
   We run into two problems when operating on datasets of approximately 60 million rows:
   1. On machines with 64 GB of memory or less, the process is OOM-killed.
   2. On machines with more than 64 GB, string array offsets overflow during the record batch concatenation at the core of the join.
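
To illustrate the second problem: Arrow's `Utf8` string arrays use 32-bit signed offsets, so the total string bytes in a single array must not exceed `i32::MAX`. The sketch below is a standalone illustration, not DataFusion code. The helper name `final_offset_i32` and the per-row byte averages are assumptions chosen for the example, not figures from this PR.

```rust
/// Returns the final offset a single string array would need for `rows`
/// rows averaging `avg_bytes_per_row` bytes each, or `None` if that total
/// does not fit in a 32-bit signed offset.
/// (Hypothetical helper for illustration only.)
fn final_offset_i32(rows: u64, avg_bytes_per_row: u64) -> Option<i32> {
    // Total string bytes that would land in one concatenated array.
    let total = rows.checked_mul(avg_bytes_per_row)?;
    // The last offset equals the total byte length; it must fit in i32.
    i32::try_from(total).ok()
}

fn main() {
    // Assumed workload: 60 million rows, 40 bytes of string data per row.
    let rows = 60_000_000;
    match final_offset_i32(rows, 40) {
        Some(offset) => println!("fits: final offset = {offset}"),
        None => println!(
            "overflow: {} bytes exceeds i32::MAX ({})",
            rows * 40,
            i32::MAX
        ),
    }
}
```

Under these assumed numbers, a single concatenated column would need about 2.4 billion bytes of offsets range, which is past the 32-bit limit, while the individual input batches stay far below it.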
   
   ## What changes are included in this PR?
   
   This PR removes record batch concatenation from several joins (hash join, nested loop join, piecewise merge join).
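
The description does not say what replaces concatenation. As a general illustration only, one way to work with many batches without merging them is to address each row as a (batch index, row index) pair. The sketch below is hypothetical and is not taken from the PR: `Batch`, `batch_starts`, and `locate` are invented names, and a plain `Vec<String>` stands in for a record batch.

```rust
/// Stand-in for a record batch: a single string column.
/// (Hypothetical type; real record batches hold Arrow arrays.)
type Batch = Vec<String>;

/// Computes the global row number at which each batch starts.
fn batch_starts(batches: &[Batch]) -> Vec<usize> {
    let mut starts = Vec::with_capacity(batches.len());
    let mut next_start = 0;
    for batch in batches {
        starts.push(next_start);
        next_start += batch.len();
    }
    starts
}

/// Maps a global row number to (batch index, row within that batch).
/// Assumes `starts` is non-empty and `global_row` is within range.
fn locate(starts: &[usize], global_row: usize) -> (usize, usize) {
    // Count the batches whose start is <= global_row; the last one owns the row.
    let batch = starts.partition_point(|&start| start <= global_row) - 1;
    (batch, global_row - starts[batch])
}

fn main() {
    let batches: Vec<Batch> = vec![
        vec!["a".to_string(), "b".to_string()],
        vec!["c".to_string(), "d".to_string(), "e".to_string()],
        vec!["f".to_string()],
    ];
    let starts = batch_starts(&batches);
    let (batch, row) = locate(&starts, 4);
    println!("global row 4 -> batch {batch}, row {row}: {}", batches[batch][row]);
}
```

With this kind of addressing, each batch keeps its own offsets, so no single array has to hold the string data of the entire input.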
   
   ## Are these changes tested?
   
   Yes
   
   ## Are there any user-facing changes?
   
   I sure hope not! (no)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

