GitHub user ppadma opened a pull request: https://github.com/apache/drill/pull/1107

    DRILL-6123: Limit batch size for Merge Join based on memory

    Merge join limits the output batch size to 32K rows irrespective of row size. This can create large batches (in terms of memory), depending on the average row width. This change computes the output row count from the memory limit specified with the new outputBatchSize option and the average outgoing row width. The average outgoing row width is the sum of the left and right batch row widths. The output row count is at least 1 and at most 64k.

    Added an AbstractRecordBatchMemoryManager class to be used across all operators, and restructured the code a little to accommodate it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ppadma/drill DRILL-6123

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1107.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1107

----
commit e01da78116730afbcd9b5062e316678cebc4848f
Author: Padma Penumarthy <ppenumar97@...>
Date:   2018-01-31T00:58:57Z

    DRILL-6123: Limit batch size for Merge Join based on memory

----

---
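
For readers of the description above, the row-count rule can be sketched roughly as follows. This is only an illustration, not the code in the pull request: the class name, method name, and constants are hypothetical, "64k" is assumed to mean 65536, and returning the maximum when the row width is unknown (zero) is likewise an assumption.

```java
public class MergeJoinBatchSizing {
    // Hypothetical bounds; the description only says "minimum of 1 and max of 64k".
    static final int MIN_ROWS = 1;
    static final int MAX_ROWS = 64 * 1024; // assumes 64k = 65536

    /**
     * Estimates how many rows fit in one output batch.
     *
     * @param outputBatchSizeBytes memory budget for one outgoing batch
     * @param leftRowWidth         average row width of the left input, in bytes
     * @param rightRowWidth        average row width of the right input, in bytes
     */
    static int outputRowCount(long outputBatchSizeBytes, int leftRowWidth, int rightRowWidth) {
        // Outgoing row width is the sum of the left and right row widths.
        long outgoingRowWidth = (long) leftRowWidth + rightRowWidth;
        if (outgoingRowWidth <= 0) {
            // Assumption: with no width information, fall back to the upper bound.
            return MAX_ROWS;
        }
        long rows = outputBatchSizeBytes / outgoingRowWidth;
        // Clamp the result to [MIN_ROWS, MAX_ROWS].
        return (int) Math.max(MIN_ROWS, Math.min(MAX_ROWS, rows));
    }

    public static void main(String[] args) {
        // 16 MiB budget with 1 KiB rows on each side.
        System.out.println(outputRowCount(16L * 1024 * 1024, 1024, 1024));
    }
}
```

Under these assumptions, a 16 MiB budget with 2 KiB outgoing rows yields 8192 rows per batch, while very narrow rows are capped at the upper bound and very wide rows still produce at least one row.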