GitHub user ppadma opened a pull request:
https://github.com/apache/drill/pull/1228
DRILL-6307: Handle empty batches in record batch sizer correctly
When we get an empty batch, the record batch sizer calculates the row width as zero.
In that case, we do not account for or allocate memory correctly for
outgoing batches.
For example, in a left outer join, if the right side batch is empty, we still have
to include the right side columns as nulls in the outgoing batch. Say the first batch is
empty. Then, for the outgoing batch, we allocate empty vectors with zero capacity. When
we read the next batch with data, we end up going through the realloc loop as
we write values. Also, if we use a right side row width of 0 in the outgoing row
width calculation, the number of rows we calculate for the outgoing batch will
be too high, and later, when we get a non-empty batch, we might
exceed the memory limits.
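As a rough illustration of the row count problem (this is not code from the PR; the class, method, and all numbers are hypothetical), the sketch below divides a memory budget by the outgoing row width. With the right side width reported as 0, the row count comes out about four times higher than it would with the real width:

```java
// Hypothetical sketch: names and numbers are illustrative, not Drill's API.
final class OutgoingRowCountSketch {

    /** Rows that fit in the outgoing batch for a given memory budget and row widths. */
    static int rowsPerBatch(long batchMemoryLimit, int leftRowWidth,
                            int rightRowWidth, int maxRows) {
        int outgoingRowWidth = leftRowWidth + rightRowWidth;
        if (outgoingRowWidth <= 0) {
            return maxRows; // nothing to size against, fall back to the row cap
        }
        long rows = Math.max(1, batchMemoryLimit / outgoingRowWidth);
        return (int) Math.min(maxRows, rows);
    }

    public static void main(String[] args) {
        long limit = 1L << 20; // assumed 1 MiB budget for the outgoing batch
        int left = 50;         // assumed left side row width in bytes

        // Right side batch is empty, so its width is reported as 0.
        System.out.println(rowsPerBatch(limit, left, 0, 65536));   // 20971

        // Right side rows are actually 150 bytes wide once data arrives.
        System.out.println(rowsPerBatch(limit, left, 150, 65536)); // 5242
    }
}
```

Under these assumed numbers, writing 20971 rows that are really 200 bytes wide needs about 4 MB, well over the 1 MiB budget.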
This PR tries to address these problems by allocating memory based on the standard (std)
size when the input batch is empty. It also uses the allocation width as the width of the batch in
the number-of-rows calculation for binary operators. For unary operators, this is
not a problem since we drop empty batches without doing any processing.
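The idea can be sketched as follows. This is a minimal illustration, not the code in the PR: the class, the method names, and the per-column standard sizes are all assumptions made for the example.

```java
// Hypothetical sketch: names and sizes are illustrative, not Drill's API.
final class AllocationWidthSketch {

    /**
     * Width to use when allocating a column in the outgoing batch:
     * the observed width if the input batch has rows, otherwise an
     * assumed type-based standard size.
     */
    static int allocationWidth(int rowCount, int observedWidth, int stdWidth) {
        return rowCount > 0 ? observedWidth : stdWidth;
    }

    /** Row width for an input batch, summed over its columns. */
    static int rowAllocationWidth(int rowCount, int[] observedWidths, int[] stdWidths) {
        int total = 0;
        for (int i = 0; i < stdWidths.length; i++) {
            total += allocationWidth(rowCount, observedWidths[i], stdWidths[i]);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] std = {4, 50};       // assumed std sizes, e.g. an int and a varchar column
        int[] emptyObs = {0, 0};   // the sizer sees no data in an empty batch
        int[] dataObs = {4, 12};   // widths observed once data arrives

        System.out.println(rowAllocationWidth(0, emptyObs, std));   // 54
        System.out.println(rowAllocationWidth(100, dataObs, std));  // 16
    }
}
```

With a non-zero width for the empty side, the outgoing vectors get a real initial capacity and the row count for the outgoing batch is no longer inflated.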
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ppadma/drill DRILL-6307
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/1228.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1228
----
commit cd78209e9f75a59edc68df3e416f3936fb00f917
Author: Padma Penumarthy <ppenumar97@...>
Date: 2018-04-06T19:56:06Z
DRILL-6307: Handle empty batches in record batch sizer correctly
----
---