[https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16455752#comment-16455752]
ASF GitHub Bot commented on DRILL-6307:
---------------------------------------
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1228#discussion_r184590500
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
@@ -536,6 +556,11 @@ public ColumnSize getColumn(String name) {
*/
private int netRowWidth;
private int netRowWidthCap50;
+
+ /**
+ * actual row size if input is not empty. Otherwise, standard size.
+ */
+ private int rowAllocSize;
--- End diff --
@ppadma, not much more to add. If the code requires you to do estimates based
on no information, we won't get very good estimates. But if we know we have 0
rows, then that itself is a good estimate of the size we'll need.
If there is a way to improve the estimate, I'm guessing you'll find it as
work proceeds and you see more examples and test cases.
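A minimal sketch of the rule stated in the rowAllocSize Javadoc in the diff above (actual row size if the input is not empty, otherwise standard size). The class and method names and the standard widths used (4 bytes for INT, 8 for BIGINT, an assumed 50 for VARCHAR) are hypothetical illustrations, not Drill's RecordBatchSizer API.

```java
/**
 * Hypothetical sketch of the rowAllocSize rule: use the measured row width when
 * the batch has rows, otherwise fall back to the sum of standard column widths.
 * Not Drill's actual RecordBatchSizer code.
 */
final class RowAllocSizeSketch {

    /**
     * @param rowCount        number of rows in the incoming batch
     * @param netRowWidth     measured bytes per row (0 for an empty batch)
     * @param stdColumnWidths assumed standard bytes per value for each column
     * @return bytes per row to use for allocation and row-count estimates
     */
    static int rowAllocSize(int rowCount, int netRowWidth, int[] stdColumnWidths) {
        if (rowCount > 0) {
            return netRowWidth; // actual size when the input is not empty
        }
        int stdRowWidth = 0;
        for (int width : stdColumnWidths) {
            stdRowWidth += width; // standard size when the input is empty
        }
        return stdRowWidth;
    }

    public static void main(String[] args) {
        // Assumed standard widths: INT = 4, BIGINT = 8, VARCHAR = 50 (illustrative).
        int[] stdWidths = {4, 8, 50};
        System.out.println(rowAllocSize(0, 0, stdWidths));     // prints 62
        System.out.println(rowAllocSize(1000, 30, stdWidths)); // prints 30
    }
}
```

Under this rule the empty batch reports 62 bytes per row instead of 0, so an operator that sizes its outgoing batch from this value still reserves room for the empty side's columns.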
> Handle empty batches in record batch sizer correctly
> ----------------------------------------------------
>
> Key: DRILL-6307
> URL: https://issues.apache.org/jira/browse/DRILL-6307
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.13.0
> Reporter: Padma Penumarthy
> Assignee: Padma Penumarthy
> Priority: Major
> Fix For: 1.14.0
>
>
> When we get an empty batch, the record batch sizer calculates the row width as
> zero. In that case, we do not do accounting and memory allocation correctly for
> outgoing batches.
> For example, in a merge join doing a left outer join, if the right-side batch is
> empty, we still have to include the right-side columns as nulls in the outgoing
> batch.
> Say the first batch is empty. Then, for the outgoing batch, we allocate empty
> vectors with zero capacity. When we read the next batch with data, we end up
> going through the realloc loop. If we use a right-side row width of 0 in the
> outgoing row width calculation, the number of rows we calculate will be too
> high, and later, when we get a non-empty batch, we might exceed the memory limits.
> One possible workaround/solution: allocate memory based on the standard size
> for an empty input batch, and use the allocation width as the width of the batch
> in the number-of-rows calculation.
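The row-count arithmetic described in the issue can be sketched as follows, assuming the outgoing row count is simply the memory limit divided by the outgoing row width (left width plus right width), capped at 64K rows per batch. The class and method names, the 1 MB limit, and all widths are hypothetical; this is not Drill's merge join code.

```java
/**
 * Hypothetical sketch of sizing the outgoing batch of a left outer merge join
 * when the first right-side batch is empty. Names and numbers are illustrative.
 */
final class OutgoingRowCountSketch {
    // Assumed cap on rows per outgoing batch (Drill batches hold at most 64K rows).
    static final int MAX_ROWS = 65536;

    /** Rows that fit in memoryLimit bytes for the given outgoing row width. */
    static int rowsThatFit(long memoryLimit, int outgoingRowWidth) {
        if (outgoingRowWidth <= 0) {
            return MAX_ROWS; // no width information: nothing bounds the row count
        }
        return (int) Math.min(MAX_ROWS, memoryLimit / outgoingRowWidth);
    }

    /** Actual right-side width if rows were seen, else the standard (allocation) width. */
    static int rightWidth(int rightRowCount, int actualWidth, int stdWidth) {
        return rightRowCount > 0 ? actualWidth : stdWidth;
    }

    public static void main(String[] args) {
        long memoryLimit = 1024 * 1024; // assumed 1 MB budget for the outgoing batch
        int leftWidth = 40;             // assumed bytes per left-side row
        int rightStdWidth = 60;         // assumed standard width of right-side columns
        int rightRealWidth = 60;        // assumed width once real right rows arrive

        // Problem: the first right batch is empty, so its measured width is 0.
        int naiveRows = rowsThatFit(memoryLimit, leftWidth + 0);
        // Workaround: fall back to the standard width for the empty right batch.
        int fixedRows = rowsThatFit(memoryLimit, leftWidth + rightWidth(0, 0, rightStdWidth));

        // Memory actually needed once a non-empty right batch arrives.
        long naiveBytes = (long) naiveRows * (leftWidth + rightRealWidth);
        long fixedBytes = (long) fixedRows * (leftWidth + rightRealWidth);

        System.out.println(naiveRows + " rows -> " + naiveBytes + " bytes");
        System.out.println(fixedRows + " rows -> " + fixedBytes + " bytes");
    }
}
```

Running it prints "26214 rows -> 2621400 bytes" and then "10485 rows -> 1048500 bytes". Treating the empty right side as zero width sizes the batch for 26214 rows, which needs about 2.5 times the 1 MB budget once 60-byte right-side rows arrive, while the standard-width fallback stays within the limit. Here the assumed standard width happens to equal the real one; in general it is only an estimate and can still be off.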
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)