[jira] [Commented] (DRILL-6307) Handle empty batches in record batch sizer correctly

ASF GitHub Bot (JIRA) Sun, 22 Apr 2018 19:30:49 -0700

    [ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16447472#comment-16447472 ]


ASF GitHub Bot commented on DRILL-6307:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1228#discussion_r183264235
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -50,7 +50,7 @@
     public class RecordBatchSizer {
       private static final int OFFSET_VECTOR_WIDTH = UInt4Vector.VALUE_WIDTH;
       private static final int BIT_VECTOR_WIDTH = UInt1Vector.VALUE_WIDTH;
    -  private static final int STD_REPETITION_FACTOR = 10;
    +  public static final int STD_REPETITION_FACTOR = 10;
    --- End diff --
    
    This is another of those silly fudge factors that really have no meaning.
    I thought the value of 10 came from the vector allocation code in
    `AllocationHelper`, but the magic number there is 5.
    
    Maybe move this constant to `AllocationHelper` and set it to 5, then use
    it both here and in `AllocationHelper` so we use a consistent guess
    everywhere.
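
A minimal sketch of what sharing the constant might look like. The two classes below are hypothetical stand-ins for `AllocationHelper` and `RecordBatchSizer`, and the method names and bodies are assumptions made for illustration, not Drill's actual code. The point is only that both estimates read the same guess:

```java
// Hypothetical stand-ins for Drill's AllocationHelper and RecordBatchSizer.
// Class names, method names, and bodies are illustrative assumptions.
final class AllocationHelperSketch {
  /** Assumed number of elements per repeated (array) value when nothing better is known. */
  static final int STD_REPETITION_FACTOR = 5;

  private AllocationHelperSketch() {}

  /** Estimated inner element count to allocate for a repeated vector. */
  static int estimatedElementCount(int rowCount) {
    return rowCount * STD_REPETITION_FACTOR;
  }
}

final class RecordBatchSizerSketch {
  private RecordBatchSizerSketch() {}

  /** Estimated width in bytes of one repeated column entry, using the shared guess. */
  static int estimatedRepeatedWidth(int elementWidth, int offsetWidth) {
    return offsetWidth + elementWidth * AllocationHelperSketch.STD_REPETITION_FACTOR;
  }
}
```

With a single definition, changing the guess in one place changes both the allocation estimate and the sizing estimate together.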


> Handle empty batches in record batch sizer correctly
> ----------------------------------------------------
>
>                 Key: DRILL-6307
>                 URL: https://issues.apache.org/jira/browse/DRILL-6307
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.13.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.14.0
>
>
> When we get an empty batch, the record batch sizer calculates the row width 
> as zero. In that case, we do not do accounting and memory allocation 
> correctly for outgoing batches. 
> For example, in merge join, for a left outer join, if the right side batch 
> is empty, we still have to include the right side columns as null in the 
> outgoing batch. 
> Say the first batch is empty. Then, for the outgoing batch, we allocate 
> empty vectors with zero capacity. When we read the next batch with data, we 
> will end up going through the realloc loop. If we use a right side row width 
> of 0 in the outgoing row width calculation, the number of rows we calculate 
> will be higher, and later, when we get a non-empty batch, we might exceed 
> the memory limits. 
> One possible workaround/solution: allocate memory based on the std size for 
> an empty input batch, and use the allocation width as the width of the batch 
> in the number of rows calculation. 
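
The arithmetic behind the description above can be sketched as follows. This is an illustration under stated assumptions, not the actual fix: the class, the method names, the 16 MiB limit, and the 64-byte widths are all hypothetical values chosen to show how a zero width inflates the row count and how falling back to an allocation width corrects it.

```java
// Hypothetical sketch; names and numbers are illustrative, not taken from Drill.
final class RowCountSketch {
  private RowCountSketch() {}

  /** Number of outgoing rows that fit in the memory limit for the given row widths. */
  static int rowsThatFit(long memoryLimitBytes, int leftRowWidth, int rightRowWidth) {
    int rowWidth = leftRowWidth + rightRowWidth;
    if (rowWidth <= 0) {
      throw new IllegalArgumentException("row width must be positive");
    }
    return (int) Math.min(Integer.MAX_VALUE, memoryLimitBytes / rowWidth);
  }

  /** Use the measured width if the batch had data, else fall back to the allocation width. */
  static int effectiveWidth(int measuredWidth, int stdAllocationWidth) {
    return measuredWidth > 0 ? measuredWidth : stdAllocationWidth;
  }

  public static void main(String[] args) {
    long limit = 16L * 1024 * 1024; // assumed 16 MiB output batch limit
    int left = 64;                  // assumed left side row width in bytes
    int rightMeasured = 0;          // empty right batch: sizer reports zero
    int rightStd = 64;              // assumed std allocation width for the right side

    // Zero right width: the row count is computed from the left side alone.
    System.out.println(rowsThatFit(limit, left, rightMeasured));
    // Fallback width: the row count leaves room for right side columns.
    System.out.println(rowsThatFit(limit, left, effectiveWidth(rightMeasured, rightStd)));
  }
}
```

With these assumed numbers, the zero-width calculation yields twice as many rows as the fallback calculation, which is the over-commitment the description warns about once a non-empty right batch arrives.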



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
