[ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Padma Penumarthy updated DRILL-6307:
------------------------------------
    Description: 
When we get an empty batch, the record batch sizer calculates the row width as 
zero. In that case, we do not do accounting and memory allocation correctly for 
outgoing batches. 

For example, in merge join, for a left outer join, if the right side batch is 
empty, we still have to include the right side columns as nulls in the outgoing 
batch. We have to allocate memory for those vectors correctly. We also have to 
include the row width of those columns in the outgoing row width and 
number-of-rows calculation. 

Say the first batch is empty. Then, for the outgoing batch, we allocate empty 
vectors with zero capacity. When we read the next batch with data, we end up 
going through the realloc loop. If we use a right side row width of 0 in the 
outgoing row width calculation, the number of rows we calculate will be too 
high, and later, when we get a non-empty batch, we might exceed the memory 
limits. 

One possible workaround/solution: allocate memory based on the standard (std) 
size for an empty input batch, and use that allocation width as the width of 
the batch in the number-of-rows calculation.
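
To make the arithmetic concrete, here is a minimal sketch of the proposed 
workaround. It is not Drill's RecordBatchSizer code: the class name, method 
names, the per-column standard widths, the 1 MiB memory limit and the 65536 
row cap are all assumptions chosen for illustration. It only shows how 
falling back to a standard width for an empty batch changes the outgoing row 
count.

```java
// Hypothetical sketch; names and numbers are illustrative, not Drill's API.
final class EmptyBatchSizing {

    // Row width to use for one side of the join. If the batch is empty
    // (no rows, or an observed width of zero), fall back to the sum of
    // assumed per-column standard widths instead of reporting zero.
    static int effectiveRowWidth(int observedRowWidth, int rowCount,
                                 int[] stdColumnWidths) {
        if (rowCount > 0 && observedRowWidth > 0) {
            return observedRowWidth;
        }
        int width = 0;
        for (int w : stdColumnWidths) {
            width += w;
        }
        return width;
    }

    // Number of rows that fit in the outgoing batch under the memory limit,
    // clamped to the range [1, maxRows].
    static int outgoingRowCount(long memoryLimitBytes, int leftRowWidth,
                                int rightRowWidth, int maxRows) {
        int outgoingWidth = leftRowWidth + rightRowWidth;
        if (outgoingWidth <= 0) {
            return maxRows;
        }
        long rows = memoryLimitBytes / outgoingWidth;
        return (int) Math.max(1, Math.min(rows, maxRows));
    }

    public static void main(String[] args) {
        long limit = 1L << 20;          // assumed 1 MiB outgoing batch limit
        int maxRows = 65536;            // assumed row cap per batch
        int leftWidth = 64;             // observed width of the left side
        int[] rightStd = {8, 50};       // assumed std widths of right columns

        // Right side is empty: its observed width is 0.
        int naive = outgoingRowCount(limit, leftWidth, 0, maxRows);
        int rightWidth = effectiveRowWidth(0, 0, rightStd);
        int adjusted = outgoingRowCount(limit, leftWidth, rightWidth, maxRows);

        System.out.println("rows with right width 0:   " + naive);
        System.out.println("rows with std right width: " + adjusted);
    }
}
```

With these assumed numbers, treating the empty right side as 0 bytes wide 
yields 16384 rows per outgoing batch, while using the 58-byte standard width 
yields 8594 rows, which leaves room for the right side columns once real data 
arrives.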

 

  was:
When we get an empty batch, the record batch sizer calculates the row width as 
zero. In that case, we do not do accounting and memory allocation correctly for 
outgoing batches. 

For example, in merge join, for a left outer join, if the right side batch is 
empty, we still have to include the right side columns as nulls in the outgoing 
batch. We have to allocate memory for those vectors correctly. We also have to 
include the row width of those columns in the outgoing row width and 
number-of-rows calculation. 


> Handle empty batches in record batch sizer correctly
> ----------------------------------------------------
>
>                 Key: DRILL-6307
>                 URL: https://issues.apache.org/jira/browse/DRILL-6307
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.13.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.14.0
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
