Padma Penumarthy commented on DRILL-6161:



> Allocate memory for outgoing vectors based on sizing calculations
> -----------------------------------------------------------------
>                 Key: DRILL-6161
>                 URL: https://issues.apache.org/jira/browse/DRILL-6161
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>    Affects Versions: 1.12.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Critical
>             Fix For: 1.13.0
>
> Currently, Drill allocates memory for outgoing value vectors either for the
> maximum of 64K entries, or starting at 4096 entries and doubling whenever more
> memory is needed. Every time we double, we allocate a new vector, copy the
> existing data into it, and zero-fill the new half, which carries a
> performance penalty. As part of the batch sizing project, we limit the number
> of rows in the outgoing batch based on memory, using sizing information from
> the incoming batch(es). Since we know the number of rows and the average size
> of each column in the outgoing batch, we should use that information to
> preallocate memory for the outgoing vectors. This will be done as each
> operator is changed to produce batches of the configured size.
>
> Another possible improvement is to pack the value vectors as densely as
> possible to improve overall memory utilization. Since memory is allocated in
> powers of 2, once we determine the number of rows to include in the outgoing
> batch, we should round it down to the closest power of 2 and allocate memory
> for that many rows.
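The sizing arithmetic described in the issue can be illustrated with a small, self-contained sketch. This is not Drill's code: the class and method names (`OutgoingBatchSizing`, `rowsForBatch`, `bytesCopiedByDoubling`) are hypothetical, and the memory budget and average row width in `main` are assumed example values. Only the 64K-row cap and the 4096-entry starting size come from the issue text. The sketch derives a row count from a memory budget and an average row width, rounds it down to a power of 2, and, for comparison, counts how many bytes a start-at-4096-and-double strategy would copy before reaching the same row count (the zero-fill cost is not counted).

```java
// Hypothetical sketch only: names and numbers are illustrative, not Drill's API.
final class OutgoingBatchSizing {

    // Upper limit on rows per batch (the 64K entries mentioned in the issue).
    static final int MAX_ROWS = 64 * 1024;

    /** Largest power of two that is <= n (n must be >= 1). */
    static int roundDownToPowerOfTwo(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        return Integer.highestOneBit(n);
    }

    /**
     * Rows that fit in the memory budget given the average row width,
     * capped at MAX_ROWS and rounded down to a power of two.
     */
    static int rowsForBatch(long memoryBudgetBytes, int avgRowWidthBytes) {
        if (avgRowWidthBytes <= 0) {
            throw new IllegalArgumentException("avgRowWidthBytes must be > 0");
        }
        long rows = memoryBudgetBytes / avgRowWidthBytes;
        rows = Math.max(1, Math.min(rows, MAX_ROWS));
        return roundDownToPowerOfTwo((int) rows);
    }

    /**
     * Bytes copied when a vector starts at startRows and doubles until it can
     * hold targetRows. Each doubling copies the existing contents into a new
     * buffer twice the size (zero-filling the new half is not counted here).
     */
    static long bytesCopiedByDoubling(int startRows, int targetRows, int bytesPerRow) {
        long copied = 0;
        long capacity = startRows;
        while (capacity < targetRows) {
            copied += capacity * bytesPerRow; // copy old contents into the new buffer
            capacity *= 2;
        }
        return copied;
    }

    public static void main(String[] args) {
        long budget = 8L * 1024 * 1024; // assumed 8 MiB budget for the outgoing batch
        int avgRowWidth = 200;          // assumed average row width in bytes
        int rows = rowsForBatch(budget, avgRowWidth);
        System.out.println("rows to preallocate: " + rows);
        System.out.println("bytes copied if doubling from 4096 instead: "
                + bytesCopiedByDoubling(4096, rows, avgRowWidth));
    }
}
```

With these assumed inputs, 8 MiB / 200 bytes gives 41943 rows, which rounds down to 32768. Preallocating 32768 rows up front avoids the three doublings (4096 to 8192 to 16384 to 32768) that would otherwise copy 5,734,400 bytes. Rounding down rather than up keeps the allocation within the budget.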

This message was sent by Atlassian JIRA
