Padma Penumarthy created DRILL-6161:

             Summary: Allocate memory for outgoing vectors based on sizing 
                 Key: DRILL-6161
             Project: Apache Drill
          Issue Type: Improvement
          Components: Execution - Flow
    Affects Versions: 1.12.0
            Reporter: Padma Penumarthy
            Assignee: Padma Penumarthy
             Fix For: 1.13.0

Currently, in Drill, we allocate memory for outgoing value vectors either for 
the maximum of 64K entries, or we start from 4096 entries and keep doubling as 
we need more memory. Every time we double, we allocate a new vector, copy the 
existing data into it, and zero-fill the new half. This has a performance 
penalty. As part of the batch sizing project, we are limiting the number of 
rows in the outgoing batch based on memory, using sizing information from the 
incoming batch(es). Since we know the number of rows and the average size of 
each column in the outgoing batch, we should use that information to 
preallocate memory for the outgoing vectors. This will be done as each 
operator is changed to produce batches of the configured size.
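
To make the cost concrete, here is a rough sketch (not Drill code) that 
compares the doubling strategy with a single up-front allocation. The class 
and method names are hypothetical, and the sketch assumes the vector is full 
each time it doubles, so the copy counts are an upper bound.

```java
public class AllocationSketch {
    // Starting capacity described in the issue text.
    static final long INITIAL_ENTRIES = 4096;

    /** Number of times the vector doubles before it can hold rowCount entries. */
    static int doublingsNeeded(int rowCount) {
        long capacity = INITIAL_ENTRIES;
        int doublings = 0;
        while (capacity < rowCount) {
            capacity *= 2;
            doublings++;
        }
        return doublings;
    }

    /** Upper bound on entries copied across all doublings (assumes a full vector each time). */
    static long entriesCopied(int rowCount) {
        long capacity = INITIAL_ENTRIES;
        long copied = 0;
        while (capacity < rowCount) {
            copied += capacity;   // old contents are copied into the new vector
            capacity *= 2;
        }
        return copied;
    }

    /** Bytes to preallocate when row count and average column width are known. */
    static long preallocatedBytes(int rowCount, int avgColumnWidthBytes) {
        return (long) rowCount * avgColumnWidthBytes;
    }

    public static void main(String[] args) {
        int rows = 65536;
        System.out.println("doublings: " + doublingsNeeded(rows));
        System.out.println("entries copied: " + entriesCopied(rows));
        System.out.println("preallocated bytes (avg width 50): " + preallocatedBytes(rows, 50));
    }
}
```

Under these assumptions, growing from 4096 to 64K entries takes four 
reallocations, whereas sizing the vector up front takes one allocation and no 
copies.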

Another possible improvement is to pack the value vectors as densely as 
possible to improve overall memory utilization. Since we allocate memory in 
powers of 2, once we figure out the number of rows to include in the outgoing 
batch, we can round it down to a power of 2 and allocate memory for that many 
rows.
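
The rounding step could look roughly like the following. This is an 
illustrative sketch only: the class and method names are hypothetical, and 
the model of the allocator (every request rounded up to the next power of 2) 
is an assumption based on the description above, not Drill's actual 
allocator code.

```java
public class DensePackingSketch {
    /** Largest power of 2 that is less than or equal to rowCount. */
    static int roundDownToPowerOfTwo(int rowCount) {
        if (rowCount <= 0) {
            throw new IllegalArgumentException("rowCount must be positive");
        }
        return Integer.highestOneBit(rowCount);
    }

    /** Assumed allocator behavior: a request is rounded up to the next power of 2. */
    static long allocatedBytes(long requestedBytes) {
        if (requestedBytes <= 0) {
            throw new IllegalArgumentException("requestedBytes must be positive");
        }
        long highest = Long.highestOneBit(requestedBytes);
        return highest == requestedBytes ? requestedBytes : highest << 1;
    }

    public static void main(String[] args) {
        int valueWidth = 4;          // e.g. a fixed-width 4-byte column
        int rows = 5000;             // row count chosen by batch sizing
        int packedRows = roundDownToPowerOfTwo(rows);

        long requested = (long) rows * valueWidth;
        long packed = (long) packedRows * valueWidth;
        System.out.println(rows + " rows: use " + requested
            + " of " + allocatedBytes(requested) + " bytes");
        System.out.println(packedRows + " rows: use " + packed
            + " of " + allocatedBytes(packed) + " bytes");
    }
}
```

With a fixed-width column, a power-of-2 row count makes the requested size a 
power of 2 as well, so under this model none of the allocated buffer is left 
unused.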

