Padma Penumarthy created DRILL-6161:
---------------------------------------
Summary: Allocate memory for outgoing vectors based on sizing calculations
Key: DRILL-6161
URL: https://issues.apache.org/jira/browse/DRILL-6161
Project: Apache Drill
Issue Type: Improvement
Components: Execution - Flow
Affects Versions: 1.12.0
Reporter: Padma Penumarthy
Assignee: Padma Penumarthy
Fix For: 1.13.0

Currently, Drill allocates memory for outgoing value vectors in one of two
ways: either for the maximum of 64K entries up front, or starting at 4096
entries and doubling whenever more memory is needed. Every doubling allocates
a new vector, copies the existing data into it, and zero-fills the new half,
all of which carries a performance penalty. As part of the batch sizing
project, we now limit the number of rows in an outgoing batch based on memory,
using sizing information from the incoming batch(es). Since we know the number
of rows and the average size of each column in the outgoing batch, we should
use that information to preallocate memory for the outgoing vectors. This will
be done as each operator is changed to produce batches of the configured size.
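
To make the cost comparison concrete, here is a minimal Java sketch. It is not
Drill's allocator code: the class and method names are hypothetical, and it
assumes the vector is full each time it doubles, so the whole old capacity is
copied. It counts the reallocations and copied entries under the doubling
strategy, and computes the single up-front allocation size from the row count
and average column width.

```java
// Hypothetical illustration only; none of these names come from Drill.
final class VectorSizingSketch {
    private VectorSizingSketch() {}

    // Starting capacity (in entries) of the doubling strategy described above.
    static final int INITIAL_ENTRIES = 4096;

    // Number of reallocations needed to grow from 4096 entries until the
    // vector can hold rowCount entries.
    static int doublingsNeeded(int rowCount) {
        long capacity = INITIAL_ENTRIES;
        int doublings = 0;
        while (capacity < rowCount) {
            capacity *= 2;
            doublings++;
        }
        return doublings;
    }

    // Total entries copied across those reallocations, assuming the old
    // vector is full every time it is replaced by one twice its size.
    static long entriesCopied(int rowCount) {
        long capacity = INITIAL_ENTRIES;
        long copied = 0;
        while (capacity < rowCount) {
            copied += capacity; // old contents move into the new vector
            capacity *= 2;
        }
        return copied;
    }

    // Bytes to allocate once, up front, for one column when the row count and
    // the average column width (in bytes) are already known.
    static long preallocBytes(int rowCount, int avgColumnWidthBytes) {
        return (long) rowCount * avgColumnWidthBytes;
    }
}
```

Under these assumptions, growing to 65536 entries takes 4 reallocations and
copies 61440 entries, whereas preallocation does one allocation and no copies.
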

Another possible improvement is to pack the value vectors as densely as
possible to improve overall memory utilization. Since we allocate memory in
powers of 2, once we have figured out the number of rows to include in the
outgoing batch, we can round it down to the closest power of 2 and allocate
memory for that many rows.
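
The rounding step can be sketched in Java as follows. The class and method
names are hypothetical, and allocatedEntries models the power-of-2 allocation
as a simple round-up, which is an assumption about the allocator rather than
its actual code. Integer.highestOneBit is a standard JDK method.

```java
// Hypothetical illustration only; none of these names come from Drill.
final class DenseBatchSketch {
    private DenseBatchSketch() {}

    // Largest power of 2 that is less than or equal to rowCount.
    static int roundDownToPowerOfTwo(int rowCount) {
        if (rowCount <= 0) {
            throw new IllegalArgumentException("rowCount must be positive");
        }
        return Integer.highestOneBit(rowCount);
    }

    // Entries an allocator that only hands out powers of 2 would reserve for
    // rowCount entries (assumed here to be a simple round-up).
    static int allocatedEntries(int rowCount) {
        if (rowCount <= 0 || rowCount > (1 << 30)) {
            throw new IllegalArgumentException("rowCount out of range");
        }
        if (rowCount == 1) {
            return 1;
        }
        return Integer.highestOneBit(rowCount - 1) << 1;
    }
}
```

For example, a batch limited to 5000 rows would reserve 8192 entries per vector
under this model and leave roughly 39% of them unused, while rounding the row
count down to 4096 fills the 4096-entry allocation exactly.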