Gopal V created HIVE-20177:
------------------------------
Summary: Vectorization: Reduce KeyWrapper allocation in GroupBy Streaming mode
Key: HIVE-20177
URL: https://issues.apache.org/jira/browse/HIVE-20177
Project: Hive
Issue Type: Bug
Components: Vectorization
Reporter: Gopal V
The streaming mode for VectorGroupBy allocates a large number of arrays due to
VectorHashKeyWrapper::duplicateTo().
Since the vectors can't be mutated in-place while a single batch is being
processed, the number of these copies can be cut by up to roughly 1000x by
copying into the streaming key once at the end of the loop, instead of
reallocating on every key change within the loop.
{code}
for (int i = 0; i < batch.size; ++i) {
  if (!batchKeys[i].equals(streamingKey)) {
    // We've encountered a new key, must save current one
    // We can't forward yet, the aggregators have not been evaluated
    rowsToFlush[flushMark] = currentStreamingAggregators;
    if (keysToFlush[flushMark] == null) {
      keysToFlush[flushMark] = (VectorHashKeyWrapper) streamingKey.copyKey();
    } else {
      streamingKey.duplicateTo(keysToFlush[flushMark]);
    }
    currentStreamingAggregators = streamAggregationBufferRowPool.getFromPool();
    batchKeys[i].duplicateTo(streamingKey);
    ++flushMark;
  }
}
{code}
The batchKeys[i].duplicateTo(streamingKey) call can be moved out of the loop,
since the only key that truly needs to be copied is the last unique key in
the VRB.
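A minimal sketch of the idea, not the actual Hive patch: Key is a hypothetical
stand-in for the key wrapper, and copyInsideLoop / copyAfterLoop are
illustrative names. It only models the copy into the streaming key (flushed
keys are recorded by value for comparison) and assumes the batch keys stay
unchanged while the batch is processed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class StreamingKeySketch {

  /** Hypothetical stand-in for a key wrapper; not Hive's VectorHashKeyWrapper. */
  static final class Key {
    long[] values;
    Key(long... values) { this.values = values; }

    /** Copies this key into target, allocating a fresh array on every call. */
    void duplicateTo(Key target) { target.values = values.clone(); }

    boolean sameAs(Key other) { return Arrays.equals(values, other.values); }
  }

  /** Flushed key values plus the number of copies made into the streaming key. */
  static final class Result {
    final List<Long> flushed = new ArrayList<>();
    int streamingKeyCopies = 0;
  }

  /** Shape of the loop quoted in the issue: one copy per key change. */
  static Result copyInsideLoop(Key streamingKey, Key[] batchKeys) {
    Result r = new Result();
    for (Key k : batchKeys) {
      if (!k.sameAs(streamingKey)) {
        r.flushed.add(streamingKey.values[0]);
        k.duplicateTo(streamingKey);
        r.streamingKeyCopies++;
      }
    }
    return r;
  }

  /** Proposed shape: track a reference inside the batch, copy once after the loop. */
  static Result copyAfterLoop(Key streamingKey, Key[] batchKeys) {
    Result r = new Result();
    Key current = streamingKey;            // reference only, no allocation
    for (Key k : batchKeys) {
      if (!k.sameAs(current)) {
        r.flushed.add(current.values[0]);
        current = k;                       // safe while the batch is not mutated
      }
    }
    if (current != streamingKey) {         // the key changed within this batch
      current.duplicateTo(streamingKey);   // single copy of the last unique key
      r.streamingKeyCopies++;
    }
    return r;
  }

  public static void main(String[] args) {
    Key[] batch = { new Key(1), new Key(1), new Key(2), new Key(3), new Key(3) };
    Result a = copyInsideLoop(new Key(1), batch);
    Result b = copyAfterLoop(new Key(1), batch);
    System.out.println(a.flushed + " copies=" + a.streamingKeyCopies);
    System.out.println(b.flushed + " copies=" + b.streamingKeyCopies);
  }
}
```

Both variants flush the same keys in the same order; the second one copies
into the streaming key at most once per batch, regardless of how many distinct
keys the batch contains.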