[
https://issues.apache.org/jira/browse/HIVE-20177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546056#comment-16546056
]
Hive QA commented on HIVE-20177:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12931841/HIVE-20177.01.patch
{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:green}SUCCESS:{color} +1 due to 14661 tests passed
Test results:
https://builds.apache.org/job/PreCommit-HIVE-Build/12649/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12649/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12649/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12931841 - PreCommit-HIVE-Build
> Vectorization: Reduce KeyWrapper allocation in GroupBy Streaming mode
> ---------------------------------------------------------------------
>
> Key: HIVE-20177
> URL: https://issues.apache.org/jira/browse/HIVE-20177
> Project: Hive
> Issue Type: Bug
> Components: Vectorization
> Reporter: Gopal V
> Assignee: Gopal V
> Priority: Major
> Attachments: HIVE-20177.01.patch, HIVE-20177.WIP.patch
>
>
> The streaming mode for VectorGroupBy allocates a large number of arrays due
> to VectorHashKeyWrapper::duplicateTo().
> Since the vectors can't be mutated in-place while a single batch is being
> processed, this operation can be cut by 1000x by allocating a streaming key
> at the end of the loop, instead of reallocating within the loop.
> {code}
> for (int i = 0; i < batch.size; ++i) {
>   if (!batchKeys[i].equals(streamingKey)) {
>     // We've encountered a new key, must save current one
>     // We can't forward yet, the aggregators have not been evaluated
>     rowsToFlush[flushMark] = currentStreamingAggregators;
>     if (keysToFlush[flushMark] == null) {
>       keysToFlush[flushMark] = (VectorHashKeyWrapper) streamingKey.copyKey();
>     } else {
>       streamingKey.duplicateTo(keysToFlush[flushMark]);
>     }
>     currentStreamingAggregators = streamAggregationBufferRowPool.getFromPool();
>     batchKeys[i].duplicateTo(streamingKey);
>     ++flushMark;
>   }
> {code}
> The duplicateTo() can be pushed out of the loop, since the only key that truly
> needs to be kept as a copy is the last unique key in the VRB (see the sketch
> below).
> The actual byte[] values within the keys are safely copied out by
> VectorHashKeyWrapperBatch.assignRowColumn(), which calls setVal() and not
> setRef().
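> A rough sketch of the intended restructuring (illustrative only, inferred from
> the snippet and prose above, not lifted from HIVE-20177.01.patch; it reuses the
> same names -- batchKeys, keysToFlush, rowsToFlush, flushMark,
> currentStreamingAggregators, streamingKey): inside the loop the streaming key
> merely references the batch key, and a single owned copy is made after the loop
> for the last unique key of the batch.
> {code}
> // Sketch: defer the expensive key copy to once per batch.
> for (int i = 0; i < batch.size; ++i) {
>   if (!batchKeys[i].equals(streamingKey)) {
>     // Save the previous streaming key/aggregators for flushing, as before.
>     rowsToFlush[flushMark] = currentStreamingAggregators;
>     if (keysToFlush[flushMark] == null) {
>       keysToFlush[flushMark] = (VectorHashKeyWrapper) streamingKey.copyKey();
>     } else {
>       streamingKey.duplicateTo(keysToFlush[flushMark]);
>     }
>     currentStreamingAggregators = streamAggregationBufferRowPool.getFromPool();
>     // No per-key allocation: the batch's key wrappers are not mutated while
>     // this batch is processed, so a plain reference is safe here.
>     streamingKey = batchKeys[i];
>     ++flushMark;
>   }
>   // ... aggregate row i into currentStreamingAggregators ...
> }
> // One copy per batch: materialize the last unique key so it remains valid
> // after the batch's key wrappers are reused for the next batch.
> streamingKey = (VectorHashKeyWrapper) streamingKey.copyKey();
> {code}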
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)