[ 
https://issues.apache.org/jira/browse/IMPALA-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar resolved IMPALA-6461.
-------------------------------------
    Resolution: Fixed

IMPALA-6461 : Micro-optimizations to DataStreamSender::Send
While analyzing performance of partition exchange operator, I noticed that 
there is dependency and a function call per row in the hot path. Optimizations 
in this change are: 1) Remove the data dependency between computing the hash 
and the channel 2) Inline DataStreamSender::Channel::AddRow 3) Save 
partition_exprs_.size() to save a couple of instructions This translates to 
improving CPI for DataStreamSender::Send by 10% Change-Id: 
I642a9dad531a29d4838a3537ab0e04320a69960d Reviewed-on: 
[http://gerrit.cloudera.org:8080/9221]Reviewed-by: Mostafa Mokhtar 
<mmokh...@cloudera.com> Tested-by: Impala Public Jenkins

> *DataStreamSender::Channel::AddRow needs some micro-optimizations to remove 
> per row function call and data dependency 
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-6461
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6461
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Distributed Exec
>            Reporter: Mostafa Mokhtar
>            Assignee: Mostafa Mokhtar
>            Priority: Minor
>
> While analyzing performance of partition exchange operator I noticed that 
> there is dependency and a function call per row in the hot path.
> {code}
> // hash-partition batch's rows across channels
>  // TODO: encapsulate this in an Expr as we've done for Kudu above and remove 
> this case
>  // once we have codegen here.
>  int num_channels = channels_.size();
>  for (int i = 0; i < batch->num_rows(); ++i) {
>  TupleRow* row = batch->GetRow(i);
>  uint64_t hash_val = EXCHANGE_HASH_SEED;
>  for (int j = 0; j < partition_exprs_.size(); ++j) {
>  ScalarExprEvaluator* eval = partition_expr_evals_[j];
>  void* partition_val = eval->GetValue(row);
>  // We can't use the crc hash function here because it does not result in
>  // uncorrelated hashes with different seeds. Instead we use FastHash.
>  // TODO: fix crc hash/GetHashValue()
>  DCHECK(&(eval->root()) == partition_exprs_[j]);
>  hash_val = RawValue::GetHashValueFastHash(
>  partition_val, partition_exprs_[j]->type(), hash_val);
>  }
>  RETURN_IF_ERROR(channels_[hash_val % num_channels]->AddRow(row));
>  }
> {code}
> Force inlining DataStreamSender::Channel::AddRow and breaking up the loop 
> improves partition exchange performance by 5%
> Code-generation of the hash computation IMPALA-5168 should give another 10% 
> speedup. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to