[ https://issues.apache.org/jira/browse/IMPALA-13209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866607#comment-17866607 ]
ASF subversion and git services commented on IMPALA-13209:
----------------------------------------------------------
Commit a486305a922d672f77ff23b5f42e604a720597fd in impala's branch
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a486305a9 ]
IMPALA-13209: Optimize ConvertRowBatchTime in ExchangeNode
The patch optimizes the most common case when the src and dst
RowBatches have the same number of tuples per row.
ConvertRowBatchTime decreased from >600ms to <100ms in a query
with a busy exchange node:
set mt_dop=8;
select straight_join count(*) from tpcds_parquet.store_sales s1
join /*+broadcast*/ tpcds_parquet.store_sales16 s2
on s1.ss_customer_sk = s2.ss_customer_sk;
TPCDS-20 showed a minor improvement (0.77%). The effect is likely to
be larger if more nodes are involved.
Testing:
- passed core tests
Change-Id: Iab94315364e8886da1ae01cf6af623812a2da9cb
Reviewed-on: http://gerrit.cloudera.org:8080/21571
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
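
The fast path described in the commit message could look roughly like the
sketch below. This is not the actual patch: SimpleRowBatch, Tuple, and
CopyTuplePointers are hypothetical stand-ins for illustration, not Impala's
real RowBatch API. The sketch assumes tuple pointers are stored contiguously
in row-major order, so that when src and dst have the same number of tuples
per row the whole copy collapses into one bulk copy instead of a per-row,
per-tuple loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; not Impala's real classes.
struct Tuple { int value; };

struct SimpleRowBatch {
  std::size_t tuples_per_row;
  // Row-major array of tuple pointers: num_rows * tuples_per_row entries.
  std::vector<Tuple*> tuple_ptrs;

  SimpleRowBatch(std::size_t tuples_per_row_, std::size_t num_rows)
      : tuples_per_row(tuples_per_row_),
        tuple_ptrs(tuples_per_row_ * num_rows, nullptr) {
    assert(tuples_per_row_ > 0);
  }

  std::size_t NumRows() const { return tuple_ptrs.size() / tuples_per_row; }
  Tuple** Row(std::size_t i) { return tuple_ptrs.data() + i * tuples_per_row; }
};

// Copies tuple pointers from src into dst (shallow copy, no tuple data moved).
void CopyTuplePointers(SimpleRowBatch& src, SimpleRowBatch& dst) {
  const std::size_t num_rows = std::min(src.NumRows(), dst.NumRows());

  if (src.tuples_per_row == dst.tuples_per_row) {
    // Fast path: identical row layout, so the pointer arrays line up and
    // one contiguous copy suffices. For pointer elements, compilers
    // typically turn std::copy_n into a single memmove.
    std::copy_n(src.tuple_ptrs.data(), num_rows * src.tuples_per_row,
                dst.tuple_ptrs.data());
    return;
  }

  // Generic path: row layouts differ, so copy row by row and only the
  // tuples both layouts have in common. Extra dst slots stay untouched.
  const std::size_t common = std::min(src.tuples_per_row, dst.tuples_per_row);
  for (std::size_t i = 0; i < num_rows; ++i) {
    Tuple** src_row = src.Row(i);
    Tuple** dst_row = dst.Row(i);
    for (std::size_t j = 0; j < common; ++j) dst_row[j] = src_row[j];
  }
}
```

The branch is taken once per batch rather than once per row, which is where
a saving of this kind would come from under the stated layout assumption.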
> ExchangeNode's ConvertRowBatchTime can be high
> ----------------------------------------------
>
> Key: IMPALA-13209
> URL: https://issues.apache.org/jira/browse/IMPALA-13209
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Csaba Ringhofer
> Assignee: Csaba Ringhofer
> Priority: Major
> Labels: performance
>
> ConvertRowBatchTime can be surprisingly high - the only thing done during
> this timer is copying tuple pointers from one RowBatch to another.
> https://github.com/apache/impala/blob/c53987480726b114e0c3537c71297df2834a4962/be/src/exec/exchange-node.cc#L217
> {code}
> set mt_dop=8;
> select straight_join count(*) from tpcds_parquet.store_sales s1 join
> /*+broadcast*/ tpcds_parquet.store_sales16 s2 on s1.ss_customer_sk =
> s2.ss_customer_sk;
> ConvertRowBatchTime dominates the busy exchange node's exec time in the
> profile:
> - ConvertRowBatchTime: 640.072ms
> - InactiveTotalTime: 243.783ms
> - PeakMemoryUsage: 12.53 MB (13142368)
> - RowsReturned: 46.09M (46086464)
> - RowsReturnedRate: 46.93 M/sec
> - TotalTime: 981.968ms
> {code}