[ https://issues.apache.org/jira/browse/IMPALA-13209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866607#comment-17866607 ]

ASF subversion and git services commented on IMPALA-13209:
----------------------------------------------------------

Commit a486305a922d672f77ff23b5f42e604a720597fd in impala's branch 
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a486305a9 ]

IMPALA-13209: Optimize ConvertRowBatchTime in ExchangeNode

The patch optimizes the most common case when the src and dst
RowBatches have the same number of tuples per row.
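The commit message does not include the code. Below is a rough, hypothetical model of why a matching tuples-per-row layout enables a cheaper copy. It assumes each batch stores its rows back to back as fixed-width arrays of tuple pointers. Under that assumption, when source and destination have the same width, a run of consecutive rows occupies one contiguous range in both batches. That range can then be moved with a single memcpy instead of one copy per row.

`SimpleRowBatch`, `CopyRowsGeneric`, `CopyRowsBulk` and `CopyRows` are illustrative names invented here. They are not Impala's RowBatch API, and this is not the actual patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical, simplified model of a row batch (NOT Impala's RowBatch).
// Rows are stored back to back as fixed-width arrays of tuple pointers.
struct Tuple {};  // payload is irrelevant: only pointers are copied

struct SimpleRowBatch {
  int tuples_per_row;
  int num_rows = 0;
  std::vector<Tuple*> ptrs;  // capacity * tuples_per_row pointer slots

  SimpleRowBatch(int tpr, int capacity)
      : tuples_per_row(tpr),
        ptrs(static_cast<std::size_t>(tpr) * capacity, nullptr) {}

  int capacity() const { return static_cast<int>(ptrs.size()) / tuples_per_row; }
  Tuple** row(int i) { return ptrs.data() + static_cast<std::size_t>(i) * tuples_per_row; }
  Tuple* const* row(int i) const {
    return ptrs.data() + static_cast<std::size_t>(i) * tuples_per_row;
  }
};

// Number of rows that fit: bounded by remaining src rows and free dst slots.
static int RowsToCopy(const SimpleRowBatch& src, int start, const SimpleRowBatch& dst) {
  return std::max(0, std::min(src.num_rows - start, dst.capacity() - dst.num_rows));
}

// Generic path: one small copy per row. Assumes (hypothetically) that dst rows
// are at least as wide as src rows; extra dst slots are left untouched.
int CopyRowsGeneric(const SimpleRowBatch& src, int start, SimpleRowBatch& dst) {
  assert(dst.tuples_per_row >= src.tuples_per_row);
  int n = RowsToCopy(src, start, dst);
  for (int i = 0; i < n; ++i) {
    std::memcpy(dst.row(dst.num_rows + i), src.row(start + i),
                sizeof(Tuple*) * src.tuples_per_row);
  }
  dst.num_rows += n;
  return n;
}

// Fast path: identical row width means the n rows are one contiguous block
// of n * tuples_per_row pointers in both batches, so a single memcpy suffices.
int CopyRowsBulk(const SimpleRowBatch& src, int start, SimpleRowBatch& dst) {
  assert(src.tuples_per_row == dst.tuples_per_row);
  int n = RowsToCopy(src, start, dst);
  if (n > 0) {
    std::memcpy(dst.row(dst.num_rows), src.row(start),
                sizeof(Tuple*) * static_cast<std::size_t>(n) * src.tuples_per_row);
  }
  dst.num_rows += n;
  return n;
}

// Dispatch: take the bulk path in the common same-width case.
int CopyRows(const SimpleRowBatch& src, int start, SimpleRowBatch& dst) {
  return src.tuples_per_row == dst.tuples_per_row ? CopyRowsBulk(src, start, dst)
                                                  : CopyRowsGeneric(src, start, dst);
}
```

In this model, both paths produce the same destination contents when the widths match. The bulk path simply replaces a loop of small copies with one larger one. Whether the real speedup comes from this or from other per-row overhead (for example, per-row bookkeeping calls) is not stated in the commit message.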

ConvertRowBatchTime decreased from >600ms to <100ms in a query
with a busy exchange node:
set mt_dop=8;
select straight_join count(*) from tpcds_parquet.store_sales s1
  join /*+broadcast*/ tpcds_parquet.store_sales16 s2
  on s1.ss_customer_sk = s2.ss_customer_sk;

TPCDS-20 showed a minor improvement (0.77%). The effect is likely to
be larger if more nodes are involved.

Testing:
- passed core tests

Change-Id: Iab94315364e8886da1ae01cf6af623812a2da9cb
Reviewed-on: http://gerrit.cloudera.org:8080/21571
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> ExchangeNode's ConvertRowBatchTime can be high
> ----------------------------------------------
>
>                 Key: IMPALA-13209
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13209
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Csaba Ringhofer
>            Assignee: Csaba Ringhofer
>            Priority: Major
>              Labels: performance
>
> ConvertRowBatchTime can be surprisingly high: the only work done while
> this timer runs is copying tuple pointers from one RowBatch to another.
> https://github.com/apache/impala/blob/c53987480726b114e0c3537c71297df2834a4962/be/src/exec/exchange-node.cc#L217
> {code}
> set mt_dop=8;
> select straight_join count(*) from tpcds_parquet.store_sales s1 join 
> /*+broadcast*/ tpcds_parquet.store_sales16 s2 on s1.ss_customer_sk = 
> s2.ss_customer_sk;
> ConvertRowBatchTime dominates the busy exchange node's exec time in the 
> profile:
>            - ConvertRowBatchTime: 640.072ms
>            - InactiveTotalTime: 243.783ms
>            - PeakMemoryUsage: 12.53 MB (13142368)
>            - RowsReturned: 46.09M (46086464)
>            - RowsReturnedRate: 46.93 M/sec
>            - TotalTime: 981.968ms
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)