[
https://issues.apache.org/jira/browse/IMPALA-12027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18088201#comment-18088201
]
ASF subversion and git services commented on IMPALA-12027:
----------------------------------------------------------
Commit 670872bc8b868ef6a6f08d524603f99c16cfb061 in impala's branch
refs/heads/master from Surya Hebbar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=670872bc8 ]
IMPALA-12027: Support additional details for DataSink nodes in ExecSummary
In the 'Exec Summary' table, we show table names, joins and other details
for different types of `PlanNode`s with the help of the `label_detail` field
in the Thrift representation (i.e., `TPlanNode`).
This field was not available for any type of `DataSink` node.
With this change, we support displaying table names and other details
for table sink nodes and other such `DataSink` nodes by introducing
the `label_detail` field into `TDataSink`.
This information is displayed in the last column of the ExecSummary,
similar to how we show the table names for scan nodes.
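The following is a minimal Java sketch of the mechanism, built on a much-simplified class hierarchy. Only `getDisplayLabelDetail()` comes from the issue text. `SummaryItem`, `SummaryEntry`, `getDisplayLabel()` and `toSummaryEntry()` are hypothetical names. A single shared base class stands in for the separate `PlanNode` and `DataSink` hierarchies. Any operator that does not override the detail falls back to an empty string, and an empty detail leaves the Detail column blank.

```java
import java.util.List;

// Hypothetical, simplified stand-ins for Impala's frontend classes. The real
// PlanNode/DataSink hierarchies and the Thrift-generated TPlanNode/TDataSink
// structs are far richer; only the label/label_detail pair is modeled here.
public class LabelDetailSketch {

  /** Stand-in for the label and label_detail fields carried in Thrift. */
  record SummaryEntry(String label, String labelDetail) {}

  /** Shared base: no extra detail unless a subclass overrides it. */
  abstract static class SummaryItem {
    abstract String getDisplayLabel();

    String getDisplayLabelDetail() {
      return ""; // empty detail -> blank Detail column
    }

    SummaryEntry toSummaryEntry() {
      return new SummaryEntry(getDisplayLabel(), getDisplayLabelDetail());
    }
  }

  static class AggregationNode extends SummaryItem {
    @Override String getDisplayLabel() { return "AGGREGATE"; }
  }

  /** Scan nodes already report the scanned table as their detail. */
  static class HdfsScanNode extends SummaryItem {
    private final String db;
    private final String table;

    HdfsScanNode(String db, String table) { this.db = db; this.table = table; }

    @Override String getDisplayLabel() { return "SCAN HDFS"; }
    @Override String getDisplayLabelDetail() { return db + "." + table; }
  }

  /** The idea of the change: a table sink reports its target table the same way. */
  static class HdfsTableSink extends SummaryItem {
    private final String db;
    private final String table;

    HdfsTableSink(String db, String table) { this.db = db; this.table = table; }

    @Override String getDisplayLabel() { return "HDFS WRITER"; }
    @Override String getDisplayLabelDetail() { return db + "." + table; }
  }

  public static void main(String[] args) {
    List<SummaryItem> items = List.of(
        new HdfsTableSink("tpcds", "write_table"),
        new AggregationNode(),
        new HdfsScanNode("tpcds", "scan_table"));
    for (SummaryItem item : items) {
      SummaryEntry e = item.toSummaryEntry();
      System.out.println(e.label() + " | " + e.labelDetail());
    }
  }
}
```

Running `main` prints one `label | detail` line per item. The aggregation line has an empty detail because `AggregationNode` keeps the default.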
Operator        #Hosts  #Inst  Avg Time   Max Time   ...  ...  Detail
---------------------------------------------------------------------
F00:HDFS WRITE  1       1      13.122us   13.122us   ...  ...  tpcds.write_table
83:AGGREGATE    1       1      164.615us  164.615us  ...  ...
32:SCAN HDFS    1       1      45.919us   45.919us   ...  ...  tpcds.scan_table
...
...
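The following Java sketch roughly illustrates how the per-operator detail becomes the last column. It renders only the Operator and Detail columns. `DetailColumnSketch`, `Row` and `render()` are hypothetical. The real ExecSummary formatter also prints host counts, timings, row counts and memory. The ids used in the usage example (`F00` for the sink) are copied from the table above.

```java
import java.util.List;

// Hypothetical sketch: render summary rows as fixed-width text whose last
// column is the label detail. The real ExecSummary has many more columns.
public class DetailColumnSketch {

  /** One operator row: its id prefix, display label and (possibly empty) detail. */
  record Row(String id, String label, String labelDetail) {}

  static String render(List<Row> rows) {
    StringBuilder sb = new StringBuilder();
    sb.append(String.format("%-20s %s", "Operator", "Detail")).append('\n');
    for (Row r : rows) {
      String op = r.id() + ":" + r.label();
      // Pad the operator column; an empty detail leaves only trailing spaces,
      // which are stripped so the Detail column simply appears blank.
      sb.append(String.format("%-20s %s", op, r.labelDetail()).stripTrailing())
          .append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.print(render(List.of(
        new Row("F00", "HDFS WRITE", "tpcds.write_table"),
        new Row("83", "AGGREGATE", ""),
        new Row("32", "SCAN HDFS", "tpcds.scan_table"))));
  }
}
```

Keeping the detail as the final, free-width column means long values such as fully qualified table names never push the fixed-width numeric columns out of alignment.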
The same approach can be extended to display additional details for other
types of data sink nodes, such as JoinBuildSink nodes.
Testing:
- Added new tests in tests/query_test/test_observability.py for
table details in ExecSummary
Change-Id: I2652dd896f72c5c6bbe7e76facdede2a237808d5
Reviewed-on: http://gerrit.cloudera.org:8080/23889
Reviewed-by: Impala Public Jenkins
Reviewed-by: Csaba Ringhofer
Tested-by: Csaba Ringhofer
> Display table names of TableSink nodes in ExecSummary
> -----------------------------------------------------
>
> Key: IMPALA-12027
> URL: https://issues.apache.org/jira/browse/IMPALA-12027
> Project: IMPALA
> Issue Type: New Feature
> Components: Frontend
> Reporter: Quanlong Huang
> Assignee: Surya Hebbar
> Priority: Major
> Labels: ramp-up
>
> In the last column of the ExecSummary, we show the table names for ScanNodes,
> etc.:
> {noformat}
> Operator             #Hosts  #Inst  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem   Est. Peak Mem  Detail
> ------------------------------------------------------------------------------------------------------------------------------------
> F01:HDFS WRITER      6       6      28m2s     29m23s                       297.39 MB  1.00 GB
> 02:SORT              6       6      35m       39m7s     3.93B  688.79M     79.29 GB   14.97 GB
> 01:EXCHANGE          6       6      23s918ms  26s258ms  4.30B  688.79M     14.56 MB   10.84 MB       HASH(cs_sold_date_sk)
> F00:EXCHANGE SENDER  6       6      7m42s     8m                           173.25 KB  0
> 00:SCAN HDFS         6       6      9s452ms   14s084ms  4.30B  688.79M     991.19 MB  3.44 GB        tpcds_3000_text.catalog_sales{noformat}
> The above example comes from an INSERT query. It'd be useful to also show the
> table names for TableSink operators. We just need overrides on
> PlanNode#getDisplayLabelDetail():
> [https://github.com/apache/impala/blob/35fe1f37f5656e615466504129c7550089f1773d/fe/src/main/java/org/apache/impala/planner/PlanNode.java#L292]
> [https://github.com/apache/impala/blob/35fe1f37f5656e615466504129c7550089f1773d/fe/src/main/java/org/apache/impala/planner/ScanNode.java#L305]
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)