[ https://issues.apache.org/jira/browse/IMPALA-13945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18009315#comment-18009315 ]
ASF subversion and git services commented on IMPALA-13945:
----------------------------------------------------------
Commit 535b72e674cfc00b358682f8dea4989a1d290ca8 in impala's branch
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=535b72e67 ]
IMPALA-13945: Change hash trace to show each node's individual contribution
Currently, the hash trace accumulates up the plan tree and is
displayed only for tuple cache nodes. This means that tuple cache
nodes high in a large plan can have hundreds of lines of hash trace
output without an indication of which contributions came from
which nodes.
This changes the hash trace in two ways:
1. It displays each plan node's individual contribution to the hash
   trace. This only contains a summary of the hash contributed by
   the child, so the hash trace does not accumulate up the plan tree.
   Since each node is displaying its own contribution, the tuple
   cache node does not display the hash trace itself.
2. This adds structure to the hash trace to include a comment for
   each contribution to the hash trace. This allows a cleaner display
   of the individual pieces of a node's hash trace. It also gives
   extra information about the specific contributions into the hash.
   It should be possible to trace the contribution through the plan
   tree.
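
The two changes above can be illustrated with a minimal sketch. This is not
Impala's frontend code (which is Java); the class names, the use of SHA-256,
and the table and expression strings are all assumptions made for
illustration. Each node records a commented entry for each of its own
contributions and only a one-line hash summary for each child, so the trace
does not grow as it moves up the plan tree.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class TraceElement:
    comment: str  # label describing what was hashed (hypothetical structure)
    text: str     # the text fed into the hash


@dataclass
class Node:
    name: str
    own: list                                   # (comment, text) pairs for this node only
    children: list = field(default_factory=list)
    trace: list = field(default_factory=list)   # this node's individual hash trace
    digest: str = ""

    def _add(self, hasher, comment, text):
        # Feed the text into the hash and record it alongside its comment.
        hasher.update(text.encode("utf-8"))
        self.trace.append(TraceElement(comment, text))

    def compute(self):
        hasher = hashlib.sha256()
        self.trace = []
        for child in self.children:
            # Only the child's final hash is recorded, not the child's full trace.
            self._add(hasher, f"Child {child.name} hash", child.compute())
        for comment, text in self.own:
            self._add(hasher, comment, text)
        self.digest = hasher.hexdigest()
        return self.digest

    def format_trace(self):
        lines = [f"{self.name} hash trace:"]
        lines += [f"  {e.comment}: {e.text}" for e in self.trace]
        return "\n".join(lines)


# Hypothetical two-node plan: an aggregation on top of a scan.
scan = Node("scan", [("Table", "functional.alltypes")])
agg = Node("agg", [("Grouping exprs", "int_col")], children=[scan])
agg.compute()
print(scan.format_trace())
print(agg.format_trace())
```

In this sketch the parent's trace always has one entry per child plus one per
own contribution, and the child hash shown in the parent's trace can be
matched against the child's own hash to follow a contribution through the tree.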
This also changes the output to only display the hash trace with
explain_level=EXTENDED or higher (i.e. it won't be displayed with
STANDARD).
Example output:
tuple cache hash trace:
  TupleDescriptor 0: TTupleDescriptor(id:0, byteSize:0, numNullBytes:0, tableId:1, tuplePath:[])
  Table: TTableName(db_name:functional, table_name:alltypes)
  PlanNode:
    [TPlanNode(node_id:0, node_type:HDFS_SCAN_NODE, num_children:0, limit:-1, row_tuples:[0], nullable_tu]
    [ples:[false], disable_codegen:false, pipelines:[], hdfs_scan_node:THdfsScanNode(tuple_id:0, random_r]
    [eplica:false, use_mt_scan_node:false, is_partition_key_scan:false, file_formats:[]), resource_profil]
    [e:TBackendResourceProfile(min_reservation:0, max_reservation:0))]
  Query options hash: TQueryOptionsHash(hi:-2415313890045961504, lo:-1462668909363814466)
Testing:
- Modified TupleCacheInfoTest and TupleCacheTest to use the new hash trace
Change-Id: If53eda24e7eba264bc2d2f212b63eab9dc97a74c
Reviewed-on: http://gerrit.cloudera.org:8080/23017
Reviewed-by: Yida Wu
Reviewed-by: Michael Smith
Tested-by: Impala Public Jenkins
> Change the tuple cache key trace to display an individual node's contribution
> -----------------------------------------------------------------------------
>
> Key: IMPALA-13945
> URL: https://issues.apache.org/jira/browse/IMPALA-13945
> Project: IMPALA
> Issue Type: Task
> Components: Frontend
> Affects Versions: Impala 5.0.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Major
>
> As tuple caching is supported higher in the plan tree, the cache key trace
> can become enormous, as it includes all of its children's information plus
> its own contribution. An individual trace can take up over a hundred lines in
> the profile. It can be hard to determine which contribution comes from which
> child.
> We should break up the hash trace and display each node's individual
> contribution to the hash trace. This should be smaller in the profile and
> more clearly display what is new at each level.