[ 
https://issues.apache.org/jira/browse/IMPALA-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404982#comment-17404982
 ] 

Riza Suminto commented on IMPALA-10883:
---------------------------------------

The coordinator and the reporting backend seem to have different expectations. 
Let's say MT_DOP=12 and a backend sends runtime profile updates several 
times.

The coordinator updates its backend state profile through the call chain 
BackendState::ApplyExecStatusReport() -> 
AggregatedRuntimeProfile::UpdateAggregatedFromInstances().
UpdateAggregatedFromInstances() expects the incoming TRuntimeProfileTree to 
contain all 12 runtime profiles from the 12 instances that run on the reporting 
backend.
[https://github.com/apache/impala/blob/237ed5e/be/src/util/runtime-profile.cc#L542-L544]

However, on the sender side, the backend can send a partial update. This 
happens when some of the fragment instances finished earlier and have already 
sent their final status update.
The agg_profile is initialized with 12 runtime profile slots, but the slots 
that belong to the finished fragment instances are not populated.
[https://github.com/apache/impala/blob/237ed5e/be/src/runtime/query-state.cc#L558-L575]
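
The mismatch can be illustrated with a small toy model. This is only a sketch 
and not Impala code: Slots, BuildReport, ApplyAssumingComplete, 
ApplyMergingPartial and CountMissing are hypothetical names, and a slot is 
reduced to an instance name where an empty string means "not populated". The 
overwrite behaviour of the receiver is an assumption made for illustration, and 
the merging variant is one possible way to tolerate partial reports, not 
necessarily the fix that should be applied.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: one slot per fragment instance on a backend.
// An empty string stands for a slot that was not populated.
using Slots = std::vector<std::string>;

// Sender side (toy): allocate one slot per instance, but populate only the
// instances that have not already sent their final status update.
Slots BuildReport(const Slots& instance_names,
                  const std::vector<bool>& sent_final) {
  Slots report(instance_names.size());
  for (std::size_t i = 0; i < instance_names.size(); ++i) {
    if (!sent_final[i]) report[i] = instance_names[i];
  }
  return report;
}

// Receiver side (assumption): treat every report as complete and overwrite
// all slots, so unpopulated slots erase what was received earlier.
void ApplyAssumingComplete(const Slots& report, Slots* agg) {
  *agg = report;
}

// Receiver side (one possible alternative): keep the previous value for any
// slot that the incoming report leaves unpopulated.
void ApplyMergingPartial(const Slots& report, Slots* agg) {
  if (agg->size() < report.size()) agg->resize(report.size());
  for (std::size_t i = 0; i < report.size(); ++i) {
    if (!report[i].empty()) (*agg)[i] = report[i];
  }
}

// Number of slots that are still unpopulated in the aggregated state.
std::size_t CountMissing(const Slots& agg) {
  std::size_t missing = 0;
  for (const std::string& name : agg) {
    if (name.empty()) ++missing;
  }
  return missing;
}
```

In this model, a full report followed by a partial one leaves a hole under 
ApplyAssumingComplete() but not under ApplyMergingPartial(), which matches the 
shape of the symptom in the issue description (blank entries in the Instances 
list).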

> Dense runtime profile missing some fragment instance profile
> ------------------------------------------------------------
>
>                 Key: IMPALA-10883
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10883
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 4.0.0
>            Reporter: Riza Suminto
>            Assignee: Riza Suminto
>            Priority: Major
>
> I ran TPC-DS Q78 in the following setup:
>  * Cluster of 20 nodes
>  * MT_DOP=12
>  * --gen_experimental_profile=true
> When I checked the query profile of the completed query, I noticed that a 
> couple of fragments are missing profiles from some instances. The missing 
> profiles show up as missing instance IDs, like this:
> {code:java}
> Fragment F00 [228 instances]:
>  Instances: Instance 204c719f48777536:b83eb0ed00000001 
> (host=ia0306.halxg.cloudera.com:27000), Instance 
> 204c719f48777536:b83eb0ed00000002 (host=ia0306.halxg.cloudera.com:27000), , 
> Instance 204c719f48777536:b83eb0ed00000004 
> (host=ia0306.halxg.cloudera.com:27000), Instance 
> 204c719f48777536:b83eb0ed00000005 (host=ia0306.halxg.cloudera.com:27000), 
> Instance 204c719f48777536:b83eb0ed00000006 
> (host=ia0306.halxg.cloudera.com:27000), , , Instance 
> 204c719f48777536:b83eb0ed00000009 (host=ia0306.halxg.cloudera.com:27000), 
> Instance 204c719f48777536:b83eb0ed0000000a 
> (host=ia0306.halxg.cloudera.com:27000), Instance 
> 204c719f48777536:b83eb0ed0000000b (host=ia0306.halxg.cloudera.com:27000), 
> Instance 204c719f48777536:b83eb0ed0000000c 
> (host=ia0306.halxg.cloudera.com:27000), Instance 
> 204c719f48777536:b83eb0ed0000000d (host=ia0318.halxg.cloudera.com:27000), 
> Instance 204c719f48777536:b83eb0ed0000000e 
> (host=ia0318.halxg.cloudera.com:27000), Instance 
> 204c719f48777536:b83eb0ed0000000f (host=ia0318.halxg.cloudera.com:27000), 
> Instance 204c719f48777536:b83eb0ed00000010 
> (host=ia0318.halxg.cloudera.com:27000), Instance 
> 204c719f48777536:b83eb0ed00000011 (host=ia0318.halxg.cloudera.com:27000), 
> Instance 204c719f48777536:b83eb0ed00000012 
> (host=ia0318.halxg.cloudera.com:27000), Instance 
> 204c719f48777536:b83eb0ed00000013 (host=ia0318.halxg.cloudera.com:27000), 
> Instance 204c719f48777536:b83eb0ed00000014 
> (host=ia0318.halxg.cloudera.com:27000), Instance 
> 204c719f48777536:b83eb0ed00000015 (host=ia0318.halxg.cloudera.com:27000), 
> Instance 204c719f48777536:b83eb0ed00000016 
> (host=ia0318.halxg.cloudera.com:27000), Instance 
> 204c719f48777536:b83eb0ed00000017 (host=ia0318.halxg.cloudera.com:27000), 
> Instance 204c719f48777536:b83eb0ed00000018 
> (host=ia0318.halxg.cloudera.com:27000), Instance 
> 204c719f48777536:b83eb0ed00000019 (host=ia0322.halxg.cloudera.com:27000), , , 
> , Instance 204c719f48777536:b83eb0ed0000001d 
> (host=ia0322.halxg.cloudera.com:27000), , , , Instance 
> 204c719f48777536:b83eb0ed00000021 (host=ia0322.halxg.cloudera.com:27000), , , 
> , Instance 204c719f48777536:b83eb0ed00000025 
> (host=ia0317.halxg.cloudera.com:27000), , , , ...{code}
> Note the several missing instances from hosts ia0306, ia0322, ia0317, and so on.
> I tried lowering FLAGS_status_report_interval_ms so that reports are sent 
> every 500ms, and even more instance profiles went missing. On the other hand, 
> setting FLAGS_status_report_interval_ms=0 (only send the final report) results 
> in a complete runtime profile.


