[
https://issues.apache.org/jira/browse/IMPALA-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783929#comment-16783929
]
ASF subversion and git services commented on IMPALA-8274:
---------------------------------------------------------
Commit 110b362a52eda053caa0d177016e22a23a1a9612 in impala's branch
refs/heads/master from Michael Ho
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=110b362 ]
IMPALA-8274: Fix iteration of profiles in ApplyExecStatusReport()
The coordinator skips over any stale or duplicated status
reports of fragment instances. In the previous implementation,
the index pointing into the vector of Thrift profiles wasn't
updated when skipping over a status report. This breaks the
assumption that the status reports and thrift profiles vectors
have one-to-one correspondence. Consequently, we may end up
passing the wrong profile to InstanceStats::Update(), leading
to random crashes.
This change fixes the problem above by using iterators to
iterate through the status reports and thrift profiles vectors
and ensures that both iterators are updated on every iteration
of the loop.
Change-Id: I8bce426c7d08ffbf0f8cd26889262243a52cc752
Reviewed-on: http://gerrit.cloudera.org:8080/12651
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Missing update to index into profiles vector in
> Coordinator::BackendState::ApplyExecStatusReport()
> --------------------------------------------------------------------------------------------------
>
> Key: IMPALA-8274
> URL: https://issues.apache.org/jira/browse/IMPALA-8274
> Project: IMPALA
> Issue Type: Bug
> Components: Distributed Exec
> Reporter: Michael Ho
> Assignee: Michael Ho
> Priority: Blocker
> Labels: crash
>
> {{idx}} isn't updated in case we skip a duplicated or stale duplicated update
> of a fragment instance. As a result, we may end up passing the wrong profile
> to {{instance_stats->Update()}}. This may lead to random crashes in
> {{Coordinator::BackendState::InstanceStats::Update}}.
> {noformat}
> int idx = 0;
> const bool has_profile = thrift_profiles.profile_trees.size() > 0;
> TRuntimeProfileTree empty_profile;
> for (const FragmentInstanceExecStatusPB& instance_exec_status :
> backend_exec_status.instance_exec_status()) {
> int64_t report_seq_no = instance_exec_status.report_seq_no();
> int instance_idx =
> GetInstanceIdx(instance_exec_status.fragment_instance_id());
> DCHECK_EQ(instance_stats_map_.count(instance_idx), 1);
> InstanceStats* instance_stats = instance_stats_map_[instance_idx];
> int64_t last_report_seq_no = instance_stats->last_report_seq_no_;
> DCHECK(instance_stats->exec_params_.instance_id ==
> ProtoToQueryId(instance_exec_status.fragment_instance_id()));
> // Ignore duplicate or out-of-order messages.
> if (report_seq_no <= last_report_seq_no) {
> VLOG_QUERY << Substitute("Ignoring stale update for query instance $0
> with "
> "seq no $1", PrintId(instance_stats->exec_params_.instance_id),
> report_seq_no);
> continue; <<--- // XXX bad
> }
> DCHECK(!instance_stats->done_);
> DCHECK(!has_profile || idx < thrift_profiles.profile_trees.size());
> const TRuntimeProfileTree& profile =
> has_profile ? thrift_profiles.profile_trees[idx++] : empty_profile;
> instance_stats->Update(instance_exec_status, profile, exec_summary,
> scan_range_progress);
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]