[
https://issues.apache.org/jira/browse/IMPALA-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500607#comment-16500607
]
Dan Hecht commented on IMPALA-7101:
-----------------------------------
Below are the relevant logs. I think what's happening is that the
{{Coordinator::Cancel()}} call happened after the coordinator entered the
{{RETURNED_ALL_RESULTS}} state, while the coordinator was waiting on the
barrier for the backends to complete. Then {{CloseOperation()}} was called,
which unregistered the query before the {{ReportExecStatus()}} RPC arrived.
ImpalaServer still responds to that RPC, so the backends will complete, but
the barrier is never signaled because {{ImpalaServer::ReportExecStatus()}}
can't find the coordinator once the query has been unregistered.
{code}
I0601 20:22:14.482776 1128 coordinator.cc:677] Backend completed:
host=dhecht-desktop.pa.cloudera.com:22002 remaining=3
query_id=ce42b072b8ff3a0f:282d654c00000000
I0601 20:22:14.482795 1128 coordinator-backend-state.cc:228]
query_id=ce42b072b8ff3a0f:282d654c00000000: first in-progress backend:
dhecht-desktop.pa.cloudera.com:22000
I0601 20:22:14.494144 28442 krpc-data-stream-mgr.cc:293] DeregisterRecvr():
fragment_instance_id=ce42b072b8ff3a0f:282d654c00000000, node=2
I0601 20:22:14.494161 28442 krpc-data-stream-recvr.cc:558] cancelled stream:
fragment_instance_id=ce42b072b8ff3a0f:282d654c00000000 node_id=2
I0601 20:22:14.494437 28425 coordinator.cc:449] ExecState: query
id=ce42b072b8ff3a0f:282d654c00000000 execution completed
I0601 20:22:14.494451 28425 coordinator.cc:579] Coordinator waiting for
backends to finish, 2 remaining. query_id=ce42b072b8ff3a0f:282d654c00000000
I0601 20:22:14.495517 6181 coordinator.cc:677] Backend completed:
host=dhecht-desktop.pa.cloudera.com:22000 remaining=2
query_id=ce42b072b8ff3a0f:282d654c00000000
I0601 20:22:14.495529 6181 coordinator-backend-state.cc:228]
query_id=ce42b072b8ff3a0f:282d654c00000000: first in-progress backend:
dhecht-desktop.pa.cloudera.com:22001
I0601 20:22:14.496392 28442 query-state.cc:288] Cancelling fragment instances
as directed by the coordinator. Returned status: Cancelled
I0601 20:22:14.496402 28442 query-state.cc:416] Cancel:
query_id=ce42b072b8ff3a0f:282d654c00000000
I0601 20:22:14.496408 28442 krpc-data-stream-mgr.cc:324] cancelling all streams
for fragment_instance_id=ce42b072b8ff3a0f:282d654c00000001
I0601 20:22:14.496417 28442 krpc-data-stream-mgr.cc:324] cancelling all streams
for fragment_instance_id=ce42b072b8ff3a0f:282d654c00000000
I0601 20:22:14.498097 28423 impala-server.cc:1098] Cancel():
query_id=4a4448de2a31e20b:598a77d000000000
I0601 20:22:14.498147 28423 child-query.cc:139] Cancelling and closing child
query with operation id: ce42b072b8ff3a0f:282d654c00000000
I0601 20:22:14.498155 28423 impala-hs2-server.cc:661] CancelOperation():
query_id=ce42b072b8ff3a0f:282d654c00000000
I0601 20:22:14.498162 28423 impala-server.cc:1098] Cancel():
query_id=ce42b072b8ff3a0f:282d654c00000000
I0601 20:22:14.498172 28423 impala-hs2-server.cc:683] CloseOperation():
query_id=ce42b072b8ff3a0f:282d654c00000000
I0601 20:22:14.498178 28423 impala-server.cc:1011] UnregisterQuery():
query_id=ce42b072b8ff3a0f:282d654c00000000
I0601 20:22:14.498181 28423 impala-server.cc:1098] Cancel():
query_id=ce42b072b8ff3a0f:282d654c00000000
I0601 20:22:14.498809 27225 impala-server.cc:1798] Connection from client
::ffff:127.0.0.1:40303 closed, closing 1 associated session(s)
I0601 20:22:14.502321 7658 impala-server.cc:1196] ReportExecStatus(): Received
report for unknown query ID (probably closed or cancelled):
ce42b072b8ff3a0f:282d654c00000000
{code}
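To make the suspected ordering concrete, here is a toy, single-threaded model
of it. This is only a sketch under assumptions: {{ToyCoordinator}},
{{ToyServer}} and {{simulate()}} are hypothetical names invented for
illustration, not Impala's actual classes, and the real barrier and RPC
handling are far more involved.

```python
# Toy model of the suspected race. All names are illustrative assumptions,
# not Impala's real code.


class ToyCoordinator:
    def __init__(self, num_backends):
        # Number of backends the completion barrier is still waiting on.
        self.remaining = num_backends

    def backend_completed(self):
        self.remaining -= 1

    def barrier_released(self):
        return self.remaining == 0


class ToyServer:
    def __init__(self):
        self.registry = {}  # query_id -> ToyCoordinator

    def register_query(self, query_id, coord):
        self.registry[query_id] = coord

    def unregister_query(self, query_id):
        self.registry.pop(query_id, None)

    def report_exec_status(self, query_id):
        """Return True if the report reached a coordinator."""
        coord = self.registry.get(query_id)
        if coord is None:
            # The RPC is still answered, so the backend can finish,
            # but nobody decrements the barrier.
            return False
        coord.backend_completed()
        return True


def simulate(unregister_before_report):
    server = ToyServer()
    coord = ToyCoordinator(num_backends=2)
    query_id = "q1"
    server.register_query(query_id, coord)
    server.report_exec_status(query_id)  # first backend reports normally
    if unregister_before_report:
        server.unregister_query(query_id)  # models CloseOperation()
    delivered = server.report_exec_status(query_id)  # second backend reports
    return delivered, coord.barrier_released()
```

In this model, {{simulate(False)}} returns {{(True, True)}}, while
{{simulate(True)}} returns {{(False, False)}}: the late report is dropped as
an unknown query and the barrier count never reaches zero, so a thread
waiting on it would block indefinitely.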
> Builds are timing out/hanging
> -----------------------------
>
> Key: IMPALA-7101
> URL: https://issues.apache.org/jira/browse/IMPALA-7101
> Project: IMPALA
> Issue Type: Bug
> Reporter: Thomas Tauber-Marshall
> Assignee: Tim Armstrong
> Priority: Blocker
> Labels: broken-build
>
> We've seen a large number of builds in the last week or two that appear to
> have hung and gotten killed after a 24-hour timeout.
> Exactly where the hang is occurring is different in each build, but I
> suspect it has something to do with cancellation not working correctly.