[jira] [Commented] (IMPALA-9636) Retried queries that blacklist nodes should ensure they don't run on the blacklisted node

ASF subversion and git services (Jira) Tue, 15 Sep 2020 15:04:26 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196574#comment-17196574
 ]


ASF subversion and git services commented on IMPALA-9636:
---------------------------------------------------------

Commit 40777b706b39599efb4dd8e12fafd72f35bc580c in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=40777b7 ]

IMPALA-9636: Don't run retried query on the blacklisted nodes

When a node is blacklisted, it stays on the blacklist only for a
limited period of time. With the current implementation, the retried
query could therefore end up running on the node that it blacklisted
during its original attempt. To avoid the same failure in the retried
query, we should not schedule query fragment instances on the
blacklisted nodes which caused the original query to fail.
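
The timing problem described above can be illustrated with a small,
self-contained sketch. It is not Impala code: TimedBlacklist,
schedulable_nodes and the node names are hypothetical, and the 12 second
timeout is only the rough figure mentioned in the issue description.

```python
from dataclasses import dataclass, field


@dataclass
class TimedBlacklist:
    """Toy time-limited blacklist (hypothetical, not Impala's data structure)."""
    timeout_s: float
    _expiry: dict = field(default_factory=dict)

    def add(self, node: str, now: float) -> None:
        # The entry expires timeout_s seconds after the node is blacklisted.
        self._expiry[node] = now + self.timeout_s

    def contains(self, node: str, now: float) -> bool:
        expiry = self._expiry.get(node)
        return expiry is not None and now < expiry


def schedulable_nodes(nodes, blacklist, now):
    """Return the nodes a scheduler consulting only the timed blacklist may use."""
    return [n for n in nodes if not blacklist.contains(n, now)]


nodes = ["exec-1", "exec-2", "exec-3"]
blacklist = TimedBlacklist(timeout_s=12.0)  # assumed value, see lead-in
blacklist.add("exec-2", now=0.0)            # original attempt fails on exec-2

# A retry scheduled immediately still avoids exec-2 ...
right_away = schedulable_nodes(nodes, blacklist, now=1.0)
# ... but a retry that waited in the admission queue sees exec-2 again.
after_queue_delay = schedulable_nodes(nodes, blacklist, now=30.0)
```

Here right_away excludes exec-2, while after_queue_delay contains all
three nodes, which is the situation this issue is about.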

This patch filters out of the executor group those executors whose
nodes were blacklisted during the original attempt when making the
schedule for the retried query.
It also adds new test cases test_retry_exec_rpc_failure_before_admin_delay()
and test_retry_query_failure_all_executors_blacklisted() for retried
queries which are triggered by RPC failure; in these tests the blacklist
timeout is triggered by adding a delay before admission.
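
A minimal sketch of the filtering idea, under stated assumptions: the
function names, the round-robin placement and the error raised when no
executor remains are all hypothetical and chosen for illustration; the
actual change lives in Impala's scheduler and may behave differently.

```python
def filter_executor_group(executor_group, blacklisted_in_original_attempt):
    """Drop executors that the original attempt of this query blacklisted.

    The set is remembered per query, so it still applies after the
    cluster-wide, time-limited blacklist entry has expired.
    """
    excluded = set(blacklisted_in_original_attempt)
    return [e for e in executor_group if e not in excluded]


def make_schedule_for_retry(executor_group, blacklisted_in_original_attempt,
                            num_fragment_instances):
    """Assign fragment instances to the remaining executors (round robin)."""
    candidates = filter_executor_group(executor_group,
                                       blacklisted_in_original_attempt)
    if not candidates:
        # Assumption: with every executor excluded the retry cannot be scheduled.
        raise RuntimeError("all executors were blacklisted by the original attempt")
    return [candidates[i % len(candidates)]
            for i in range(num_fragment_instances)]


retry_schedule = make_schedule_for_retry(
    ["exec-1", "exec-2", "exec-3"], {"exec-2"}, num_fragment_instances=4)
```

In this sketch retry_schedule never contains exec-2, regardless of how
long the retried query waited before being scheduled.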

Testing:
 - Passed test_query_retries.py, including the new test cases.
 - Passed core tests.

Change-Id: I00bc1b5026efbd0670ffbe57bcebc457d34cb105
Reviewed-on: http://gerrit.cloudera.org:8080/16369
Reviewed-by: Sahil Takiar
Tested-by: Impala Public Jenkins


> Retried queries that blacklist nodes should ensure they don't run on the 
> blacklisted node
> -----------------------------------------------------------------------------------------
>
>                 Key: IMPALA-9636
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9636
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Sahil Takiar
>            Assignee: Wenzhe Zhou
>            Priority: Critical
>             Fix For: Impala 4.0
>
>
> When a query is retried due to a node blacklisting event, there is no 
> guarantee that the retried query will *not* run on the blacklisted node. When 
> a node is blacklisted, it is only placed on the blacklist for a certain 
> period of time (the first time it is blacklisted I think it is only about 12 
> seconds). It is possible that retrying the query takes a while (perhaps the 
> query has to wait in the admission control queue again). So it is possible 
> that the retried query will end up running on the node that it blacklisted 
> during its original attempt, which is probably unwise because that node 
> caused the query to fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

