[ https://issues.apache.org/jira/browse/IMPALA-10270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217996#comment-17217996 ]
Wenzhe Zhou commented on IMPALA-10270:
--------------------------------------
Query failures can be caused by file or disk sector corruption, RPC timeouts,
network partitions, or exceeding resource limits such as memory, file handles,
or thread pools. Some of these causes are temporary and may clear up quickly.
We should classify the errors so that each class gets its own blacklisting
timeout.
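A minimal Python sketch of the classification idea follows. The categories come from the causes listed above, but the substring matching and the timeout values are assumptions for illustration only (they are not Impala error codes or Impala defaults). Each error is mapped to a cause, and each cause to a blacklist duration; a cause with no entry does not blacklist the node.

```python
from enum import Enum, auto
from typing import Optional


class FailureCause(Enum):
    """Hypothetical failure categories, taken from the causes listed above."""
    DISK_CORRUPTION = auto()
    RPC_TIMEOUT = auto()
    NETWORK_PARTITION = auto()
    RESOURCE_LIMIT = auto()  # e.g. memory, file handles, thread pool
    UNKNOWN = auto()


# Assumed blacklist durations in seconds (illustrative, not Impala defaults).
# Transient causes get short timeouts; likely-persistent ones get longer ones.
BLACKLIST_TIMEOUT_S = {
    FailureCause.RPC_TIMEOUT: 30,
    FailureCause.RESOURCE_LIMIT: 60,
    FailureCause.NETWORK_PARTITION: 120,
    FailureCause.DISK_CORRUPTION: 600,
}


def classify(error_message: str) -> FailureCause:
    """Map an error message to a cause with naive substring matching.

    A real implementation would classify on structured error codes; the
    substrings here are placeholders.
    """
    msg = error_message.lower()
    if "corrupt" in msg:
        return FailureCause.DISK_CORRUPTION
    if "rpc" in msg and "timed out" in msg:
        return FailureCause.RPC_TIMEOUT
    if "unreachable" in msg or "partition" in msg:
        return FailureCause.NETWORK_PARTITION
    if "limit exceeded" in msg or "too many open files" in msg:
        return FailureCause.RESOURCE_LIMIT
    return FailureCause.UNKNOWN


def blacklist_timeout(error_message: str) -> Optional[int]:
    """Seconds to blacklist the node, or None if this error should not count."""
    return BLACKLIST_TIMEOUT_S.get(classify(error_message))
```

Removing a cause from the table makes it not count at all. For example, dropping RESOURCE_LIMIT would treat memory-limit errors as query-specific, as the issue description suggests.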
> Failed Task Blacklisting
> ------------------------
>
> Key: IMPALA-10270
> URL: https://issues.apache.org/jira/browse/IMPALA-10270
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Reporter: Wenzhe Zhou
> Assignee: Wenzhe Zhou
> Priority: Major
>
> * If node “a” has a higher rate of task failures compared to the rest of the
> cluster, then node “a” should be blacklisted
> * There should only be a specific set of failures that count against a node
> - e.g. query specific failures like reading corrupted files, or mem limit
> exceeded should not count
> * This is similar to how Spark Executor Blacklisting works
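The rate comparison in the description can be sketched as follows, assuming a hypothetical in-memory tracker. The set of excluded failure kinds, the minimum task and failure counts, and the 2x ratio threshold are all illustrative assumptions; this is not Impala's or Spark's actual mechanism. Each finished task is recorded per node. Query-specific failures are excluded, and a node is blacklisted when its failure rate is at least `ratio_threshold` times the failure rate of the rest of the cluster.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical query-specific failure kinds that must not count against a node.
QUERY_SPECIFIC = {"corrupt_file", "mem_limit_exceeded"}


@dataclass
class FailureTracker:
    min_tasks: int = 10            # assumed: too few tasks gives no signal
    min_failures: int = 3          # assumed: avoid blacklisting on one fluke
    ratio_threshold: float = 2.0   # assumed: 2x the rest-of-cluster rate
    tasks: Dict[str, int] = field(default_factory=lambda: defaultdict(int))
    failures: Dict[str, int] = field(default_factory=lambda: defaultdict(int))

    def record(self, node: str, failure_kind: Optional[str] = None) -> None:
        """Record one finished task on `node`; failure_kind is None on success."""
        self.tasks[node] += 1
        if failure_kind is not None and failure_kind not in QUERY_SPECIFIC:
            self.failures[node] += 1

    def should_blacklist(self, node: str) -> bool:
        """True if `node` fails noticeably more often than the rest of the cluster."""
        n_tasks = self.tasks.get(node, 0)
        n_fail = self.failures.get(node, 0)
        if n_tasks < self.min_tasks or n_fail < self.min_failures:
            return False
        rest_tasks = sum(self.tasks.values()) - n_tasks
        if rest_tasks == 0:
            return False  # no other nodes to compare against
        rest_rate = (sum(self.failures.values()) - n_fail) / rest_tasks
        return n_fail / n_tasks >= self.ratio_threshold * rest_rate
```

Comparing against the rest of the cluster, rather than using a fixed per-node limit, keeps a cluster-wide problem (for example a flaky shared service) from blacklisting every node at once.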
--
This message was sent by Atlassian Jira
(v8.3.4#803005)