[ https://issues.apache.org/jira/browse/IMPALA-9224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenzhe Zhou resolved IMPALA-9224.
---------------------------------
Fix Version/s: Impala 4.0
Target Version: Impala 4.0
Resolution: Fixed
Disk I/O failures in the data cache will be addressed in IMPALA-10476.
> Blacklist nodes with faulty disks
> ---------------------------------
>
> Key: IMPALA-9224
> URL: https://issues.apache.org/jira/browse/IMPALA-9224
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Reporter: Sahil Takiar
> Assignee: Wenzhe Zhou
> Priority: Critical
> Fix For: Impala 4.0
>
>
> Similar to IMPALA-8339 and IMPALA-9137, Impala should blacklist nodes with
> faulty disks. Specifically, if a query fails because of a disk error, the
> node with that disk should be blacklisted and the query should be retried.
> We shouldn't need to blacklist nodes that fail to read from HDFS / S3, since
> those storage systems have their own internal mechanisms for recovering from
> faulty disks. We should only blacklist nodes when reads / writes to *local*
> disks fail.
> The two main components of Impala that read from / write to local disk are
> the spill-to-disk and data caching features. Whenever a query fails because
> of a disk failure during spill-to-disk, the node should be blacklisted.
> Reads from / writes to the data cache are a bit different. If a cache read
> fails due to a disk error, the error is logged and the Lookup() call to the
> cache returns 0 bytes read, which means it couldn't find the data in the
> cache. This should cause the scan to fall back to a normal, un-cached read.
> While this doesn't affect query correctness or the ability of a query to
> complete, it can affect performance. Since cache failures don't result in
> query failures, we might consider a threshold on the number of data cache
> read / write errors before blacklisting a node.
> We need to be careful to blacklist only on specific disk failures: errors
> such as disk quota exceeded or permission denied shouldn't result in
> blacklisting, since they are typically the result of system misconfiguration.
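A minimal C++ sketch of the local-vs-remote rule and the spill-to-disk case from the description, assuming a simplified coordinator that records blacklisted hosts in a set. `IoSource`, `FailureAction`, and `NodeBlacklist` are hypothetical names for illustration, not Impala's actual backend classes, and the sketch ignores details such as retry limits.

```cpp
#include <set>
#include <string>

// Hypothetical classification of where a failed I/O request was served from.
enum class IoSource { kSpillToDisk, kDataCache, kHdfs, kS3 };

// Hypothetical outcome of handling a failed query.
enum class FailureAction { kFailQuery, kBlacklistAndRetry };

// Simplified stand-in for a coordinator-side blacklist (not Impala's class).
class NodeBlacklist {
 public:
  // A query failed on `node` because of an I/O error from `source`. Only a
  // local spill-to-disk failure blacklists the node and triggers a retry:
  // remote HDFS / S3 reads recover from bad disks internally, and data cache
  // errors are handled separately because they don't fail the query.
  FailureAction OnQueryIoFailure(const std::string& node, IoSource source) {
    if (source != IoSource::kSpillToDisk) return FailureAction::kFailQuery;
    blacklisted_.insert(node);
    return FailureAction::kBlacklistAndRetry;
  }

  bool IsBlacklisted(const std::string& node) const {
    return blacklisted_.count(node) > 0;
  }

 private:
  std::set<std::string> blacklisted_;
};
```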
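For the data cache, the description suggests counting errors before blacklisting instead of reacting to the first one. The sketch below assumes a hypothetical `cache_error` out-parameter so the caller can tell a disk error apart from an ordinary miss; as described, the real `Lookup()` only returns 0 bytes in both cases. `DataCacheErrorTracker`, `ReadWithCacheFallback`, and the threshold value are illustrative assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical per-node counter of data cache I/O errors. Cache errors don't
// fail queries, so a node is only flagged after `threshold` errors.
class DataCacheErrorTracker {
 public:
  explicit DataCacheErrorTracker(int64_t threshold) : threshold_(threshold) {
    assert(threshold > 0);
  }

  // Counts one error; returns true once the count has reached the threshold.
  bool RecordError() { return ++errors_ >= threshold_; }

  int64_t errors() const { return errors_; }

 private:
  const int64_t threshold_;
  int64_t errors_ = 0;
};

// Tries the cache first. A lookup returning 0 bytes (a plain miss, or a disk
// error the cache has already logged) falls back to a normal, un-cached read.
// `cache_lookup` is assumed to set *cache_error when the miss was caused by a
// disk error; *should_blacklist is set once the tracker's threshold is reached.
int64_t ReadWithCacheFallback(
    const std::function<int64_t(bool*)>& cache_lookup,
    const std::function<int64_t()>& uncached_read,
    DataCacheErrorTracker* tracker, bool* should_blacklist) {
  bool cache_error = false;
  const int64_t bytes_read = cache_lookup(&cache_error);
  if (bytes_read > 0) return bytes_read;  // Cache hit.
  if (cache_error && tracker->RecordError()) *should_blacklist = true;
  return uncached_read();  // The query still completes, just more slowly.
}
```

A threshold of 1 would make data cache errors behave like spill-to-disk errors; larger values trade slower detection of a bad disk for fewer blacklists caused by transient errors.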
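One way to restrict blacklisting to genuine disk faults is to classify the `errno` of the failed system call. This sketch treats only `EIO` as a hardware fault and explicitly excludes the permission and quota errors named in the description; the exact set of blacklisting `errno` values is an assumption here, not what Impala implements. `IsBlacklistableDiskError` is a hypothetical name, and `EDQUOT` is guarded with `#ifdef` because it is not defined on every platform.

```cpp
#include <cerrno>

// Hypothetical classifier: true only for errno values that point at failing
// hardware rather than at system misconfiguration.
constexpr bool IsBlacklistableDiskError(int err) {
  switch (err) {
    case EIO:     // Low-level I/O error reported by the device or driver.
      return true;
    case EACCES:  // Permission denied: usually misconfiguration.
    case EPERM:   // Operation not permitted: usually misconfiguration.
#ifdef EDQUOT
    case EDQUOT:  // Disk quota exceeded: usually misconfiguration.
#endif
      return false;
    default:
      return false;  // Unknown errors are conservatively not blacklisted.
  }
}
```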
--
This message was sent by Atlassian Jira
(v8.3.4#803005)