[ 
https://issues.apache.org/jira/browse/IMPALA-10476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou updated IMPALA-10476:
---------------------------------
    Description: 
If an executor repeatedly gets disk I/O failures when reading from or writing
to local disk, it should report its unhealthy state to the statestore so that
we can mark the node as down and remove it from the executor group, avoiding
repeated query failures in the cluster. This provides a mechanism for an
executor node to remove itself from scheduling.

The two main components of Impala that read from and write to local disk are
the spill-to-disk and data caching features. We need to add stats to count
local disk failures.

The node's health state should be shown on the debug WebUI. We should also
allow the user to override the node's health state.
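
As a rough illustration of the behaviour described above, the following Python
sketch counts recent local disk failures in a sliding time window, derives a
health state from that count, and lets a user override the result. It is not
Impala's implementation (the Impala backend is C++). The class name
DiskHealthMonitor, its methods, and the max_failures / window_secs parameters
are all hypothetical names chosen for this example, and reporting the state to
the statestore is left out.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class DiskHealthMonitor:
    """Hypothetical helper: tracks recent local disk I/O failures."""

    def __init__(self, max_failures: int, window_secs: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # Assumed policy: the node is unhealthy once max_failures failures
        # have occurred within the last window_secs seconds.
        self._max_failures = max_failures
        self._window_secs = window_secs
        self._clock = clock
        self._failures: Deque[float] = deque()
        # None means "no manual override"; True/False forces the state.
        self._override: Optional[bool] = None

    def record_failure(self) -> None:
        """Call on each failed spill-to-disk or data cache read/write."""
        self._failures.append(self._clock())

    def recent_failures(self) -> int:
        """Number of failures still inside the sliding window."""
        cutoff = self._clock() - self._window_secs
        while self._failures and self._failures[0] < cutoff:
            self._failures.popleft()
        return len(self._failures)

    def set_override(self, healthy: Optional[bool]) -> None:
        """Force the health state, or pass None to clear the override."""
        self._override = healthy

    def is_healthy(self) -> bool:
        """The state an executor would report (and show on a debug page)."""
        if self._override is not None:
            return self._override
        return self.recent_failures() < self._max_failures


if __name__ == "__main__":
    monitor = DiskHealthMonitor(max_failures=3, window_secs=60.0)
    for _ in range(3):
        monitor.record_failure()
    print(monitor.is_healthy())  # False: three failures within the window
```

Using a time window rather than a lifetime counter is one possible design
choice: a node whose disk recovers becomes schedulable again once the old
failures age out, without requiring a manual override.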

  was:
If an executor repeatedly gets disk I/O failures when reading from or writing
to local disk, we should mark the node as down and remove it from the executor
group to avoid repeated query failures in the cluster.

The two main components of Impala that read from and write to local disk are
the spill-to-disk and data caching features. We need to add stats to count
local disk failures.

 


> Remove executor node with faulty disks from executor group
> ----------------------------------------------------------
>
>                 Key: IMPALA-10476
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10476
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Distributed Exec
>            Reporter: Wenzhe Zhou
>            Assignee: Wenzhe Zhou
>            Priority: Major
>
> If an executor repeatedly gets disk I/O failures when reading from or writing
> to local disk, it should report its unhealthy state to the statestore so that
> we can mark the node as down and remove it from the executor group, avoiding
> repeated query failures in the cluster. This provides a mechanism for an
> executor node to remove itself from scheduling.
> The two main components of Impala that read from and write to local disk are
> the spill-to-disk and data caching features. We need to add stats to count
> local disk failures.
> The node's health state should be shown on the debug WebUI. We should also
> allow the user to override the node's health state.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
