Andrew Wong created KUDU-2135:
---------------------------------

             Summary: Persist disk health state to disk
                 Key: KUDU-2135
                 URL: https://issues.apache.org/jira/browse/KUDU-2135
             Project: Kudu
          Issue Type: Improvement
          Components: fs
            Reporter: Andrew Wong


When a tablet server disk fails, it is marked FAILED in memory and is not used 
again for the remaining lifetime of the tablet server process. The next time the 
tablet server is started, however, if the disk happens to come up successfully, 
it will be used as is.

This may be risky, as the disk may be corrupted or may be more prone to runtime 
failures. Additionally, once we begin striping metadata or WALs across disks, we 
may end up with multiple instances of them for a single tablet (e.g. one that was 
on the failed disk, and another created if the tablet was later reassigned to the 
same tablet server). As such, the contents of a previously failed disk should not 
be used. It is therefore necessary to persist the health of each disk so that 
unhealthy disks are not used again after a restart.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to