[ 
https://issues.apache.org/jira/browse/KUDU-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414100#comment-16414100
 ] 

Todd Lipcon commented on KUDU-2372:
-----------------------------------

Per KUDU-2359, I think it may make sense to allow starting up with a bad disk 
so that we don't need manual intervention after a single disk failure (e.g. on 
a 12-disk host).

> Don't let kudu start up if any disks are mounted read-only
> ----------------------------------------------------------
>
>                 Key: KUDU-2372
>                 URL: https://issues.apache.org/jira/browse/KUDU-2372
>             Project: Kudu
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Andrew Wong
>            Priority: Major
>
> Today, if a Kudu tserver runs into EROFS (read-only mount error), it treats 
> the error as it would a complete disk failure (EIO), allowing successful 
> startup of the server, but failing the tablets that are configured to use the 
> "failed" disk.
> If something is wrong with the mounting of a disk, it might be helpful to 
> bring immediate attention to it, and have operators deal with it, rather than 
> handling it automatically. As such, it might be helpful to prevent Kudu from 
> starting up if errors are detected with the mount configurations.
> There are tradeoffs here to be considered:
>  * The current behavior will evict and delete the data from the failed 
> tablets, as the error is treated as an unrecoverable failure. The user can 
> ignore such failures and handle them at their leisure, since Kudu will 
> re-replicate the tablets lost in this way.
>  * If we were to instead crash, this gives operators some immediate feedback 
> and a time limit to use `kudu fs update_dirs` to remove the read-only drive, 
> or maybe fix the mountpoint itself.
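
The two startup policies weighed in the description above can be sketched as a small decision function. This is only an illustration of the tradeoff, not Kudu's code: the function name, the policy constants, and the `refuse_on_read_only` switch are all hypothetical, and the per-directory errors are assumed to have been collected already.

```python
import errno

# Hypothetical outcome names for illustration; these are not Kudu identifiers.
START = "start"
START_DEGRADED = "start_degraded"  # server starts, tablets on bad dirs fail
REFUSE = "refuse"                  # server refuses to start

def startup_decision(dir_errors, refuse_on_read_only):
    """Pick a startup outcome from per-directory errors.

    dir_errors maps a data directory path to an errno value, or to None
    if the directory is healthy. Returns (outcome, sorted failed dirs).
    """
    failed = {d: e for d, e in dir_errors.items() if e is not None}
    if not failed:
        return START, []
    read_only = any(e == errno.EROFS for e in failed.values())
    if refuse_on_read_only and read_only:
        # Proposed behavior: surface the mount problem to operators at once.
        return REFUSE, sorted(failed)
    # Behavior described as current: EROFS is handled like EIO.
    return START_DEGRADED, sorted(failed)
```

With `refuse_on_read_only=False`, a read-only directory is handled like any other failed disk, which matches the suggestion of tolerating a single bad disk on a many-disk host.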



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
