[ https://issues.apache.org/jira/browse/KUDU-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414100#comment-16414100 ]
Todd Lipcon commented on KUDU-2372:
-----------------------------------

Per KUDU-2359, I think it may make sense to allow starting up with a bad disk so that we don't need manual intervention after a single disk failure (e.g. on a 12-disk host).

> Don't let kudu start up if any disks are mounted read-only
> ----------------------------------------------------------
>
>                 Key: KUDU-2372
>                 URL: https://issues.apache.org/jira/browse/KUDU-2372
>             Project: Kudu
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Andrew Wong
>            Priority: Major
>
> Today, if a Kudu tserver runs into EROFS (read-only mount error), it treats
> the error as it would a complete disk failure (EIO), allowing successful
> startup of the server, but failing the tablets that are configured to use the
> "failed" disk.
> If something is wrong with the mounting of a disk, it might be helpful to
> bring immediate attention to it, and have operators deal with it, rather than
> handling it automatically. As such, it might be helpful to prevent Kudu from
> starting up if errors are detected with the mount configurations.
> There are tradeoffs here to be considered:
> * The current behavior will evict and delete the data from the failed
> tablets, as the error is treated as an unrecoverable failure. The user can
> ignore such failures and handle them at their leisure, since Kudu will
> re-replicate the tablets lost in this way.
> * If we were to instead crash, this gives operators some immediate feedback
> and a time limit to use `kudu fs update_dirs` to remove the read-only drive,
> or maybe fix the mountpoint itself.
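
To make the two options in this thread concrete, here is a minimal sketch of the startup decision, not Kudu's actual code. The function `startup_decision`, the `strict_read_only` switch, and the directory paths are all hypothetical names invented for illustration. It assumes each data directory has already been probed and that the probe yields either an errno or nothing.

```python
import errno


def startup_decision(dir_errors, strict_read_only):
    """Decide whether a server should start, given per-directory probe results.

    dir_errors maps each data directory to the errno seen while probing it,
    or None if the probe succeeded. Both the mapping and the
    strict_read_only switch are hypothetical, for illustration only.
    """
    # Directories that reported a read-only mount (EROFS).
    read_only = sorted(d for d, e in dir_errors.items() if e == errno.EROFS)
    # Directories that reported any error at all, EROFS and EIO included.
    failed = sorted(d for d, e in dir_errors.items() if e is not None)

    if strict_read_only and read_only:
        # Proposed behavior: refuse to start so an operator looks at the mount.
        return ("abort", read_only)
    # Behavior described as current: start anyway and treat every errored
    # directory as failed, so only the tablets on those disks are affected.
    return ("start", failed)


probes = {"/data/1": None, "/data/2": errno.EROFS, "/data/3": errno.EIO}
print(startup_decision(probes, strict_read_only=False))
print(startup_decision(probes, strict_read_only=True))
```

With the switch off, EROFS and EIO are handled the same way, which matches the behavior the issue describes as current. With it on, a single read-only mount blocks startup, which is the manual-intervention cost the comment above raises for hosts with many disks.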