[ 
https://issues.apache.org/jira/browse/KUDU-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428892#comment-16428892
 ] 

Andrew Wong commented on KUDU-2359:
-----------------------------------

Based on this, it probably makes sense to treat missing directories as 
"failed" directories (i.e. each one should be marked "failed" in memory, and 
all tablets configured to use it should be failed and re-replicated 
automatically). What does this mean for the `kudu fs update_dirs` tool, which 
mends missing directories? Its use would fall more on the side of fixing 
provisioning errors than disk errors, so it will still be useful to keep 
around. That said, it'll take some thought to accommodate both missing 
directories as a "failed" state and missing directories as an expected state 
when running the tool.
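
To make the proposal concrete, here is a rough sketch of the startup-time 
behavior described above. It is illustrative Python, not Kudu code: the names 
`classify_dirs` and `tablets_to_fail`, and the tablet-to-directory mapping, are 
all assumptions made up for this example.

```python
import os
from typing import Dict, Iterable, List, Set, Tuple


def classify_dirs(configured_dirs: Iterable[str]) -> Tuple[List[str], Set[str]]:
    """Split configured data dirs into healthy ones and in-memory "failed" ones.

    Hypothetical helper: a directory that is missing on disk is recorded as
    failed instead of aborting startup.
    """
    healthy: List[str] = []
    failed: Set[str] = set()
    for d in configured_dirs:
        if os.path.isdir(d):
            healthy.append(d)
        else:
            failed.add(d)
    return healthy, failed


def tablets_to_fail(tablet_dirs: Dict[str, List[str]], failed: Set[str]) -> List[str]:
    """Return the IDs of tablets configured to use at least one failed dir.

    `tablet_dirs` is an assumed mapping from tablet ID to the data dirs that
    tablet is configured to use; these are the tablets that would be failed
    and left to be re-replicated.
    """
    return sorted(
        tablet_id
        for tablet_id, dirs in tablet_dirs.items()
        if failed.intersection(dirs)
    )
```

The sketch deliberately leaves out the open question: a tool such as `kudu fs 
update_dirs` would need some way to treat a missing directory as expected 
rather than failed.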

> tserver should allow starting with a small number of missing data dirs
> ----------------------------------------------------------------------
>
>                 Key: KUDU-2359
>                 URL: https://issues.apache.org/jira/browse/KUDU-2359
>             Project: Kudu
>          Issue Type: Improvement
>          Components: fs, tserver
>            Reporter: Todd Lipcon
>            Priority: Major
>
> Often when a disk fails, its mount point will not come back up when the 
> server is restarted. Currently, Kudu will respond to this by failing to 
> restart with an error like:
> F0314 18:23:39.353916 112051 tablet_server_main.cc:80] Check failed: _s.ok() 
> Bad status: Already present: FS layout already exists; not overwriting 
> existing layout. See 
> https://kudu.apache.org/releases/1.8.0-SNAPSHOT/docs/troubleshooting.html: 
> unable to create file system roots: FSManager roots already exist: 
> /data/1/kudu,/data/2/kudu,/data/3/kudu,/data/5/kudu,/data/6/kudu,/data/7/kudu,/data/8/kudu,/data/1/kudu-wal
> However, this defeats some of the advantages of the "allow single disk 
> failure" work. One could use the update_data_dirs tool to remove the missing 
> disk, but you'd also need to persistently change the configuration of the 
> daemon, which is hard to do with a consistent configuration management.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
