[ 
https://issues.apache.org/jira/browse/KUDU-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964440#comment-16964440
 ] 

Adar Dembo commented on KUDU-2904:
----------------------------------

bq. Should/can we persist this disk failure state on the master before 
crashing? If the master comes back up, is it expected to hit the disk failure 
again when reading some state, and crash again? Presumably we can't persist 
such a disk failure state on crashing because the write itself could fail, 
whether there are multiple disks or only one.

Andrew wrote a patch to persist disk failures for tservers, but eventually 
abandoned it. I can't quite remember why; I think it was deemed too complex and 
we viewed disk errors as fundamentally transient (i.e. it's common to have 
replaced the broken disk by the next time you restart Kudu).

bq. How exactly do we crash the master? CHECK(false)? We don't want to do a 
graceful shutdown, right?

LOG(FATAL) would probably be the ideal approach, but yeah, no graceful shutdown.
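
To illustrate what "no graceful shutdown" means in practice: glog's LOG(FATAL) 
logs the message and then aborts the process, and a failed CHECK ends the same 
way. Below is a minimal, POSIX-only sketch that stands in for that behavior 
with std::abort() so it does not depend on glog. FatalDiskError and 
SignalFromCrashingChild are hypothetical names made up for this example, not 
Kudu or glog functions.

```cpp
#include <csignal>
#include <cstdio>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for LOG(FATAL): print a message, then abort.
// std::abort() raises SIGABRT; atexit handlers and static destructors
// do not run, so there is no graceful shutdown path.
[[noreturn]] inline void FatalDiskError(const char* msg) {
  std::fprintf(stderr, "FATAL: %s\n", msg);
  std::abort();
}

// Forks a child that hits a simulated fatal disk error and reports how the
// child died. Returns the terminating signal number, 0 if the child exited
// normally, or -1 if fork/waitpid failed.
inline int SignalFromCrashingChild() {
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    FatalDiskError("simulated disk failure");  // never returns
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return -1;
  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The parent observes that the child was killed by SIGABRT rather than exiting 
cleanly, which is the property we want from the master after a disk failure.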


> Master shouldn't allow master tablet operations after a disk failure
> --------------------------------------------------------------------
>
>                 Key: KUDU-2904
>                 URL: https://issues.apache.org/jira/browse/KUDU-2904
>             Project: Kudu
>          Issue Type: Bug
>          Components: fs, master
>    Affects Versions: 1.11.0
>            Reporter: Adar Dembo
>            Assignee: Bankim Bhavsar
>            Priority: Critical
>              Labels: newbie
>
> The master doesn't register any FS error handlers, which means that in the 
> event of a disk failure that doesn't intrinsically crash the server (i.e. a 
> disk failure in one of several directories), the master tablet is not failed 
> and may undergo additional MM ops. This is forbidden: the invariant is that a 
> tablet with a failed disk should itself fail. In the master perhaps the 
> behavior should be more severe (i.e. perhaps the master should crash itself).
>
> This surfaced with a user report of multiple minor delta compactions on a 
> master even after one of them had failed during a SyncDir() call on its 
> superblock flush. The metadata was corrupt: the blocks added to the 
> superblock by the compaction were marked as deleted in the LBM. It's unclear 
> whether the in-memory state of the superblock was corrupted by the failure 
> and subsequent compactions, or whether the corruption was caused by something 
> else. Either way, no operations should have been permitted following the 
> initial failure.
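
The invariant in the description above (a tablet with a failed disk must 
itself fail, and no further maintenance ops may run) can be sketched roughly 
as follows. This is only an illustration under assumed names: 
DiskErrorNotifier and Tablet are hypothetical classes invented for this 
example and are not Kudu's actual FS error handling API.

```cpp
#include <atomic>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical FS-layer notifier: components register callbacks that are
// invoked when a data directory fails.
class DiskErrorNotifier {
 public:
  using Callback = std::function<void(const std::string& dir)>;

  void Register(Callback cb) { callbacks_.push_back(std::move(cb)); }

  void NotifyFailure(const std::string& dir) const {
    for (const auto& cb : callbacks_) cb(dir);
  }

 private:
  std::vector<Callback> callbacks_;
};

// Hypothetical tablet that latches into a failed state and then refuses
// any further maintenance operations.
class Tablet {
 public:
  void MarkFailed() { failed_.store(true); }
  bool failed() const { return failed_.load(); }

  // Returns true if the compaction was allowed to run.
  bool RunCompaction() {
    if (failed()) return false;  // invariant: no ops after a disk failure
    ++compactions_run_;
    return true;
  }

  int compactions_run() const { return compactions_run_; }

 private:
  std::atomic<bool> failed_{false};
  int compactions_run_ = 0;
};
```

Without a registered callback, NotifyFailure() has nobody to tell and the 
tablet keeps compacting, which is the situation described for the master. For 
the master, the registered callback could crash the process outright instead 
of only marking the tablet as failed.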



