[ https://issues.apache.org/jira/browse/KUDU-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mike Percy updated KUDU-616:
----------------------------
Parent: KUDU-423
> Mitigate tablet damage when disks are lost
> ------------------------------------------
>
> Key: KUDU-616
> URL: https://issues.apache.org/jira/browse/KUDU-616
> Project: Kudu
> Issue Type: Sub-task
> Components: fs
> Affects Versions: M5
> Reporter: Adar Dembo
> Assignee: Adar Dembo
>
> Disk loss is an unfortunate fact of life, and Kudu should provide mechanisms
> for mitigating disk loss.
> # Make it possible to isolate specific tablets to some subset of the
> machine's disks, so that if one disk dies it doesn't take out all the tablets
> with it. This is more complicated than it looks:
> ** We need a concrete way of describing disk groups. It can be per-node, or
> abstract enough that it makes sense across the entire cluster, or perhaps we
> aggregate information (e.g. ten machines have 5 disks and the other forty
> machines have 6 disks).
> ** This mechanism needs to be used for both data blocks and other bits of
> metadata (master blocks, superblocks, and other random files).
> ** Presumably it needs to be provided when a table is created (or a tablet is
> split), and it needs to be persisted as part of tablet metadata. It might be
> sufficient to express it in Kudu configuration (i.e. complex gflags), but
> since it can be associated with tablet metadata, it's hard to see how that
> would work.
> # When a disk fails, the server needs to handle it appropriately (mark it as
> failed, put affected tablets in a failed state, etc.).
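
As a rough illustration of the two items above, here is a minimal Python sketch of a per-node disk group scheme. It is not Kudu code: the TabletServer class, its fields, and the least-loaded placement rule are all hypothetical, chosen only to show how confining each tablet to a subset of disks limits the damage from a single disk failure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TabletServer:
    """Hypothetical model of one node; none of these names are Kudu APIs."""

    disks: list[str]
    group_size: int  # assumed: number of disks each tablet may use
    failed_disks: set[str] = field(default_factory=set)
    # tablet id -> disk group; stands in for the persisted tablet metadata
    tablet_groups: dict[str, list[str]] = field(default_factory=dict)
    failed_tablets: set[str] = field(default_factory=set)

    def create_tablet(self, tablet_id: str) -> list[str]:
        """Assign the new tablet to the least-loaded healthy disks."""
        healthy = [d for d in self.disks if d not in self.failed_disks]
        if len(healthy) < self.group_size:
            raise RuntimeError("not enough healthy disks for a disk group")
        # Count how many existing tablets already use each healthy disk.
        load = {d: 0 for d in healthy}
        for group in self.tablet_groups.values():
            for d in group:
                if d in load:
                    load[d] += 1
        # Ties are broken by path so the choice is deterministic.
        group = sorted(healthy, key=lambda d: (load[d], d))[: self.group_size]
        self.tablet_groups[tablet_id] = group
        return group

    def handle_disk_failure(self, disk: str) -> set[str]:
        """Mark the disk as failed and fail every tablet whose group uses it."""
        self.failed_disks.add(disk)
        affected = {t for t, g in self.tablet_groups.items() if disk in g}
        self.failed_tablets |= affected
        return affected
```

With four disks and a group size of two, losing one disk in this model fails only the tablets whose group contains it; tablets placed entirely on the other disks keep running. A real design would also have to cover the metadata files mentioned above and decide whether groups are described per node or cluster-wide.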