Andrew Wong created KUDU-2040:
---------------------------------
Summary: coordinate data dir lifecycle with DataDirGroups
Key: KUDU-2040
URL: https://issues.apache.org/jira/browse/KUDU-2040
Project: Kudu
Issue Type: Improvement
Components: fs, tserver
Reporter: Andrew Wong
At the time of creation, a tablet's DataDirGroup will avoid using directories
that are full and directories that have failed. This can lead to the creation
of groups that are below the flag-specified target number of dirs. This isn't
necessarily an error, but if the disks do come back to a healthy state, there is
no way to resize an undersized group.
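
To make the behavior above concrete, here is a minimal sketch in Python (not Kudu's actual C++ code) of how a group can end up undersized at creation time. The names DirState, DataDir, create_data_dir_group and target_num_dirs are hypothetical, and the random choice among usable directories is an assumption made only for illustration.

```python
import random
from dataclasses import dataclass
from enum import Enum


class DirState(Enum):
    HEALTHY = "healthy"
    FULL = "full"
    FAILED = "failed"


@dataclass(frozen=True)
class DataDir:
    path: str
    state: DirState


def create_data_dir_group(dirs, target_num_dirs, rng=None):
    """Pick up to target_num_dirs usable directories for a new tablet.

    Hypothetical helper: the selection policy (random sampling) is an
    assumption for this sketch, not a description of Kudu's policy.
    """
    rng = rng or random.Random()
    # Full and failed directories are skipped at creation time.
    usable = [d for d in dirs if d.state is DirState.HEALTHY]
    # With fewer usable dirs than the target, the group is undersized,
    # and nothing in this sketch ever revisits that decision.
    return rng.sample(usable, min(target_num_dirs, len(usable)))
```

With four directories of which one is full and one has failed, a target of three yields a group of only two, and the group stays at two even if the other disks later recover.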
The assumption in this implementation is that these states are permanent, which
isn't necessarily the case. A full disk may have tablets removed; when disk
refreshes become supported by Kudu, disk failure will also become transient. As
such, it's worth considering if/when/how undersized DataDirGroups should be
resized.
A couple of notes on this:
- once a disk group has been created, the tablet's data will be spread across
the disks in that group, so completely changing the group will require that the
tablet's data is rewritten
- another approach might be to replicate the understriped tablet (either on the
same server or elsewhere) in hopes that more disks are available
- recovery from a disk failure is not implemented at this time, so disk failure is
currently not considered transient (this will change once recovery is
implemented)
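
As a starting point for the "if/when" part of the question, the following Python sketch shows one possible way to spot groups that could be grown once disks are healthy again. It is an illustration under stated assumptions, not a proposed design: find_resizable_groups, the dict-of-lists representation of groups, and target_num_dirs are all hypothetical.

```python
def find_resizable_groups(groups, healthy_dirs, target_num_dirs):
    """Return {tablet_id: dirs_that_could_be_added} for undersized groups.

    groups:        hypothetical mapping of tablet id -> list of dir paths
    healthy_dirs:  paths that are currently neither full nor failed
    """
    candidates = {}
    for tablet_id, group in groups.items():
        missing = target_num_dirs - len(group)
        if missing <= 0:
            continue  # group already meets the target size
        # Healthy directories that are not yet part of this group.
        spare = sorted(set(healthy_dirs) - set(group))
        if spare:
            candidates[tablet_id] = spare[:missing]
    return candidates
```

This only identifies candidates. Whether added directories would receive only new data, or whether existing data would be rewritten across the larger group, is the open question raised in the notes above.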