[
https://issues.apache.org/jira/browse/HDFS-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096099#comment-13096099
]
Eli Collins commented on HDFS-1161:
-----------------------------------
That's reasonable. The volumes (local dirs) are listed explicitly, so the
config as a whole isn't portable even when the threshold is specified as a
percent, but it's one less config that isn't portable.
IIRC Koji's perspective was that an admin doesn't want to specify the count or
percent of valid volumes, but that after a set number of failures the host
should be considered faulty. E.g. if it's lost two disks there's probably
something wrong whether the host has 6 or 12 disks, i.e. it assumes disk
failures within a host are correlated.
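
To make the three options concrete, here is a minimal sketch of each check.
The class, method, and parameter names are hypothetical and are not taken from
the DataNode code or from any of the attached patches; it only illustrates how
the policies differ as the disk count changes.

```java
/** Hypothetical sketch; none of these names come from the Hadoop source. */
final class VolumePolicySketch {
    private VolumePolicySketch() {}

    /** Stay active while at least minValid volumes are still healthy. */
    static boolean activeByMinValid(int total, int failed, int minValid) {
        return (total - failed) >= minValid;
    }

    /** Stay active while at least minValidPercent of the volumes are healthy. */
    static boolean activeByPercent(int total, int failed, int minValidPercent) {
        // Integer form of healthy / total >= minValidPercent / 100.
        return (total - failed) * 100 >= minValidPercent * total;
    }

    /** Stay active until failureLimit volumes have failed, whatever the total. */
    static boolean activeByFailureCount(int failed, int failureLimit) {
        return failed < failureLimit;
    }
}
```

With a 50% threshold the percent check tolerates 3 failures on a 6-disk host
and 6 on a 12-disk host, whereas a failure limit of 2 takes either host offline
at the second failed disk.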
Ideally I think we should collect data (e.g. an X-core host can still function
well with Y% of its disks) and not require users to configure this at all - it
would be enabled by default and the daemons would take themselves offline when
they've determined they don't have sufficient resources.
> Make DN minimum valid volumes configurable
> ------------------------------------------
>
> Key: HDFS-1161
> URL: https://issues.apache.org/jira/browse/HDFS-1161
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: data-node
> Affects Versions: 0.21.0, 0.22.0
> Reporter: Eli Collins
> Assignee: Eli Collins
> Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: HDFS-1161-y20.patch, hdfs-1161-1.patch,
> hdfs-1161-2.patch, hdfs-1161-3.patch, hdfs-1161-4.patch, hdfs-1161-5.patch,
> hdfs-1161-6.patch
>
>
> The minimum number of non-faulty volumes to keep the DN active is hard-coded
> to 1. It would be useful to allow users to configure this value so the DN
> can be taken offline when, e.g., half of its disks fail; otherwise it doesn't
> get reported until it's down to its final disk and suffering degraded
> performance.