For reference, this is running Hadoop 0.20.2 from the CDH3B4 distribution.
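For context, the relevant hdfs-site.xml entries look roughly like this (a sketch, not a paste of our actual config; the paths and the failed-volumes setting are the ones described in the message below, and dfs.data.dir is the data-directory property name for this Hadoop version):

  <property>
    <name>dfs.data.dir</name>
    <value>/var/lib/stats/hdfs/1,/var/lib/stats/hdfs/2,/var/lib/stats/hdfs/3,/var/lib/stats/hdfs/4</value>
  </property>
  <property>
    <name>dfs.datanode.failed.volumes.tolerated</name>
    <value>2</value>
  </property>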
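Also for context, the immutable bit mentioned below was set with chattr, roughly along these lines (an illustration rather than our exact commands; the assumption is that the bit is applied to the bare mount-point directory while the filesystem is unmounted, so a missing mount can't silently send writes to the root disk):

  # mark the empty mount-point directory immutable (filesystem not mounted yet)
  chattr +i /var/lib/stats/hdfs/4
  # check the attribute on the directory itself
  lsattr -d /var/lib/stats/hdfs/4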
- Adam

On 3/24/11 10:30 AM, Adam Phelps wrote:
We have a bad disk on one of our datanode machines. While we have dfs.datanode.failed.volumes.tolerated set to 2 and saw no problems while the DataNode process was running, we hit a problem when we needed to restart the DataNode process:

  2011-03-24 16:50:20,071 WARN org.apache.hadoop.util.DiskChecker: Incorrect permissions were set on /var/lib/stats/hdfs/4, expected: rwxr-xr-x, while actual: ---------. Fixing...
  2011-03-24 16:50:20,089 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
  2011-03-24 16:50:20,091 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: EPERM: Operation not permitted

In this case /var/lib/stats/hdfs/4 is the mount point for the bad disk. It gets that permission error because we have the mount directory set to be immutable, so the DataNode's attempt to "fix" the permissions fails with EPERM. (We set the mount directories immutable because we'd previously seen HDFS just write to the local disk when a disk couldn't be mounted.)

  root@s3:/var/log/hadoop# lsattr /var/lib/stats/hdfs/
  ------------------- /var/lib/stats/hdfs/2
  ----i------------e- /var/lib/stats/hdfs/4
  ------------------- /var/lib/stats/hdfs/3
  ------------------- /var/lib/stats/hdfs/1

HDFS is supposed to be able to handle a failed disk, but it doesn't seem to be doing the right thing in this case. Is this a known problem, or is there some other way we should be configuring things to allow the DataNode to come up in this situation? (Clearly we can remove the mount point from hdfs-site.xml, but that doesn't feel like the correct solution.)

Thanks
- Adam