[jira] [Created] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade

Rushabh S Shah (JIRA) Thu, 13 Aug 2015 07:30:22 -0700

Rushabh S Shah created HDFS-8893:
------------------------------------

             Summary: DNs with failed volumes stop serving during rolling upgrade
                 Key: HDFS-8893
                 URL: https://issues.apache.org/jira/browse/HDFS-8893
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: Rushabh S Shah
            Priority: Critical



When a rolling upgrade starts, every DN tries to write a rolling_upgrade marker to
each of its volumes. If one of the volumes is bad, this write fails, and when it
does, the DN does not update the key it received from the NN.
Unfortunately, all three DataNodes holding the replicas each had one failed volume.
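To make the ordering concrete, here is a minimal Java sketch. It assumes the DN writes the marker to every volume and only afterwards installs the new key, in the same step, so an IOException from any volume skips the key update. The names `RollingUpgradeHeartbeatModel`, `Volume` and `handleHeartbeatResponse` are hypothetical and are not the actual DataNode code. The real marker-writing call path is the one in the stack trace (BPServiceActor.offerService -> BPOfferService.signalRollingUpgrade -> ... -> BlockPoolSliceStorage.setRollingUpgradeMarkers).

```java
import java.io.IOException;
import java.util.List;

// Hypothetical model of the failure ordering; NOT the real Hadoop DataNode code.
public class RollingUpgradeHeartbeatModel {

    /** A data volume that is either writable or has gone read-only. */
    static final class Volume {
        final String path;
        final boolean writable;
        boolean hasMarker;

        Volume(String path, boolean writable) {
            this.path = path;
            this.writable = writable;
        }

        /** Stand-in for creating the rolling_upgrade marker file on this volume. */
        void writeRollingUpgradeMarker() throws IOException {
            if (!writable) {
                throw new IOException("Read-only file system: " + path);
            }
            hasMarker = true;
        }
    }

    private final List<Volume> volumes;
    private long keyId; // stand-in for the key the DN currently holds

    RollingUpgradeHeartbeatModel(List<Volume> volumes, long initialKeyId) {
        this.volumes = volumes;
        this.keyId = initialKeyId;
    }

    long keyId() {
        return keyId;
    }

    /**
     * Marks every volume first, then installs the new key. An IOException from
     * any volume leaves this method early, so the key assignment is never reached.
     */
    void handleHeartbeatResponse(long newKeyId) throws IOException {
        for (Volume v : volumes) {
            v.writeRollingUpgradeMarker();
        }
        keyId = newKeyId;
    }

    public static void main(String[] args) {
        RollingUpgradeHeartbeatModel dn = new RollingUpgradeHeartbeatModel(List.of(
                new Volume("/data1", true),
                new Volume("/data2", false), // the failed (read-only) volume
                new Volume("/data3", true)), 1L);
        try {
            dn.handleHeartbeatResponse(2L);
        } catch (IOException e) {
            System.out.println("IOException in offerService: " + e.getMessage());
        }
        System.out.println("key id held by DN: " + dn.keyId());
    }
}
```

Running `main` prints the read-only file system message and then a key id of 1: in this model the DN keeps its old key even though the NN handed it a new one.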

Keys expire after 20 hours, so about 20 hours into the rolling upgrade the
DNs with failed volumes stop serving clients.
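The timing follows from simple java.time arithmetic. This sketch assumes the DN's last successful key update happened right when the upgrade started and treats the 20-hour figure from this report as a fixed key lifetime; `KeyExpiryTimeline` and `isExpired` are hypothetical names, and the start instant is an arbitrary placeholder.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative timeline only; names and the start instant are hypothetical.
public class KeyExpiryTimeline {

    /** Key lifetime as stated in this report. */
    static final Duration KEY_LIFETIME = Duration.ofHours(20);

    /** True once the full lifetime of a key received at receivedAt has elapsed. */
    static boolean isExpired(Instant receivedAt, Instant now) {
        return now.isAfter(receivedAt.plus(KEY_LIFETIME));
    }

    public static void main(String[] args) {
        Instant upgradeStart = Instant.EPOCH; // placeholder: last key update at upgrade start
        for (int hours : new int[] {1, 19, 21}) {
            Instant now = upgradeStart.plus(Duration.ofHours(hours));
            System.out.printf("%2d h into upgrade: key expired = %b%n",
                    hours, isExpired(upgradeStart, now));
        }
    }
}
```

Under these assumptions the key is still valid at 1 and 19 hours and has expired by 21 hours, which matches the point at which the affected DNs stop serving.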

Here is the stack trace on the datanode side:
{noformat}
2015-08-11 07:32:28,827 [DataNode: heartbeating to <nn1>8020] WARN datanode.DataNode: IOException in offerService
java.io.IOException: Read-only file system
        at java.io.UnixFileSystem.createFileExclusively(Native Method)
        at java.io.File.createNewFile(File.java:947)
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setRollingUpgradeMarkers(BlockPoolSliceStorage.java:721)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.setRollingUpgradeMarker(DataStorage.java:173)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.setRollingUpgradeMarker(FsDatasetImpl.java:2357)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.signalRollingUpgrade(BPOfferService.java:480)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.handleRollingUpgradeStatus(BPServiceActor.java:626)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:833)
        at java.lang.Thread.run(Thread.java:722)

{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
