Nathan Roberts created HDFS-5446:
------------------------------------

             Summary: Consider supporting a mechanism to allow datanodes to drain outstanding work during rolling upgrade
                 Key: HDFS-5446
                 URL: https://issues.apache.org/jira/browse/HDFS-5446
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: datanode
    Affects Versions: 2.2.0
            Reporter: Nathan Roberts

Rebuilding write pipelines is expensive, and it can happen many times during a rolling restart of datanodes (e.g. during a rolling upgrade). It seems like it might help if datanodes could be told to drain current work while rejecting new requests - possibly with a new response indicating that the node is temporarily unavailable (it's not broken, it's just going through a maintenance phase where it shouldn't accept new work). Waiting just a few seconds is normally enough to let a good percentage of the open requests complete without error, which reduces the overhead of restarting lots of datanodes in rapid succession. Obviously this would need a timeout to make sure the datanode doesn't wait forever.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
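
For illustration only, the drain behaviour proposed in this issue could look roughly like the sketch below: a gate that counts in-flight requests, refuses new ones once draining starts, and waits for the count to reach zero with a bounded timeout. Nothing here is existing DataNode code. The class `DrainGate`, the `Admission` enum, and the `TEMPORARILY_UNAVAILABLE` value are all hypothetical names made up for this sketch, and a real implementation would have to hook into the data transfer protocol rather than a standalone class.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of drain-with-timeout; not part of the Hadoop DataNode API. */
class DrainGate {
    /** Hypothetical admission results; TEMPORARILY_UNAVAILABLE stands in for the proposed new response. */
    public enum Admission { ACCEPTED, TEMPORARILY_UNAVAILABLE }

    private final Object lock = new Object();
    private int inFlight = 0;          // requests accepted but not yet finished
    private boolean draining = false;  // set once drain() has been called

    /** Call before starting a request; new work is rejected once draining has begun. */
    public Admission tryBegin() {
        synchronized (lock) {
            if (draining) {
                return Admission.TEMPORARILY_UNAVAILABLE;
            }
            inFlight++;
            return Admission.ACCEPTED;
        }
    }

    /** Call when an accepted request finishes, whether it succeeded or failed. */
    public void end() {
        synchronized (lock) {
            inFlight--;
            if (inFlight == 0) {
                lock.notifyAll(); // wake a thread blocked in drain()
            }
        }
    }

    /**
     * Stops admitting new work, then waits up to the given timeout for
     * in-flight work to finish. Returns true if everything drained, false
     * if the timeout expired or the waiting thread was interrupted.
     */
    public boolean drain(long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        synchronized (lock) {
            draining = true;
            while (inFlight > 0) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false; // do not wait forever; caller restarts anyway
                }
                try {
                    TimeUnit.NANOSECONDS.timedWait(lock, remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }
}
```

In this sketch a restart script would call `drain(...)` with a timeout of a few seconds and proceed with the restart whatever the result; a `false` return only means that some pipelines will still have to be rebuilt by their clients.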