[
https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kihwal Lee updated HDFS-5583:
-----------------------------
Attachment: HDFS-5583.patch
The patch makes the DN send OOB acks to clients that are writing. The added test
case currently doesn't do much, but it will be updated after the client-side
changes.
OOB ack sending can still be verified by running the new test case. The
test log should show something like the following:
{noformat}
[DataNode]
2014-02-10 23:23:52,412 INFO datanode.DataNode
(DataXceiverServer.java:run(190)) - Shutting down DataXceiverServer before
restart
2014-02-10 23:23:52,412 INFO datanode.DataNode
(BlockReceiver.java:receiveBlock(731)) - Shutting down for restart
(BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002).
2014-02-10 23:23:52,413 INFO datanode.DataNode
(BlockReceiver.java:sendOOBResponse(977)) - Sending an out of band ack of type
OOB_TYPE1
[Upstream Datanode]
2014-02-10 23:23:52,413 INFO datanode.DataNode (BlockReceiver.java:run(1060))
- Relaying an out of band ack of type OOB_TYPE
[Client]
2014-02-10 23:23:52,414 WARN hdfs.DFSClient (DFSOutputStream.java:run(784)) -
DFSOutputStream ResponseProcessor exception for block
BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002
java.io.IOException: Bad response OOB_TYPE1 for block
BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002 from datanode
127.0.0.1:55182
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:732)
{noformat}
> Make DN send an OOB Ack on shutdown before restarting
> -----------------------------------------------------
>
> Key: HDFS-5583
> URL: https://issues.apache.org/jira/browse/HDFS-5583
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Attachments: HDFS-5583.patch
>
>
> Add the ability for datanodes to send an OOB response to indicate an
> upcoming upgrade-restart. The client should ignore the pipeline error from that
> node for a configured amount of time and try to reconstruct the pipeline without
> excluding the restarted node. If the node does not come back in time,
> regular pipeline recovery should happen.
> This feature is useful for applications that need to keep blocks local.
> If the upgrade-restart is fast, waiting is preferable to losing locality.
> It could also be used in general instead of the draining-writer strategy.
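
The client-side behavior described above could be sketched roughly as follows. This is only an illustration of the policy, not the DFSClient implementation: the class RestartAwareRecovery, its methods, the Action enum, and the 30-second wait are all hypothetical names and values chosen for the example, and time is passed in explicitly so the logic is easy to follow.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical sketch of the client-side policy described in this issue.
 * Not the actual DFSClient code; every name here is an assumption.
 */
final class RestartAwareRecovery {
    enum Action { WAIT_FOR_RESTART, REGULAR_RECOVERY }

    private final long restartWaitMs;   // the "configured amount of time"
    private final Set<String> excludedNodes = new LinkedHashSet<>();
    private String restartingNode;      // null when no restart is pending
    private long restartDeadlineMs;

    RestartAwareRecovery(long restartWaitMs) {
        this.restartWaitMs = restartWaitMs;
    }

    /** Record that a datanode announced a restart through an OOB ack. */
    void onRestartAck(String node, long nowMs) {
        restartingNode = node;
        restartDeadlineMs = nowMs + restartWaitMs;
    }

    /** Decide how to react to a pipeline error reported for the given node. */
    Action onPipelineError(String node, long nowMs) {
        boolean announced = node.equals(restartingNode);
        if (announced && nowMs < restartDeadlineMs) {
            // Still inside the wait window: keep the node and retry with it.
            return Action.WAIT_FOR_RESTART;
        }
        if (announced) {
            restartingNode = null;      // the node did not come back in time
        }
        excludedNodes.add(node);        // regular recovery excludes the node
        return Action.REGULAR_RECOVERY;
    }

    Set<String> excludedNodes() {
        return excludedNodes;
    }

    public static void main(String[] args) {
        RestartAwareRecovery policy = new RestartAwareRecovery(30_000L);
        policy.onRestartAck("dn1", 0L);
        System.out.println(policy.onPipelineError("dn1", 10_000L)); // WAIT_FOR_RESTART
        System.out.println(policy.onPipelineError("dn1", 40_000L)); // REGULAR_RECOVERY
        System.out.println(policy.excludedNodes());                 // [dn1]
    }
}
```

In this sketch an error from a node that never announced a restart goes straight to regular recovery, which matches the existing behavior the description refers to.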