Datanode fencing mechanism

lei liu Mon, 28 Oct 2013 01:58:56 -0700

The HDFS-1972 JIRA (https://issues.apache.org/jira/browse/HDFS-1972) describes
the following case:
Scenario 3: DN restarts during split brain period


(this scenario illustrates why I think we need to persistently record the
promise about who is active)

   - block has 2 replicas, user asks to reduce to 1
   - NN1 adds the block to DN1's invalidation queue, but it's backed up
   behind a bunch of other commands, so doesn't get issued yet.
   - Failover occurs, but NN1 still thinks it's active.
   - DN1 promises to NN2 not to accept commands from NN1. It sends an empty
   deletion report to NN2. Then, it crashes.
   - NN2 has received a deletion report from everyone, and asks DN2 to
   delete the block. It hasn't realized that DN1 is crashed yet.
   - DN2 deletes the block.


   - DN1 starts back up. When it comes back up, it talks to NN1 first
   (maybe it takes a while to connect to NN2 for some reason)
      - ** Now, if we had saved the "promise" as part of persistent state,
      we could ignore NN1 and avoid this issue. Otherwise:
      - NN1 still thinks it's active, and sends a command to DN1 to delete
      the block. DN1 does so.
      - We lost the block.
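For illustration only, the "persistent promise" idea from the scenario might look roughly like the sketch below. This is a hypothetical Java sketch, not code from HDFS or CDH: the class name PersistentPromise, the one-line on-disk format, and the use of a monotonically increasing failover "epoch" to order promises are all assumptions. The point it shows is that when DN1 restarts, it reloads its promise to NN2 from disk and can therefore refuse NN1's deletion command.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical DataNode-side record of which NameNode it has promised to obey.
// The promise is written to disk so that it survives a DataNode restart.
final class PersistentPromise {
    private final Path file;
    private String promisedNn;   // id of the NN the DN promised to follow (assumed)
    private long promisedEpoch;  // assumed failover epoch; higher means newer

    PersistentPromise(Path file) throws IOException {
        this.file = file;
        if (Files.exists(file)) {
            // Reload the promise made before the crash/restart.
            String[] parts = new String(Files.readAllBytes(file), StandardCharsets.UTF_8)
                    .trim().split(" ");
            promisedNn = parts[0];
            promisedEpoch = Long.parseLong(parts[1]);
        } else {
            promisedNn = null;   // no promise yet
            promisedEpoch = -1;
        }
    }

    // Record a new promise; only a strictly higher epoch may replace the old one.
    synchronized boolean promise(String nnId, long epoch) throws IOException {
        if (epoch <= promisedEpoch) {
            return false;
        }
        // Write to a temp file, then rename, so a crash never leaves a torn file.
        // (A production version would also fsync before and after the rename.)
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, (nnId + " " + epoch).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.ATOMIC_MOVE);
        promisedNn = nnId;
        promisedEpoch = epoch;
        return true;
    }

    // Commands such as block deletion are accepted only from the promised NN.
    synchronized boolean accepts(String nnId) {
        return promisedNn == null || promisedNn.equals(nnId);
    }
}
```

In scenario 3, DN1 would call promise("nn2", epoch) before sending its empty deletion report. After the restart, a fresh PersistentPromise loaded from the same file still rejects commands from "nn1", which is exactly the check the scenario says is missing without persistent state.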


I am using CDH4.3.1 and reading the DataNode code, but I can't find where the
DataNode saves the "promise" as part of its persistent state. Is scenario 3
handled in CDH4.3.1? If it is, where is the code?


Thanks,

LiuLe
