[jira] [Commented] (HDFS-1972) HA: Datanode fencing mechanism

Eli Collins (Commented) (JIRA) Wed, 14 Dec 2011 19:02:07 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169947#comment-13169947
 ]


Eli Collins commented on HDFS-1972:
-----------------------------------

Patch looks like a good implementation of the approach. Here are my initial 
comments. For those following along and wondering why the patch doesn't have 
the DNs ignore commands from the standby: that part of the fencing was already 
done in HDFS-2627.

I'd modify the comment by blockContentsTrusted to something like "DN may have 
some pending deletions issued by a prior NN that this NN is unaware of. 
Therefore we don't perform actions based on the contents of this DN until after 
we receive a BR followed by a heartbeat confirming the DN thought we were 
active, which means this NN is now up to date with respect to this DN". Maybe 
invert the polarity and rename it to blockContentsStale, since we're really 
tracking whether the block contents are up to date?
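
To make the suggested semantics concrete, here is a minimal sketch of the flag 
with the inverted polarity. The class and method names (DatanodeStaleness, 
markStaleAfterFailover, receivedBlockReport, receivedHeartbeat) are 
hypothetical and are not taken from the patch; the event ordering simply 
follows the proposed comment text above.

```java
// Hypothetical sketch, not the patch: tracks whether this NN's view of a
// DN's blocks may be out of date.
class DatanodeStaleness {
    // Stale by default: a prior NN may have issued deletions we never saw.
    private boolean blockContentsStale = true;
    private boolean blockReportSeen = false;

    /** On failover, anything we knew about the DN may be out of date again. */
    void markStaleAfterFailover() {
        blockContentsStale = true;
        blockReportSeen = false;
    }

    /** A block report (BR) arrived from this DN. */
    void receivedBlockReport() {
        blockReportSeen = true;
    }

    /**
     * A heartbeat arrived. Contents stop being stale only once a BR has been
     * seen and the DN confirms it considers this NN the active one.
     */
    void receivedHeartbeat(boolean dnConsidersUsActive) {
        if (blockReportSeen && dnConsidersUsActive) {
            blockContentsStale = false;
        }
    }

    boolean areBlockContentsStale() {
        return blockContentsStale;
    }
}
```

With this polarity the default value (stale) is the safe one, so a DN that has 
not yet reported is never acted upon by accident.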

Update the javadoc for NumberReplicas; it would be good to define "untrusted". 
If a DN is considered untrusted then all of its replicas are considered 
untrusted.

Not your change, but in BlockManager rename "count" to "decommissioned" and 
update the javadoc.

In processMisReplicatedBlock, add a comment to the effect of (but better worded 
than) "countNodes counts all blocks from an untrusted DN as untrusted (and all 
DNs start out untrusted until their next heartbeat), however we only act on 
this mistrust if the block is over-replicated".
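
As an illustration of that rule, the following is a rough sketch only. The 
types and method names (ReplicaCountSketch, Replica, countUntrusted, 
shouldPostpone) are made up for this example and do not mirror the real 
BlockManager or countNodes signatures.

```java
import java.util.List;

// Hypothetical sketch: replicas on untrusted DNs are always counted, but the
// count only changes behaviour when the block is over-replicated.
class ReplicaCountSketch {
    /** Stand-in for one replica location of a block. */
    static final class Replica {
        final boolean onUntrustedDatanode;

        Replica(boolean onUntrustedDatanode) {
            this.onUntrustedDatanode = onUntrustedDatanode;
        }
    }

    /** Every replica on an untrusted DN is itself counted as untrusted. */
    static int countUntrusted(List<Replica> replicas) {
        int untrusted = 0;
        for (Replica r : replicas) {
            if (r.onUntrustedDatanode) {
                untrusted++;
            }
        }
        return untrusted;
    }

    /**
     * Postpone processing only when the block is over-replicated and at
     * least one of its replicas is untrusted; otherwise the mistrust is
     * recorded but not acted on.
     */
    static boolean shouldPostpone(List<Replica> replicas, int expectedReplication) {
        boolean overReplicated = replicas.size() > expectedReplication;
        return overReplicated && countUntrusted(replicas) > 0;
    }
}
```

The point of the sketch is that an under-replicated or correctly replicated 
block is handled as usual even when some of its replicas are untrusted, since 
no deletion decision depends on them.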

The comment "If we have at least one" in invalidateBlock can be moved down to 
the 2nd "if".

I think it's OK to assume postponedMisreplicatedBlocks is always small. I 
suppose even if we're recommissioning a rack and immediately fail over, this 
should be sufficient.
                
> HA: Datanode fencing mechanism
> ------------------------------
>
>                 Key: HDFS-1972
>                 URL: https://issues.apache.org/jira/browse/HDFS-1972
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node, ha, name-node
>            Reporter: Suresh Srinivas
>            Assignee: Todd Lipcon
>         Attachments: hdfs-1972-v1.txt, hdfs-1972.txt
>
>
> In high availability setup, with an active and standby namenode, there is a 
> possibility of two namenodes sending commands to the datanode. The datanode 
> must honor commands from only the active namenode and reject the commands 
> from standby, to prevent corruption. This invariant must be complied with 
> during fail over and other states such as split brain. This jira addresses 
> issues related to this, design of the solution and implementation.

