[ 
https://issues.apache.org/jira/browse/HDFS-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548872#comment-13548872
 ] 

Daryn Sharp commented on HDFS-4288:
-----------------------------------

bq. This will also solve issues related to DN restart and NN may not process 
the block report.

True, but the boolean patch (simple incremental improvement on the existing 
trunk behavior) fixes both DN restart and reregistration after a broken 
connection.  The NN cannot distinguish the two.  So with a boolean, the NN 
(naively) processes the BR associated with every (re)registration.
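
As a rough illustration of the boolean approach (a minimal sketch, not the 
actual patch): the class name, method names and the String DN id below are 
all hypothetical.  It only models "clear a flag on every (re)registration, 
fully process the next BR".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the boolean approach; not NameNode code.
class BooleanReportTracker {
    // DN id -> has a BR been fully processed since the last (re)registration?
    private final Map<String, Boolean> processedSinceRegistration = new HashMap<>();

    // Called for a DN restart and for a reregistration alike, because the
    // NN cannot tell the two apart.
    void register(String dnId) {
        processedSinceRegistration.put(dnId, false);
    }

    // Returns true when this BR is the first one since (re)registration and
    // should therefore be fully processed.
    boolean onBlockReport(String dnId) {
        boolean alreadyProcessed =
            processedSinceRegistration.getOrDefault(dnId, false);
        processedSinceRegistration.put(dnId, true);
        return !alreadyProcessed;
    }
}
```

In this model every call to register() causes the next BR to be processed, 
whether or not anything was actually lost, which is the naive part.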

A sequence number that relies on a sentinel value allows the DN to dictate 
the NN's behavior.  This works well for restart since we know we are starting 
from 0.  For a reregistration, block updates may have been lost, so the 
sequence number must be guaranteed to always be reset to 0.  That's as naive 
as the boolean, and ensuring it's always reset might be hard or fragile, in 
which case we might as well go with the boolean.

Better yet, the logic would be {{(seqNum == 0 || seqNum != lastSeqNum)}}.  
However, this requires writable/RPC changes on 23 and protobuf changes on 
trunk/2, plus ensuring backwards compatibility with an optional protobuf 
field, etc.  Would you be ok if I filed another jira?
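
To make the two sequence-number rules concrete, here is a small sketch.  The 
class and method names are hypothetical, and it assumes the predicate answers 
"should the NN process this BR rather than skip it"; how lastSeqNum would be 
stored and updated per DN is deliberately left out.

```java
// Hypothetical helper comparing the two sequence-number rules.
class SeqNumPolicy {
    // Sentinel meaning "the DN is starting over".
    static final long SENTINEL = 0L;

    // Sentinel-only rule: the DN alone dictates the NN's behavior.
    static boolean sentinelOnly(long seqNum) {
        return seqNum == SENTINEL;
    }

    // Combined rule: also process when the number differs from the last
    // one the NN saw for this DN.
    static boolean shouldProcess(long seqNum, long lastSeqNum) {
        return seqNum == SENTINEL || seqNum != lastSeqNum;
    }
}
```

Under these assumptions, shouldProcess(0, 7) and shouldProcess(6, 5) are true 
while shouldProcess(5, 5) is false.  sentinelOnly(6) is false, so the 
sentinel-only rule depends entirely on the DN resetting to 0.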


                
> NN accepts incremental BR as IBR in safemode
> --------------------------------------------
>
>                 Key: HDFS-4288
>                 URL: https://issues.apache.org/jira/browse/HDFS-4288
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-4288.branch-23.patch
>
>
> If a DN is ready to send an incremental BR and the NN goes down, the DN will 
> repeatedly try to reconnect.  The NN will then process the DN's incremental 
> BR as an initial BR.  The NN now thinks the DN has only a few blocks, and 
> will ignore all subsequent BRs from that DN until out of safemode -- which it 
> may never do because of all the "missing" blocks on the affected DNs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira