[ https://issues.apache.org/jira/browse/HDFS-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802555#comment-13802555 ]

Vinay commented on HDFS-5014:
-----------------------------

{quote}It's possible for state to change after releasing the read lock, but 
before the if statement executes. The method would then execute logic assuming 
the old values of bpServiceToActive and lastActiveClaimTxId.{quote}
But in this case a double check is always done to see whether the state changed 
during the current call, so I don't think this will be a problem.
{code:java}+        // double check of any state changes
+        if (bposThinksActive != (bpServiceToActive == actor)
+            || isMoreRecentClaim != (txid > lastActiveClaimTxId)) {
+          // don't update anything here, as another actor has updated the
+          // latest details
+          return;
         }{code}
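
To make the double check concrete, here is a minimal standalone sketch (not the actual BPOfferService code; the class and method names are made up for illustration) of the pattern: the flags are computed under the read lock, released, and then re-verified under the write lock before anything is updated, so a call that lost the race simply returns.
{code:java}import java.util.concurrent.locks.ReentrantReadWriteLock;

class ActiveClaimTracker {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private Object bpServiceToActive;     // stand-in for the active BPServiceActor
  private long lastActiveClaimTxId = -1;

  void signalActiveClaim(Object actor, long txid) {
    boolean bposThinksActive;
    boolean isMoreRecentClaim;
    lock.readLock().lock();
    try {
      bposThinksActive = (bpServiceToActive == actor);
      isMoreRecentClaim = (txid > lastActiveClaimTxId);
    } finally {
      lock.readLock().unlock();
    }
    // ... potentially slow work done here with no lock held ...
    lock.writeLock().lock();
    try {
      // double check of any state changes
      if (bposThinksActive != (bpServiceToActive == actor)
          || isMoreRecentClaim != (txid > lastActiveClaimTxId)) {
        // another actor has already updated the latest details
        return;
      }
      bpServiceToActive = actor;
      lastActiveClaimTxId = txid;
    } finally {
      lock.writeLock().unlock();
    }
  }
}{code}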

{quote}processCommandFromActor: Even though the read lock is not held during 
processCommandFromStandby, it's still possible to have the same problem that 
you saw in your cluster, but on the active instead of the standby. If the 
active requests re-registration of datanodes, and then immediately goes into a 
bad state or a network partition prevents communication, then datanodes will be 
stuck inside the re-register polling loop while holding the read lock. This 
will prevent the other one from taking over as active, which requires holding 
the write lock.{quote}

Yes, I agree. In an extreme case this can happen, but the chances of that are 
rare compared to the current issue.
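
The hazard is easy to reproduce outside HDFS. The toy program below (not HDFS code; the class name is made up) has one thread hold the read lock across a simulated retry loop, and the thread that needs the write lock, which corresponds to the other actor taking over as active, is stuck for the whole retry period.
{code:java}import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReRegisterHazardDemo {
  static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  public static void main(String[] args) throws InterruptedException {
    Thread reRegister = new Thread(() -> {
      lock.readLock().lock();
      try {
        // stands in for the re-register polling loop talking to a dead/partitioned NN
        for (int attempt = 0; attempt < 5; attempt++) {
          try {
            Thread.sleep(1000);             // simulated RPC timeout per attempt
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
          }
        }
      } finally {
        lock.readLock().unlock();
      }
    });
    reRegister.start();
    Thread.sleep(100);                      // let the polling thread grab the read lock first

    long start = System.nanoTime();
    lock.writeLock().lock();                // the "failover" path blocks here for ~5 seconds
    lock.writeLock().unlock();
    System.out.printf("write lock acquired after %d ms%n",
        (System.nanoTime() - start) / 1_000_000);
    reRegister.join();
  }
}{code}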

{quote}I'm starting to think that we can't fix this bug by just tuning locks in 
BPOfferService. Instead, I'm starting to think that we need to work out a way 
for the re-register polling loops to yield the lock in case of repeated 
failure, to give the other BPServiceActor a chance.{quote}
I am attaching a patch for this. I hope this change alone will solve the 
current issue. But I would still like the read/write locks to be in place, 
since they allow faster processing of commands when the cluster is in a normal 
state.
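
The general shape of that change, sketched below as a standalone class (not the attached patch; the class and method names are illustrative only), is to keep the remote call and the back-off outside the lock and to take the write lock only for the short state update.
{code:java}import java.util.concurrent.Callable;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class YieldingRetry {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // The remote call and the back-off sleep happen with no lock held, so a
  // concurrent actor can acquire the write lock between attempts.
  <T> T retryWithoutHoldingLock(Callable<T> rpc, int maxAttempts) throws Exception {
    Exception last = new IllegalArgumentException("maxAttempts must be >= 1");
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return rpc.call();
      } catch (Exception e) {
        last = e;
        Thread.sleep(1000L * attempt);      // back off, still lock-free
      }
    }
    throw last;
  }

  // Only the actual state update runs inside the (short) critical section.
  void applyResult(Runnable update) {
    lock.writeLock().lock();
    try {
      update.run();
    } finally {
      lock.writeLock().unlock();
    }
  }
}{code}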


> BPOfferService#processCommandFromActor() synchronization on namenode RPC call 
> delays IBR to Active NN, if Standby NN is unstable
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5014
>                 URL: https://issues.apache.org/jira/browse/HDFS-5014
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, ha
>    Affects Versions: 3.0.0, 2.0.4-alpha
>            Reporter: Vinay
>            Assignee: Vinay
>         Attachments: HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, 
> HDFS-5014.patch, HDFS-5014.patch
>
>
> In one of our clusters, the following happened and caused HDFS writes to fail.
> 1. The Standby NN was unstable and continuously restarting due to some errors, 
> but the Active NN was stable.
> 2. An MR job was writing files.
> 3. At some point the SNN went down again while datanodes were processing the 
> REGISTER command for the SNN. 
> 4. Datanodes started retrying to connect to the SNN to register, at the 
> following code in BPServiceActor#retrieveNamespaceInfo(), which is called 
> under synchronization.
> {code}      try {
>         nsInfo = bpNamenode.versionRequest();
>         LOG.debug(this + " received versionRequest response: " + nsInfo);
>         break;{code}
> Unfortunately this happened in all datanodes at the same point.
> 5. For the next 7-8 minutes the standby was down, no blocks were reported to 
> the active NN during that period, and writes failed.
> So the culprit is that {{BPOfferService#processCommandFromActor()}} is 
> completely synchronized, which is not required.



--
This message was sent by Atlassian JIRA
(v6.1#6144)