[
https://issues.apache.org/jira/browse/HDFS-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802555#comment-13802555
]
Vinay commented on HDFS-5014:
-----------------------------
{quote}It's possible for state to change after releasing the read lock, but
before the if statement executes. The method would then execute logic assuming
the old values of bpServiceToActive and lastActiveClaimTxId.{quote}
But in this case a double check is always done to detect any state changes made
during the current call. I don't think this will be a problem.
{code:java}
+ // double check of any state changes
+ if (bposThinksActive != (bpServiceToActive == actor)
+     || isMoreRecentClaim != (txid > lastActiveClaimTxId)) {
+   // don't update anything here, as another actor has updated the
+   // latest details
+   return;
  }{code}
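For clarity, below is a minimal sketch of the pattern I mean: read the shared state under the read lock, release it, then re-acquire the write lock and re-verify the same conditions before mutating anything. The class wrapper, lock field and method signature are illustrative only, not the exact code from the attached patch.
{code:java}
// Illustrative sketch of the read-then-recheck pattern; class, field and
// method names are assumptions for this example, not the exact patch code.
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ActorStateSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private Object bpServiceToActive;   // actor currently treated as active
  private long lastActiveClaimTxId;   // txid of the most recent active claim

  void updateFromHeartbeat(Object actor, long txid) {
    final boolean bposThinksActive;
    final boolean isMoreRecentClaim;
    lock.readLock().lock();
    try {
      bposThinksActive = (bpServiceToActive == actor);
      isMoreRecentClaim = (txid > lastActiveClaimTxId);
    } finally {
      lock.readLock().unlock();
    }

    // ... any slow work happens here without holding the lock ...

    lock.writeLock().lock();
    try {
      // double check: another actor may have changed the state in between
      if (bposThinksActive != (bpServiceToActive == actor)
          || isMoreRecentClaim != (txid > lastActiveClaimTxId)) {
        return; // someone else already recorded the latest details
      }
      if (isMoreRecentClaim) {
        bpServiceToActive = actor;
        lastActiveClaimTxId = txid;
      }
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}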
{quote}processCommandFromActor: Even though the read lock is not held during
processCommandFromStandby, it's still possible to have the same problem that
you saw in your cluster, but on the active instead of the standby. If the
active requests re-registration of datanodes, and then immediately goes into a
bad state or a network partition prevents communication, then datanodes will be
stuck inside the re-register polling loop while holding the read lock. This
will prevent the other one from taking over as active, which requires holding
the write lock.{quote}
Yes, I agree. In an extreme case this can happen, but the chances of that are
rare compared to the current issue.
{quote}I'm starting to think that we can't fix this bug by just tuning locks in
BPOfferService. Instead, I'm starting to think that we need to work out a way
for the re-register polling loops to yield the lock in case of repeated
failure, to give the other BPServicActor a chance. {quote}
I am attaching a patch for this. I hope this change alone will solve the
current issue. Still, I would like the read/write locks to be in place as well,
since they allow faster processing of commands when the cluster is in a normal
state.
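To make the idea concrete, here is a rough sketch of how the re-register polling loop could give up after a few failed attempts instead of spinning indefinitely while the BPOfferService lock is held. This only illustrates the approach under discussion; the retry limit, sleep time and helper names are assumptions, and it is not the attached patch.
{code:java}
// Rough sketch only: bounded retries so the caller can release the
// BPOfferService lock and the other BPServiceActor gets a chance.
// The retry limit, back-off and helper names are assumptions.
private static final int MAX_REGISTER_ATTEMPTS = 3;

void reRegisterWithBoundedRetries() throws IOException {
  IOException lastFailure = null;
  for (int attempt = 1; attempt <= MAX_REGISTER_ATTEMPTS; attempt++) {
    try {
      // same RPC as in BPServiceActor#retrieveNamespaceInfo()
      NamespaceInfo nsInfo = bpNamenode.versionRequest();
      register(nsInfo);                 // re-register with this namenode
      return;                           // success, exit the loop
    } catch (IOException ioe) {
      lastFailure = ioe;
      LOG.warn("Re-registration attempt " + attempt + " of "
          + MAX_REGISTER_ATTEMPTS + " failed", ioe);
      try {
        Thread.sleep(1000);             // brief back-off between attempts
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        break;
      }
    }
  }
  // Propagate the failure instead of looping forever; the command can be
  // retried on a later heartbeat, and the lock is no longer pinned.
  throw lastFailure;
}
{code}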
> BPOfferService#processCommandFromActor() synchronization on namenode RPC call
> delays IBR to Active NN, if Standby NN is unstable
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-5014
> URL: https://issues.apache.org/jira/browse/HDFS-5014
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, ha
> Affects Versions: 3.0.0, 2.0.4-alpha
> Reporter: Vinay
> Assignee: Vinay
> Attachments: HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch,
> HDFS-5014.patch, HDFS-5014.patch
>
>
> In one of our clusters, the following happened, which caused HDFS writes to fail.
> 1. The Standby NN was unstable and continuously restarting due to some errors,
> but the Active NN was stable.
> 2. An MR job was writing files.
> 3. At some point the SNN went down again while the datanodes were processing
> the REGISTER command from the SNN.
> 4. Datanodes started retrying to connect to the SNN to register, at the
> following code in BPServiceActor#retrieveNamespaceInfo(), which is called
> under synchronization.
> {code}
> try {
>   nsInfo = bpNamenode.versionRequest();
>   LOG.debug(this + " received versionRequest response: " + nsInfo);
>   break;
> {code}
> Unfortunately, this happened in all the datanodes at the same time.
> 5. For the next 7-8 minutes the standby was down; during this time no blocks
> were reported to the active NN and writes failed.
> So the culprit is that {{BPOfferService#processCommandFromActor()}} is
> completely synchronized, which is not required.
--
This message was sent by Atlassian JIRA
(v6.1#6144)