sodonnel commented on PR #5460:
URL: https://github.com/apache/hadoop/pull/5460#issuecomment-1462908624

   > @sodonnel also I am curious, was it just a specific log (only one case 
e.g. the lease was expired) or combination of logs from 
`checkLease(DatanodeDescriptor dn, long monotonicNowMs, long id)` that you have 
seen in various issues?
   > 
   > I wonder if `lease expiry` or `invalid lease` are worth having some 
dedicated metrics in `NameNodeActivity` (maybe not as with this patch, the 
subsequent attempt by BP actor should anyways have new lease id acquired from 
the response of heartbeat API before it reattempts sending FBR).
   
   In the examples I saw, it was expired leases that caused the problem. 
However, the namenode was under significant pressure when it happened. In one 
example, it was actually the SBNN that was rejecting the reports. Tailing the 
edits was taking frequent long locks (over 300 seconds at a time), which was 
beyond the lease expiry.
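
   To illustrate the timing problem, here is a minimal sketch, not the actual 
NameNode code. The class, the `Lease` record, `isLeaseValid`, and the 300 second 
lease length are all hypothetical names and values chosen for illustration. It 
only shows why a report that waits behind a lock for longer than the lease 
length ends up rejected as expired.

```java
public class LeaseExpirySketch {
    // Hypothetical lease record: an id plus the monotonic time at which it expires.
    static final class Lease {
        final long id;
        final long expiryMs;

        Lease(long id, long expiryMs) {
            this.id = id;
            this.expiryMs = expiryMs;
        }
    }

    // Assumed check: the lease is valid only if the id matches and it has not expired.
    static boolean isLeaseValid(Lease lease, long monotonicNowMs, long id) {
        if (lease == null || lease.id != id) {
            return false; // "invalid lease" case
        }
        return monotonicNowMs <= lease.expiryMs; // false means "lease expired"
    }

    public static void main(String[] args) {
        long leaseLengthMs = 300_000L; // assumed lease length, for illustration only
        long grantedAtMs = 0L;
        Lease lease = new Lease(42L, grantedAtMs + leaseLengthMs);

        // Report processed shortly after the lease was granted.
        System.out.println(isLeaseValid(lease, grantedAtMs + 1_000L, 42L));

        // Report delayed behind a lock held for 310 seconds.
        long lockHoldMs = 310_000L;
        System.out.println(isLeaseValid(lease, grantedAtMs + lockHoldMs, 42L));
    }
}
```

   With these assumed numbers, the first check passes and the second fails, 
since the lock hold alone outlasts the lease.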
   
   In another example, it was the ANN after startup. I am not sure, but I think 
the system perhaps came out of safemode with many block reports still 
outstanding, and then between under-replication and IBRs, contention on the NN 
lock seemed to block the FBRs until the lease expired.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


