[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844178#comment-16844178
 ] 

Santosh Marella edited comment on HDFS-12914 at 5/20/19 6:14 PM:
-----------------------------------------------------------------

{quote} Santosh Marella how many DNs do you have?  According to the limited 
logs, I think it is caused by the following case: a high CPU load on the SNN 
delayed the processing of the full block report.{quote}

[~starphin] - We have on the order of hundreds of DNs. You are right: high CPU 
load on the SNN delayed the processing of an FBR from a DN that had been issued 
a lease. The SNN started processing the reports, but the lease expired after it 
had processed 3 of the 12 reports.
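
To illustrate the timing described above, here is a minimal sketch of a lease expiring partway through a multi-storage report. It assumes the lease is re-checked before each per-storage report, which is what the 3-of-12 observation suggests. The class, the method names, and the 300 ms / 100 ms figures are hypothetical and chosen only for illustration; none of this is Hadoop code, and the clock is simulated so the outcome is deterministic.

```java
// Toy model only: every name and number here is hypothetical, not Hadoop source.
class LeaseExpirySketch {
    /** A lease that stays valid until a fixed deadline on a simulated clock. */
    static final class Lease {
        final long expiresAtMs;

        Lease(long issuedAtMs, long lengthMs) {
            this.expiresAtMs = issuedAtMs + lengthMs;
        }

        boolean isValidAt(long nowMs) {
            return nowMs < expiresAtMs;
        }
    }

    /**
     * Processes per-storage reports one at a time, re-checking the lease
     * before each one. Returns how many reports were accepted before the
     * lease expired.
     */
    static int processReports(int reportCount, long leaseLengthMs, long msPerReport) {
        long now = 0; // simulated clock, starts when the lease is issued
        Lease lease = new Lease(now, leaseLengthMs);
        int accepted = 0;
        for (int i = 0; i < reportCount; i++) {
            if (!lease.isValidAt(now)) {
                break; // the remaining reports are rejected
            }
            now += msPerReport; // simulated processing cost on a loaded node
            accepted++;
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Slow processing: the lease runs out partway through the batch.
        System.out.println(processReports(12, 300, 100) + " of 12 accepted");
        // Fast processing: every report fits inside the lease.
        System.out.println(processReports(12, 300, 10) + " of 12 accepted");
    }
}
```

In this model the only variable that changes between the two runs is the per-report processing time, which mirrors the point that CPU load alone can push a valid lease past its deadline.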


> Block report leases cause missing blocks until next report
> ----------------------------------------------------------
>
>                 Key: HDFS-12914
>                 URL: https://issues.apache.org/jira/browse/HDFS-12914
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.8.0
>            Reporter: Daryn Sharp
>            Priority: Critical
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}} and 
> is interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected because of an invalid lease 
> becomes active with _no blocks_.  A replication storm ensues, possibly causing 
> DNs to temporarily go dead (HDFS-12645), leading to more FBR lease rejections 
> on re-registration.  The cluster will have many "missing blocks" until the 
> DNs' next FBR is sent and/or forced.
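
The failure mode in the description can be sketched with a toy model: a lease check that signals rejection by returning false, so the report is dropped without any error reaching the sender. This is a simplified illustration, not the Hadoop source. The class {{SilentRejectionSketch}} and all of its methods are hypothetical, and the exception-throwing variant is included only as a contrast, not as the fix adopted by the project.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: every name here is hypothetical, not the Hadoop source.
class SilentRejectionSketch {
    private final long expectedLeaseId;
    // Number of blocks this toy NameNode attributes to each DataNode.
    private final Map<String, Integer> blocksByDn = new HashMap<>();

    SilentRejectionSketch(long expectedLeaseId) {
        this.expectedLeaseId = expectedLeaseId;
    }

    /** Stand-in for a lease check that signals failure by returning false. */
    private boolean checkLease(long leaseId) {
        return leaseId == expectedLeaseId;
    }

    /**
     * Boolean style: a rejected report returns normally, so the sender sees
     * no error and the node stays registered with no blocks recorded.
     */
    boolean reportSilently(String dn, int blockCount, long leaseId) {
        if (!checkLease(leaseId)) {
            return false; // report dropped; nothing is thrown
        }
        blocksByDn.put(dn, blockCount);
        return true;
    }

    /** Exception style, for contrast: rejection surfaces as an error. */
    void reportOrThrow(String dn, int blockCount, long leaseId) {
        if (!checkLease(leaseId)) {
            throw new IllegalStateException("block report lease rejected for " + dn);
        }
        blocksByDn.put(dn, blockCount);
    }

    int blocksFor(String dn) {
        return blocksByDn.getOrDefault(dn, 0);
    }

    public static void main(String[] args) {
        SilentRejectionSketch nn = new SilentRejectionSketch(42L);
        boolean result = nn.reportSilently("dn1", 1000, 7L); // wrong lease id
        System.out.println("returned " + result
            + ", blocks known for dn1: " + nn.blocksFor("dn1"));
    }
}
```

In the boolean variant the call completes normally even though nothing was recorded, which is the situation the description says leaves a node active with no blocks until its next FBR.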


