[
https://issues.apache.org/jira/browse/HADOOP-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661760#action_12661760
]
Raghu Angadi commented on HADOOP-4971:
--------------------------------------
> [...] formula can be simplified as below
right.
> I forgot to mention that the comment is a little bit confusing: the next
> report is actually around 11:35:43. The reports after the next will be
> xx:20:14.
There are 3 reports involved in the comment: last, current, and next. Are you
mixing up "current" and "next"? The current report returned at 11:35:14 (we don't
care when it was started).
> Block report times from datanodes could converge to same time.
> -----------------------------------------------------------------
>
> Key: HADOOP-4971
> URL: https://issues.apache.org/jira/browse/HADOOP-4971
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.18.0
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Priority: Blocker
> Fix For: 0.18.3
>
> Attachments: HADOOP-4971.patch
>
>
> Datanode block reports take quite a bit of memory to process at the namenode.
> After the initial report, DNs pick a random time to spread this load at
> the NN. This normally works fine.
> Block reports are sent inside the "offerService()" thread in the DN. If for
> some reason this thread is stuck for a long time (comparable to the block
> report interval), and the same thing happens on many DNs, all of them get
> back to the loop at the same time and start sending block reports then, and
> every hour after that at the same time.
> The RPC server and clients in 0.18 can handle this situation fine. But since
> this is a memory-intensive RPC, it leads to large GC delays at the NN. We
> don't know yet why the offerService threads seemed to be stuck, but the DN
> should re-randomize its block report time in such cases.
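
The re-randomization proposed in the description could look roughly like the sketch below. This is only an illustration, not the code in HADOOP-4971.patch: the class name `BlockReportScheduler`, its methods, and the "late by at least half an interval" threshold are all hypothetical. The idea is that a report sent close to its scheduled time keeps its slot, while a report that comes out very late gets a fresh random slot, so DNs that were stuck together do not stay aligned.

```java
import java.util.Random;

/** Hypothetical sketch; names and threshold are assumptions, not Hadoop's actual code. */
class BlockReportScheduler {
    private final long intervalMs;   // block report interval, e.g. one hour
    private final Random random;
    private long nextReportMs;       // time at which the next report is due

    BlockReportScheduler(long intervalMs, long nowMs, Random random) {
        this.intervalMs = intervalMs;
        this.random = random;
        // After the initial report, pick a random slot within one interval.
        this.nextReportMs = nowMs + randomOffset();
    }

    /** Uniform offset in [0, intervalMs). */
    private long randomOffset() {
        return (long) (random.nextDouble() * intervalMs);
    }

    boolean isDue(long nowMs) {
        return nowMs >= nextReportMs;
    }

    /** Call once a report has been sent at nowMs. */
    void reportSent(long nowMs) {
        long lateness = nowMs - nextReportMs;
        if (lateness >= intervalMs / 2) {
            // Assumed threshold: the thread was stuck for a time comparable
            // to the interval, so choose a new random slot.
            nextReportMs = nowMs + randomOffset();
        } else {
            // Normal case: keep the same slot, one interval later.
            nextReportMs += intervalMs;
        }
    }

    long nextReportMs() {
        return nextReportMs;
    }
}
```

With a one-hour interval, a report sent a few seconds late is rescheduled exactly one hour after its previous slot, while a report sent two hours late is rescheduled at a random point within the following hour.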