[ https://issues.apache.org/jira/browse/HADOOP-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raghu Angadi updated HADOOP-4971:
---------------------------------
Attachment: HADOOP-4971.patch
Thanks Nicholas. Updated patch uses the simpler equation. The comment is
slightly modified.
> Block report times from datanodes could converge to same time.
> -----------------------------------------------------------------
>
> Key: HADOOP-4971
> URL: https://issues.apache.org/jira/browse/HADOOP-4971
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.18.0
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Priority: Blocker
> Fix For: 0.18.3
>
> Attachments: HADOOP-4971.patch, HADOOP-4971.patch
>
>
> Datanode block reports take quite a bit of memory to process at the namenode.
> After the initial report, each DN picks a random time for its periodic reports
> to spread this load at the NN. This normally works fine.
> Block reports are sent from the "offerService()" thread in the DN. If for some
> reason this thread is stuck for a long time (comparable to the block report
> interval), and the same thing happens on many DNs, all of them get back to the
> loop at the same time and start sending block reports then, and every hour at
> the same time after that.
> The RPC server and clients in 0.18 can handle this situation fine. But since
> this is a memory-intensive RPC, it led to large GC delays at the NN. We don't
> know yet why the offerService threads seemed to be stuck, but the DN should
> re-randomize its block report time in such cases.
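The scheduling behaviour described in the issue can be sketched in Java as follows. This is a hypothetical illustration, not the code in HADOOP-4971.patch: the class BlockReportScheduler, its methods, and the rule "re-randomize when a report is at least one full interval late" are assumptions made for this example. If a DN simply restarted its period at the moment it resumed, every DN that resumed together would stay in lockstep; the sketch instead draws a fresh random phase after a stall. Times are plain millisecond values passed in by the caller so the logic can be exercised without a real clock.

```java
import java.util.Random;

/**
 * Hypothetical sketch of DataNode-style periodic block report scheduling.
 * Not taken from HADOOP-4971.patch; names and the stall rule are illustrative.
 */
final class BlockReportScheduler {
    private final long intervalMs;  // block report interval, e.g. one hour
    private final Random random;
    private long lastBlockReport;   // start of the current reporting period, in ms

    BlockReportScheduler(long intervalMs, long seed) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.intervalMs = intervalMs;
        this.random = new Random(seed);
    }

    /** After the initial report, back-date the period start by a random
     *  offset so the first periodic report lands at a random point within
     *  one interval, spreading the load across DataNodes. */
    void scheduleAfterInitialReport(long now) {
        lastBlockReport = now - randomOffset();
    }

    /** Whether a periodic block report is due at time {@code now}. */
    boolean isDue(long now) {
        return now - lastBlockReport >= intervalMs;
    }

    /** Update the schedule after a periodic report was sent at {@code now}. */
    void onReportSent(long now) {
        if (now - lastBlockReport >= 2 * intervalMs) {
            // The report is at least one full interval late, e.g. because the
            // offerService() loop was stuck. Many DataNodes may resume together,
            // so pick a fresh random phase instead of anchoring to 'now'.
            lastBlockReport = now - randomOffset();
        } else {
            // Normal case: keep this DataNode's existing phase.
            lastBlockReport += intervalMs;
        }
    }

    /** Time at which the next periodic report becomes due. */
    long nextReportTime() {
        return lastBlockReport + intervalMs;
    }

    /** Roughly uniform offset in [0, intervalMs). */
    private long randomOffset() {
        return Math.floorMod(random.nextLong(), intervalMs);
    }
}
```

A DN loop would call isDue(now) on each iteration, send the report when it returns true, and then call onReportSent(now). Re-randomizing only after a stall leaves the spread from the initial random choice untouched in the common case. An alternative that avoids a second random draw is to advance lastBlockReport by a whole number of intervals, which restores each DN's original random phase even after a long stall.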
--
This message is automatically generated by JIRA.