Block report times from datanodes could converge to the same time.
------------------------------------------------------------------

                 Key: HADOOP-4971
                 URL: https://issues.apache.org/jira/browse/HADOOP-4971
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.18.0
            Reporter: Raghu Angadi
            Assignee: Raghu Angadi
            Priority: Blocker
             Fix For: 0.18.3



Datanode block reports take quite a bit of memory to process at the namenode.
After the initial report, each DN picks a random time for its subsequent
reports so that this load is spread out at the NN. This normally works fine.
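A minimal sketch of the randomized scheduling described above, assuming a one-hour report interval. The helper names (`initial_schedule`, `report_times`) are hypothetical and only model the idea: each DN backdates its "last report" time by a random amount, so DNs that start together still report at different points within the interval.

```python
import random

BLOCK_REPORT_INTERVAL = 3600.0  # assumed 1-hour interval, in seconds


def initial_schedule(start_time: float, interval: float, rng: random.Random) -> float:
    """Return the time of the first periodic report after the initial one.

    Hypothetical helper: the DN pretends its last report happened a random
    amount of time ago, so the next report fires at a random point within
    the coming interval instead of exactly `interval` seconds later.
    """
    last_report = start_time - rng.uniform(0, interval)
    return last_report + interval


def report_times(first: float, interval: float, count: int) -> list:
    """Periodic report times: first, first + interval, first + 2*interval, ..."""
    return [first + i * interval for i in range(count)]


# Ten DNs that all started at t=0 still end up with spread-out report times.
rng = random.Random(42)
firsts = [initial_schedule(0.0, BLOCK_REPORT_INTERVAL, rng) for _ in range(10)]
```

Because every later report is a fixed multiple of the interval after the first one, the random phase chosen at startup is what keeps the DNs apart hour after hour.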

Block reports are sent from the "offerService()" thread in the DN. If for some
reason this thread is stuck for a long time (comparable to the block report
interval), and the same thing happens on many DNs, all of them get back to the
loop at the same time and start sending block reports at that moment, and then
every hour at the same time.
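The convergence can be illustrated with a small simulation. This is a sketch under stated assumptions, not the actual DataNode code: `next_report_after_stall` is a hypothetical model in which an overdue report is sent as soon as the loop resumes, and the next report is scheduled one full interval after that send.

```python
import random

INTERVAL = 3600.0  # assumed 1-hour block report interval, in seconds


def next_report_after_stall(last_report: float, resume_time: float,
                            interval: float) -> float:
    """Naive scheduling model (hypothetical): when the loop resumes, send at
    once if the report is overdue, then schedule the next one a full
    interval after the actual send time."""
    due = last_report + interval
    send_time = max(due, resume_time)  # an overdue report goes out immediately
    return send_time + interval        # next report is anchored to send_time


# Hypothetical cluster: 100 DNs whose last reports are well spread out.
rng = random.Random(7)
last_reports = [rng.uniform(0, INTERVAL) for _ in range(100)]

# All offerService() loops are stuck until t = 3 hours, then resume together.
resume = 3 * INTERVAL
next_reports = [next_report_after_stall(t, resume, INTERVAL) for t in last_reports]
print(len(set(next_reports)))  # prints 1: every DN now reports at the same instant
```

Once the stall erases the original random phases, nothing in this model spreads the DNs apart again, so they keep reporting together every hour.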

The RPC server and clients in 0.18 can handle this situation fine. But since
this is a memory-intensive RPC, it leads to large GC delays at the NN. We don't
know yet why the offerService threads seemed to be stuck, but the DN should
re-randomize its block report time in such cases.
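One possible re-randomization policy, sketched under assumptions: this is not the committed patch for this issue, and `schedule_next` and its "one interval late" threshold are hypothetical. Reports sent roughly on time keep their existing phase; a report sent an interval or more late (i.e. after a stall) gets a fresh random phase.

```python
import random

INTERVAL = 3600.0  # assumed block report interval, in seconds


def schedule_next(due: float, send_time: float, interval: float,
                  rng: random.Random) -> float:
    """Return when the next block report should be sent (hypothetical policy).

    If the report went out less than one interval late, keep the original
    schedule. If the loop was stalled for an interval or more, re-randomize:
    pick the next report at a random point within the next interval, so DNs
    that resumed together spread out again.
    """
    if send_time - due < interval:
        return due + interval  # normal case: stay on the original phase
    return send_time + rng.uniform(0, interval)  # stalled: pick a new random phase


# 100 DNs with spread-out due times, all stalled until t = 3 hours.
rng = random.Random(11)
resume = 3 * INTERVAL
stalled = [schedule_next(rng.uniform(0, INTERVAL), resume, INTERVAL, rng)
           for _ in range(100)]
```

Keeping the phase in the normal case matters: re-randomizing on every report would also spread the load, but would make the interval between a DN's consecutive reports unpredictable.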
