[ 
https://issues.apache.org/jira/browse/HDFS-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15221239#comment-15221239
 ] 

Vinayakumar B commented on HDFS-9917:
-------------------------------------

Since the second part of this issue needs more discussion about how to obtain 
heartBeatExpiryInterval on the datanode side, it can be handled in a follow-up 
Jira.
[~brahmareddy], please file a follow-up Jira for "Avoid accumulation of 
IBRs for SNN when the standby is down for longer than expected".

Given the criticality of this issue, I feel it would be better to land this in 
2.7.3 with the *reRegister() IBR clearance fix*.

The current changes look good for the fix.
Please add a test to verify the same. Mock tests would be sufficient. 
{{TestBPOfferService.java}} contains similar tests that you can refer to.
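
To illustrate the idea behind clearing IBRs on re-registration, here is a 
minimal, self-contained sketch. It is not the attached patch and does not use 
the real HDFS classes: {{PendingIbrSketch}}, {{enqueue()}} and 
{{clearOnReRegister()}} are hypothetical names chosen for illustration. It 
assumes that a full block report follows re-registration, so incremental 
entries queued before that point are redundant and can be dropped.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical stand-in for the per-namenode pending IBR state kept by a
 * datanode. Not the actual HDFS implementation.
 */
public class PendingIbrSketch {
    // Queued incremental block report entries (received/deleted block events).
    private final Deque<String> pending = new ArrayDeque<>();

    /** Queue one block event to be sent with the next IBR. */
    public synchronized void enqueue(String blockEvent) {
        pending.addLast(blockEvent);
    }

    /** Number of entries currently waiting to be reported. */
    public synchronized int size() {
        return pending.size();
    }

    /**
     * Called when the datanode re-registers with a namenode. Assumption: a
     * full block report is sent afterwards, so the queued incremental
     * entries can be discarded. Returns how many entries were dropped.
     */
    public synchronized int clearOnReRegister() {
        int dropped = pending.size();
        pending.clear();
        return dropped;
    }

    public static void main(String[] args) {
        PendingIbrSketch ibrs = new PendingIbrSketch();
        // While the standby is unreachable, entries keep accumulating.
        for (int i = 0; i < 5; i++) {
            ibrs.enqueue("DELETED blk_" + i);
        }
        System.out.println("pending before re-register: " + ibrs.size());
        int dropped = ibrs.clearOnReRegister();
        System.out.println("dropped " + dropped + ", pending after: " + ibrs.size());
    }
}
```

A mock test along the lines requested above would check the same property 
against the real code path: queue some IBR entries, trigger re-registration, 
and verify that nothing stale is sent afterwards.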


> IBR accumulate more objects when SNN was down for sometime.
> -----------------------------------------------------------
>
>                 Key: HDFS-9917
>                 URL: https://issues.apache.org/jira/browse/HDFS-9917
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.2
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>            Priority: Critical
>         Attachments: HDFS-9917.patch
>
>
> The SNN was down for some time. After it was restarted, it became 
> unresponsive because:
> - 29 DNs were each sending IBRs of 5 million entries (most of them delete 
> IBRs), whereas each datanode had only ~2.5 million blocks.
> - GC could not reclaim these objects since all of them were held in the RPC 
> queue. 
> To recover (that is, to clear these objects), all the DNs were restarted one 
> by one. This issue happened in 2.4.1, where splitting of block reports was 
> not available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
