[
https://issues.apache.org/jira/browse/HDFS-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192694#comment-15192694
]
Vinayakumar B commented on HDFS-9917:
-------------------------------------
bq. Clear the IBRS on re-register to namenode.
I think this is fine. This is only one part of the solution to make the SNN start
successfully.
It is also required to limit the number of IBRs held for the Standby NN.
1. IBRs for the Standby NN could have a threshold (say 100K or 1 million
IBRs).
2. Also, so as not to lose any important IBRs, they should be cleared only when
"the threshold is reached AND 'lastIBR' is older than
'heartbeatExpiryInterval', i.e. the DataNode is already considered dead on the
NameNode side".
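To make the proposed rule concrete, here is a minimal sketch of the two-condition clearing check. All names (`StandbyIbrQueue`, `IBR_THRESHOLD`, `HEARTBEAT_EXPIRY_MS`) and values are hypothetical illustrations, not actual Hadoop classes or configuration:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the proposed IBR-limiting rule for a Standby NN.
// Class name, threshold, and expiry values are illustrative only; they do
// not correspond to real Hadoop code.
class StandbyIbrQueue {
    static final int IBR_THRESHOLD = 100_000;           // e.g. 100K pending IBRs
    static final long HEARTBEAT_EXPIRY_MS = 10 * 60 * 1000L;

    final Deque<String> pendingIbrs = new ArrayDeque<>();
    long lastIbrTimeMs;

    void add(String ibr, long nowMs) {
        pendingIbrs.add(ibr);
        lastIbrTimeMs = nowMs;
    }

    // Clear only when BOTH conditions hold: the threshold is reached AND
    // the DataNode would already be considered dead on the NameNode side
    // (no IBR within the heartbeat expiry interval).
    boolean maybeClear(long nowMs) {
        boolean overThreshold = pendingIbrs.size() > IBR_THRESHOLD;
        boolean dnConsideredDead = nowMs - lastIbrTimeMs > HEARTBEAT_EXPIRY_MS;
        if (overThreshold && dnConsideredDead) {
            pendingIbrs.clear();
            return true;
        }
        return false;
    }
}
```

The AND of the two conditions is the safety argument: a live DataNode over the threshold keeps its IBRs (they still matter), and only reports from a node already declared dead are dropped.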
[~szetszwo]/[~jingzhao], does this make sense to you?
> IBR accumulate more objects when SNN was down for sometime.
> -----------------------------------------------------------
>
> Key: HDFS-9917
> URL: https://issues.apache.org/jira/browse/HDFS-9917
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Brahma Reddy Battula
> Assignee: Brahma Reddy Battula
>
> SNN was down for some time for various reasons. After restarting, the SNN
> became unresponsive because:
> - 29 DNs were each sending ~5 million IBRs (most of them delete IBRs),
> whereas each datanode had only ~2.5 million blocks.
> - GC could not collect these objects since all of them were held in the RPC
> queue.
> To recover (i.e. to clear these objects), all the DNs were restarted one by
> one. This issue happened in 2.4.1, where splitting of the block report was
> not available.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)