[ 
https://issues.apache.org/jira/browse/HDFS-14728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906396#comment-16906396
 ] 

Íñigo Goiri commented on HDFS-14728:
------------------------------------

A few comments:
* I think the unit tests are having some issues.
* I would not change the default value from 10 seconds to 10 minutes. Changing 
defaults is always tricky.
* In RouterRpcServer#917, it should preserve the full trace by wrapping the 
cause, e.g. {{throw new IOException(throwable);}} or similar. By the way, we 
should clarify what this exception can be and add tests for that case.
* In the asserts, you should put the expected value first.
* It would be good to add comments throughout testDatanodeReportCache 
explaining the reasons for that sequence. Right now it is easy to see that it 
tries to verify that it does not go to the NN, but this may not be as clear in 
a few months.
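
To make the exception-wrapping and assert-order points concrete, here is a 
minimal sketch in plain Java. It is not the patch's code: the class 
{{WrapCauseSketch}} and the helpers {{loadOrThrow}} and {{checkEquals}} are 
hypothetical names for illustration, and {{checkEquals}} only mirrors the 
expected-first argument order of JUnit's {{assertEquals(expected, actual)}}.

```java
import java.io.IOException;
import java.util.Objects;
import java.util.concurrent.Callable;

public class WrapCauseSketch {

    // Hypothetical helper: runs a loader and rethrows any failure as an
    // IOException, keeping the original throwable as the cause so the full
    // stack trace is preserved.
    static <T> T loadOrThrow(Callable<T> loader) throws IOException {
        try {
            return loader.call();
        } catch (IOException e) {
            throw e; // already the right type, do not wrap twice
        } catch (Exception e) {
            throw new IOException(e); // cause (and its trace) is retained
        }
    }

    // Expected value first, actual value second, so that a failure message
    // reads correctly.
    static void checkEquals(Object expected, Object actual) {
        if (!Objects.equals(expected, actual)) {
            throw new AssertionError(
                "expected:<" + expected + "> but was:<" + actual + ">");
        }
    }

    public static void main(String[] args) throws Exception {
        // Normal path: the loaded value is returned unchanged.
        checkEquals("report", loadOrThrow(() -> "report"));

        // Failure path: the original exception must survive as the cause.
        IllegalStateException root = new IllegalStateException("NN unavailable");
        try {
            loadOrThrow(() -> { throw root; });
            throw new AssertionError("expected an IOException");
        } catch (IOException e) {
            checkEquals(root, e.getCause());
        }
    }
}
```

A test along these lines would also document which exception types can reach 
the caller.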

> RBF:GetDatanodeReport causes a large GC pressure on the NameNodes
> -----------------------------------------------------------------
>
>                 Key: HDFS-14728
>                 URL: https://issues.apache.org/jira/browse/HDFS-14728
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: rbf
>            Reporter: xuzq
>            Assignee: xuzq
>            Priority: Major
>         Attachments: HDFS-14728-trunk-001.patch
>
>
> When a cluster contains millions of DNs, *GetDatanodeReport* is pretty 
> expensive, and it puts heavy GC pressure on the NameNode.
> When multiple NSs share the millions of DNs through federation and the router 
> listens to the NSs, the problem becomes more serious.
> All the NSs will GC at the same time.
> RBF should cache the datanode report information and have an option to 
> disable the cache.
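
The caching idea in the description can be sketched roughly as follows. This 
is only an illustration under stated assumptions, not the attached patch: 
{{ReportCacheSketch}} is a hypothetical class, a TTL of zero or less is assumed 
to mean "cache disabled", and the clock is injected so that expiry can be 
exercised without sleeping.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class ReportCacheSketch<T> {
    private final Supplier<T> loader;   // stands in for the expensive NN call
    private final long ttlMs;           // <= 0 disables caching (assumption)
    private final LongSupplier clock;   // injected time source, in milliseconds
    private T cached;
    private long loadedAt;
    private boolean hasValue;

    ReportCacheSketch(Supplier<T> loader, long ttlMs, LongSupplier clock) {
        this.loader = loader;
        this.ttlMs = ttlMs;
        this.clock = clock;
    }

    // Returns the cached value while it is fresh; otherwise reloads it.
    synchronized T get() {
        long now = clock.getAsLong();
        if (ttlMs <= 0 || !hasValue || now - loadedAt >= ttlMs) {
            cached = loader.get();
            loadedAt = now;
            hasValue = ttlMs > 0; // never treat a value as cached when disabled
        }
        return cached;
    }

    public static void main(String[] args) {
        long[] now = {0};
        int[] calls = {0};
        ReportCacheSketch<String> cache = new ReportCacheSketch<>(
            () -> "report-" + (++calls[0]), 10_000, () -> now[0]);

        System.out.println(cache.get()); // report-1 (first call loads)
        now[0] = 5_000;
        System.out.println(cache.get()); // report-1 (still fresh, no reload)
        now[0] = 10_000;
        System.out.println(cache.get()); // report-2 (expired, reloaded)

        // With the cache disabled, every call goes to the loader.
        ReportCacheSketch<String> off = new ReportCacheSketch<>(
            () -> "report-" + (++calls[0]), 0, () -> now[0]);
        System.out.println(off.get());   // report-3
        System.out.println(off.get());   // report-4
    }
}
```

Counting loader invocations, as {{calls}} does here, is one way a test can 
verify that a request was served from the cache rather than from the NN.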



