[ 
https://issues.apache.org/jira/browse/HDFS-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887878#comment-16887878
 ] 

Chen Zhang commented on HDFS-14576:
-----------------------------------

Thanks [~hexiaoqiao] for your detailed description of your solution and 
insights. 
{quote}1. Optimizing safemode leave mechanism, HDFS-14559, and 
[^HDFS-14186.001.patch],
{quote}
I have gone through all of the discussion under HDFS-14186. Your point is that 
after the NameNode leaves SafeMode it still needs to process a very large number 
of full block reports (FBRs), which puts such a high load on the NameNode that 
it can't serve normal RPC requests (the lifeline protocol avoids the dead-node 
problem, so that doesn't count), so you propose leaving SafeMode later. I'm not 
sure I fully understand your proposal; if I have misunderstood anything, please 
correct me.

I think BlockReport Lease is also helpful in this case: it limits the number of 
concurrent block reports and will significantly reduce the load on the NameNode, 
so that the NameNode can process normal RPCs at the same time.
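
To illustrate the lease idea, here is a minimal sketch in Python. It is not the 
actual HDFS code (which is Java); the class LeaseManagerSketch, its method 
names, and the max_leases / lease_ttl_s parameters are all hypothetical and 
only model the behaviour described above: the NameNode hands out a bounded 
number of leases, and a full block report is processed only if it carries a 
valid one.

```python
import itertools
import time


class LeaseManagerSketch:
    """Toy model of a block-report lease (hypothetical, not the HDFS API).

    At most `max_leases` DataNodes may hold a lease, and therefore send a
    full block report, at any one time.
    """

    def __init__(self, max_leases, lease_ttl_s, clock=time.monotonic):
        self.max_leases = max_leases
        self.lease_ttl_s = lease_ttl_s
        self.clock = clock
        self._ids = itertools.count(1)  # lease id 0 is reserved for "no lease"
        self._leases = {}               # datanode -> (lease_id, expiry_time)

    def _expire(self):
        # Drop leases whose time-to-live has passed so their slots free up.
        now = self.clock()
        self._leases = {
            dn: (lease_id, expiry)
            for dn, (lease_id, expiry) in self._leases.items()
            if expiry > now
        }

    def request_lease(self, datanode):
        """Return a non-zero lease id, or 0 if all slots are taken."""
        self._expire()
        if datanode in self._leases:
            return self._leases[datanode][0]
        if len(self._leases) >= self.max_leases:
            return 0
        lease_id = next(self._ids)
        self._leases[datanode] = (lease_id, self.clock() + self.lease_ttl_s)
        return lease_id

    def accept_report(self, datanode, lease_id):
        """Process a full block report only if it carries a valid lease."""
        self._expire()
        held = self._leases.get(datanode)
        if lease_id == 0 or held is None or held[0] != lease_id:
            return False
        del self._leases[datanode]  # the report consumes the lease
        return True


mgr = LeaseManagerSketch(max_leases=2, lease_ttl_s=300)
lease_a = mgr.request_lease("dn-a")   # granted
lease_b = mgr.request_lease("dn-b")   # granted
lease_c = mgr.request_lease("dn-c")   # 0: dn-c must wait and ask again later
```

In this model a DataNode that gets 0 back simply does not send its FBR yet, so 
the number of reports the NameNode processes concurrently never exceeds 
max_leases, whatever the cluster size.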
{quote}2. Avoid useless block report retry to reduce load of NameNode when 
restart since in my own experience, there are almost 30% block report ops retry 
from datanode, but it has processed in NameNode view.
{quote}
Using BlockReport Lease will also reduce the chance of block report retries, 
because a DataNode only sends an FBR when the NameNode grants it a lease.
{quote}3. More fine-grained block than single disk of DataNode.
{quote}
I recently filed a JIRA (HDFS-14657) related to this work. The idea is quite 
simple and works very well in our production environment. Comments and 
discussion are welcome.
{quote}4. Improve efficiency of process block report RPC request when startup. 
(not discard timeout RPC, increase queue capacity, namenode trigger to failed 
report retry, etc.)
{quote}
Yep, I believe we can improve the efficiency of block report processing. Do you 
have any ideas on this?

> Avoid block report retry and slow down namenode startup
> -------------------------------------------------------
>
>                 Key: HDFS-14576
>                 URL: https://issues.apache.org/jira/browse/HDFS-14576
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: He Xiaoqiao
>            Assignee: He Xiaoqiao
>            Priority: Major
>
> During namenode startup, the load will be very high since it has to process 
> every datanodes blockreport one by one. If there are hundreds datanodes block 
> reports pending process, the issue will be more serious even 
> #processFirstBlockReport is processed a lot more efficiently than ordinary 
> block reports. Then some of datanode will retry blockreport and lengthens 
> restart times. I think we should filter the block report request (via 
> datanode blockreport retries) which has be processed and return directly then 
> shorten down restart time. I want to state this proposal may be obvious only 
> for large cluster.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
