[ https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634579#action_12634579 ]
Hairong Kuang commented on HADOOP-4116:
---------------------------------------

Yes, I agree. I did something similar in HADOOP-2188 in the context of IPC. For this issue, I do not want to take on the effort of adding a Ping interface to DataNode, so I will use KeepAlive for now. The failed unit test does not seem related to this JIRA.

> Balancer should provide better resource management
> --------------------------------------------------
>
>                 Key: HADOOP-4116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4116
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.18.2, 0.19.0
>
>         Attachments: balancerRM.patch, balancerRM1.patch, balancerRM2-b18.patch, balancerRM2.patch
>
>
> The number of threads is currently limited on datanodes. Once these threads are occupied, DataNode does not accept any more requests (DoS). Recently we saw a case where most of the 256 threads were waiting in {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since rebalancing is (heavily) throttled, I would think this would be the common case.
> These operations waiting for active rebalancing threads to finish need not take up a thread.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
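The problem described in the issue is that request threads park on {{balancingSem}} while throttled moves run, so idle waiters exhaust the thread pool. One way to avoid tying up a thread is to fail fast: try to take a permit without blocking and, if none is free, reply "busy" immediately so the thread returns to the pool and the caller can retry elsewhere. The sketch below illustrates that idea only. It is not the code in the attached patches; the class `BalancingThrottle`, its methods, and the `"BUSY"`/`"OK"` replies are hypothetical. The only real API it relies on is `java.util.concurrent.Semaphore.tryAcquire()`, which returns `false` instead of blocking when no permit is available.

```java
import java.util.concurrent.Semaphore;

/**
 * Hypothetical sketch (not Hadoop's implementation): caps concurrent
 * block moves without parking extra request threads on the semaphore.
 */
public class BalancingThrottle {
    private final Semaphore permits;

    public BalancingThrottle(int maxConcurrentMoves) {
        this.permits = new Semaphore(maxConcurrentMoves);
    }

    /** Returns true if a move may start now; never blocks the caller. */
    public boolean tryStartMove() {
        return permits.tryAcquire();
    }

    /** Must be called exactly once for every successful tryStartMove(). */
    public void finishMove() {
        permits.release();
    }

    /**
     * Hypothetical replaceBlock-style handler: when no permit is free it
     * answers "BUSY" at once, so the handler thread is released instead
     * of waiting for an active move to finish.
     */
    public String handleReplaceBlock(Runnable copyBlock) {
        if (!tryStartMove()) {
            return "BUSY";
        }
        try {
            copyBlock.run();
            return "OK";
        } finally {
            finishMove();
        }
    }

    public static void main(String[] args) {
        BalancingThrottle throttle = new BalancingThrottle(1);
        // Occupy the only permit, as a long-running throttled move would.
        boolean first = throttle.tryStartMove();
        // A second request is rejected immediately rather than blocking.
        String second = throttle.handleReplaceBlock(() -> {});
        // The long-running move completes and frees its permit.
        throttle.finishMove();
        // Now the request proceeds.
        String third = throttle.handleReplaceBlock(() -> {});
        System.out.println(first + " " + second + " " + third); // prints "true BUSY OK"
    }
}
```

The trade-off is that a rejected move has to be retried by the balancer, but a retry costs far less than holding one of a fixed number of server threads idle for the duration of a throttled transfer.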