[
https://issues.apache.org/jira/browse/HDFS-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927314#comment-16927314
]
Chen Zhang commented on HDFS-12288:
-----------------------------------
Hi [~shahrs87] [~elgoiri], do you have time to take a look? I changed the code
according to the previous discussion and uploaded patch v3. It is not a
complete patch, only a draft without tests.
{quote}The method {{DataNode#getActiveNumberOfThreads()}} will be return the
sum of {{new DataNode#getXceiverCount() * 2}} + {{Num of Block recovery
threads}}.
We just need to have another metric or member variable to track currently
running {{Block recovery threads}}.
The reason we have multiplier of 2 is for every {{Dataxceiver}} thread, we also
create {{Packet Responder thread}}
{quote}
Actually, not every DataXceiver thread creates a PacketResponder thread. Only
an xceiver processing a WRITE_BLOCK operation creates one, so multiplying the
xceiver count by 2 overestimates the load. Instead, I added 2 additional
metrics: {{dataNodePacketResponderCount}} and
{{dataNodeBlockRecoveryWorkerCount}}
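
To illustrate the difference between the two approaches, here is a minimal
sketch, not the code in patch v3. The class {{ActiveThreadCounter}} and all of
its method names are hypothetical stand-ins, and summing the three counters is
only my assumption about how the new metrics could be combined.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical stand-in for the DataNode thread counters.
 * None of these names come from Hadoop; they only illustrate the idea.
 */
public class ActiveThreadCounter {
    private final AtomicInteger xceiverCount = new AtomicInteger();
    private final AtomicInteger packetResponderCount = new AtomicInteger();
    private final AtomicInteger blockRecoveryWorkerCount = new AtomicInteger();

    // Each thread type bumps its own counter when it starts and stops.
    public void xceiverStarted() { xceiverCount.incrementAndGet(); }
    public void xceiverFinished() { xceiverCount.decrementAndGet(); }
    public void packetResponderStarted() { packetResponderCount.incrementAndGet(); }
    public void packetResponderFinished() { packetResponderCount.decrementAndGet(); }
    public void blockRecoveryWorkerStarted() { blockRecoveryWorkerCount.incrementAndGet(); }
    public void blockRecoveryWorkerFinished() { blockRecoveryWorkerCount.decrementAndGet(); }

    /** Assumed combination: count every thread type exactly once. */
    public int getActiveNumberOfThreads() {
        return xceiverCount.get()
            + packetResponderCount.get()
            + blockRecoveryWorkerCount.get();
    }

    /** The quoted proposal: assume one PacketResponder per xceiver. */
    public int getDoubledEstimate() {
        return xceiverCount.get() * 2 + blockRecoveryWorkerCount.get();
    }

    public static void main(String[] args) {
        ActiveThreadCounter counter = new ActiveThreadCounter();
        // Three xceivers, only one of which handles WRITE_BLOCK.
        counter.xceiverStarted();
        counter.xceiverStarted();
        counter.xceiverStarted();
        counter.packetResponderStarted();
        counter.blockRecoveryWorkerStarted();
        System.out.println("tracked: " + counter.getActiveNumberOfThreads());
        System.out.println("doubled: " + counter.getDoubledEstimate());
    }
}
```

With three xceivers of which only one is writing a block, the per-type
counters report 5 threads while the doubled estimate reports 7.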
> Fix DataNode's xceiver count calculation
> ----------------------------------------
>
> Key: HDFS-12288
> URL: https://issues.apache.org/jira/browse/HDFS-12288
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Major
> Attachments: HDFS-12288.001.patch, HDFS-12288.002.patch,
> HDFS-12288.003.patch
>
>
> The problem with the ThreadGroup.activeCount() method is that the method is
> only a very rough estimate, and in reality returns the total number of
> threads in the thread group as opposed to the threads actually running.
> In some DNs, we saw this return ~50 for a long time, even though the
> actual number of DataXceiver threads was next to none.
> This is a big issue as we use the xceiverCount to make decisions on the NN
> for choosing replication source DN or returning DNs to clients for R/W.
> The plan is to reuse the DataNodeMetrics.dataNodeActiveXceiversCount value,
> which accounts only for the actual number of DataXceiver threads currently
> running and thus represents the load on the DN much better.