[
https://issues.apache.org/jira/browse/HDFS-13828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Amithsha updated HDFS-13828:
----------------------------
Issue Type: Bug (was: Task)
> DataNode breaching Xceiver Count
> --------------------------------
>
> Key: HDFS-13828
> URL: https://issues.apache.org/jira/browse/HDFS-13828
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.7.1
> Reporter: Amithsha
> Priority: Critical
>
> We are observing DataNodes breaching the xceiver limit of 4096 on a small
> set of nodes (5 to 8) in a 900-node cluster.
> We stopped the DataNode service on those nodes so that their blocks would be
> re-replicated across the cluster. Even after that, the same issue appeared on
> a new set of nodes.
> Q1: Why does this affect only particular nodes? And once those nodes are
> decommissioned and their data is re-replicated across the cluster, why does
> the issue reappear on a different set of nodes?
> Assumptions:
> Reads of a particular block or piece of data on those nodes might be the
> cause, but then the decommission should have mitigated it, and it did not.
> The MR jobs are triggered from Hive, so could the query be referring to the
> same block multiple times in different stages and causing this issue?
> From the thread dump:
> The DataNode thread dump shows that, of the 4090+ xceiver threads created on
> that node, roughly 4000 belonged to multiple mappers of the same AppId and
> were in the state "no operation".
>
> Any suggestions on this?
>
>
>
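The thread-dump observation above (most xceiver threads belonging to one
application) can be checked mechanically. The following is a minimal Python
sketch, not part of the original report. It assumes the dump is jstack-style
text, that xceiver threads are named "DataXceiver for client ...", and that
MapReduce clients embed their task attempt ID (attempt_<timestamp>_<seq>_m_...)
in the client name. The function name count_xceivers_by_app is hypothetical.
Verify the thread-name format against a real dump from your Hadoop version
before relying on the counts.

```python
import re
from collections import Counter

# Assumption: each thread header line starts with the quoted thread name,
# and xceiver threads are named "DataXceiver for client <clientName> ...".
XCEIVER_RE = re.compile(r'^"(DataXceiver for client [^"]*)"')

# Assumption: MapReduce task clients carry an attempt ID of the form
# attempt_<clusterTimestamp>_<jobSequence>_<m|r>_<taskId>_<attemptNo>.
ATTEMPT_RE = re.compile(r'attempt_(\d+_\d+)_[mr]_\d+_\d+')


def count_xceivers_by_app(dump_text):
    """Count DataXceiver threads per application ID in a thread dump.

    Threads whose client name carries no attempt ID are grouped as "other".
    """
    counts = Counter()
    for line in dump_text.splitlines():
        header = XCEIVER_RE.match(line)
        if header is None:
            continue  # not an xceiver thread header line
        attempt = ATTEMPT_RE.search(header.group(1))
        if attempt:
            key = "application_" + attempt.group(1)
        else:
            key = "other"
        counts[key] += 1
    return counts
```

Calling count_xceivers_by_app on the contents of a saved dump and then
.most_common(5) on the result lists the applications holding the most
xceiver threads, which helps confirm whether a single job is exhausting the
limit set by dfs.datanode.max.transfer.threads.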
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)