[ https://issues.apache.org/jira/browse/HDFS-9126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zengyongping updated HDFS-9126:
-------------------------------
Environment:
OS:Centos 6.5(final)
Hadoop:2.6.0
namenode ha base 5 journalnodes
was:
OS:Centos 6.5(final)
Hadoop:2.6.0
namenode ha base 5 journalnode
> namenode crash in fsimage download/transfer
> -------------------------------------------
>
> Key: HDFS-9126
> URL: https://issues.apache.org/jira/browse/HDFS-9126
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.6.0
> Environment: OS:Centos 6.5(final)
> Hadoop:2.6.0
> namenode ha base 5 journalnodes
> Reporter: zengyongping
> Priority: Critical
>
> In our production Hadoop cluster, when the active NameNode begins to
> download/transfer the fsimage from the standby NameNode, the ZKFC health
> monitor's check of the NameNode sometimes fails with a socket timeout. ZKFC
> then judges the active NameNode's status to be SERVICE_NOT_RESPONDING, a
> NameNode HA failover happens, and the old active NameNode is fenced.
> ZKFC logs:
> 2015-09-24 11:44:44,739 WARN org.apache.hadoop.ha.HealthMonitor:
> Transport-level exception trying to monitor health of NameNode at
> hostname1/192.168.10.11:8020: Call From hostname1/192.168.10.11 to
> hostname1:8020 failed on socket timeout exception:
> java.net.SocketTimeoutException: 45000 millis timeout while waiting for
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
> local=/192.168.10.11:22614 remote=hostname1/192.168.10.11:8020]; For more
> details see: http://wiki.apache.org/hadoop/SocketTimeout
> 2015-09-24 11:44:44,740 INFO org.apache.hadoop.ha.HealthMonitor: Entering
> state SERVICE_NOT_RESPONDING
> 2015-09-24 11:44:44,740 INFO org.apache.hadoop.ha.ZKFailoverController: Local
> service NameNode at hostname1/192.168.10.11:8020 entered state:
> SERVICE_NOT_RESPONDING
> 2015-09-24 11:44:44,740 INFO org.apache.hadoop.ha.ZKFailoverController:
> Quitting master election for NameNode at hostname1/192.168.10.11:8020 and
> marking that fencing is necessary
> 2015-09-24 11:44:44,740 INFO org.apache.hadoop.ha.ActiveStandbyElector:
> Yielding from election
> 2015-09-24 11:44:44,761 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x54d81348fe503e3 closed
> 2015-09-24 11:44:44,761 WARN org.apache.hadoop.ha.ActiveStandbyElector:
> Ignoring stale result from old client with sessionId 0x54d81348fe503e3
> 2015-09-24 11:44:44,764 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
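The failover chain in the report (health-check RPC times out, the monitor enters SERVICE_NOT_RESPONDING, ZKFC quits the master election and marks fencing as necessary) can be modelled as a small state machine. The Java sketch below is a hypothetical simplification for illustration only: the class `HealthCheckSketch`, its fields and its `onHealthCheck` method are invented here and are not Hadoop's `HealthMonitor` or `ZKFailoverController` API. It models the timeout as a measured response time rather than a thrown `SocketTimeoutException`, and the 45000 ms threshold is simply the value shown in the log.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the ZKFC health-check flow in the log.
 * NOT Hadoop's HealthMonitor/ZKFailoverController code.
 */
public class HealthCheckSketch {

    // Subset of the states named in the log; the real enum has more.
    enum State { INITIALIZING, SERVICE_HEALTHY, SERVICE_NOT_RESPONDING }

    // 45000 ms matches the socket timeout reported in the log.
    static final long RPC_TIMEOUT_MS = 45_000L;

    State state = State.INITIALIZING;
    boolean inElection = true;      // assume this node is the active participant
    boolean fencingNeeded = false;
    final List<String> events = new ArrayList<>();

    /** Feed one health-check result: how long the NameNode took to answer. */
    void onHealthCheck(long responseMillis) {
        if (responseMillis >= RPC_TIMEOUT_MS) {
            // No answer within the window, e.g. RPC handlers stalled while
            // the NameNode is busy with the fsimage transfer.
            enterState(State.SERVICE_NOT_RESPONDING);
        } else {
            enterState(State.SERVICE_HEALTHY);
        }
    }

    private void enterState(State newState) {
        if (newState == state) {
            return; // only log real transitions
        }
        state = newState;
        events.add("Entering state " + newState);
        if (newState == State.SERVICE_NOT_RESPONDING && inElection) {
            // Mirrors "Quitting master election ... marking that fencing is necessary".
            // Re-joining the election after recovery is deliberately omitted.
            inElection = false;
            fencingNeeded = true;
            events.add("Quitting master election; fencing is necessary");
        }
    }

    public static void main(String[] args) {
        HealthCheckSketch monitor = new HealthCheckSketch();
        monitor.onHealthCheck(120);     // normal, fast response
        monitor.onHealthCheck(45_000);  // blocked until the timeout
        monitor.events.forEach(System.out::println);
    }
}
```

Running `main` prints the transition to SERVICE_HEALTHY, then the transition to SERVICE_NOT_RESPONDING followed by the election-quit line, which is the same ordering the ZKFC log shows. In real Hadoop deployments the health-check timeout is configurable through `ha.health-monitor.rpc-timeout.ms`, whose default is 45000 ms.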
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)