[ 
https://issues.apache.org/jira/browse/HADOOP-16763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16999663#comment-16999663
 ] 

Masatake Iwasaki edited comment on HADOOP-16763 at 12/19/19 2:33 AM:
---------------------------------------------------------------------

[~elgoiri], I should have set 
{{yarn.resourcemanager.ha.curator-leader-elector.enabled}} to {{true}} to 
reproduce the issue. I got the error below with zookeeper-3.5.6.jar on the 
classpath of RM:
{noformat}
2019-12-19 02:23:25,754 ERROR org.apache.curator.framework.recipes.leader.LeaderLatch: getChildren() failed. rc = -6
2019-12-19 02:23:25,853 INFO org.apache.curator.framework.state.ConnectionStateManager: State change: SUSPENDED
2019-12-19 02:23:25,901 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocolPB to the server
2019-12-19 02:23:25,915 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2019-12-19 02:23:25,916 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8033: starting
2019-12-19 02:23:26,269 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 98e7b66e95e3/172.18.0.11:2181. Will not attempt to authenticate using SASL (unknown error)
2019-12-19 02:23:26,270 INFO org.apache.zookeeper.ClientCnxn: Socket connection established, initiating session, client: /172.18.0.11:57042, server: 98e7b66e95e3/172.18.0.11:2181
2019-12-19 02:23:26,282 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server 98e7b66e95e3/172.18.0.11:2181, sessionid = 0x2000dc764480006, negotiated timeout = 10000
2019-12-19 02:23:26,282 INFO org.apache.curator.framework.state.ConnectionStateManager: State change: RECONNECTED
2019-12-19 02:23:26,287 WARN org.apache.zookeeper.ClientCnxn: Session 0x2000dc764480006 for server 98e7b66e95e3/172.18.0.11:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Xid out of order. Got Xid 5 with err -6 expected Xid 4 for a packet with details: clientPath:/zookeeper/config serverPath:/zookeeper/config finished:false header:: 4,4  replyHeader:: 0,0,-4  request:: '/zookeeper/config,T  response::  
        at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:907)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:363)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
2019-12-19 02:23:26,391 INFO org.apache.curator.framework.state.ConnectionStateM
{noformat}
After I replaced the ZooKeeper (client) jar on the classpath with zookeeper-3.4.14.jar, it worked:
{noformat}
$ docker exec hadoop01 /hadoop/bin/hadoop classpath --glob | sed -z -e 's/:/\n/g' | grep zookeeper
/zookeeper/zookeeper-3.4.14.jar

$ docker exec hadoop01 /hadoop/bin/yarn rmadmin -getServiceState rm1
2019-12-19 01:52:27,597 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
active
{noformat}
If both zookeeper-3.4.14.jar and zookeeper-3.5.6.jar are on the classpath, I 
get a different error, shown below.
{noformat}
$ docker exec hadoop01 cat /hadoop/etc/hadoop/hadoop-env.sh | grep '^export.*CLASSPATH'
export HADOOP_CLASSPATH="/zookeeper/zookeeper-3.4.14.jar"
export HADOOP_USER_CLASSPATH_FIRST="yes"

$ docker exec hadoop01 /hadoop/bin/hadoop classpath --glob | sed -z -e 's/:/\n/g' | grep zookeeper
/zookeeper/zookeeper-3.4.14.jar
/hadoop/share/hadoop/common/lib/zookeeper-3.5.6.jar
/hadoop/share/hadoop/common/lib/zookeeper-jute-3.5.6.jar
/hadoop/share/hadoop/hdfs/lib/zookeeper-3.5.6.jar
/hadoop/share/hadoop/hdfs/lib/zookeeper-jute-3.5.6.jar
{noformat}
{noformat}
2019-12-19 02:26:40,040 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
java.lang.NoSuchMethodError: org.apache.zookeeper.server.quorum.flexible.QuorumMaj.<init>(Ljava/util/Map;)V
        at org.apache.curator.framework.imps.EnsembleTracker.<init>(EnsembleTracker.java:57)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.<init>(CuratorFrameworkImpl.java:159)
        at org.apache.curator.framework.CuratorFrameworkFactory$Builder.build(CuratorFrameworkFactory.java:165)
        at org.apache.hadoop.util.curator.ZKCuratorManager.start(ZKCuratorManager.java:154)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndStartZKManager(ResourceManager.java:419)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createEmbeddedElector(ResourceManager.java:385)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:333)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
{noformat}
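
A mixed-version classpath like the one above could be caught before starting the RM with a small check. The following is only a sketch, not part of Hadoop: the function names {{artifact_versions}} and {{check_single_version}} are hypothetical, and it assumes jars are named {{<artifact>-<version>.jar}} and listed with a directory prefix on a colon-separated classpath.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Hadoop) for spotting mixed jar versions.

# Print the distinct versions of <artifact> found on a colon-separated classpath.
# Assumes entries look like /some/dir/<artifact>-<version>.jar; entries such as
# zookeeper-jute-3.5.6.jar are skipped when <artifact> is "zookeeper" because
# the version must start right after "<artifact>-".
# Usage: artifact_versions <classpath> <artifact>
artifact_versions() {
  printf '%s\n' "$1" | tr ':' '\n' \
    | sed -n "s|^.*/$2-\([0-9][0-9.]*\)\.jar\$|\1|p" \
    | sort -u
}

# Succeed only if at most one version of <artifact> is on the classpath.
# Usage: check_single_version <classpath> <artifact>
check_single_version() {
  count=$(artifact_versions "$1" "$2" | wc -l | tr -d ' ')
  [ "$count" -le 1 ]
}
```

For the setup in this comment it could be called as {{check_single_version "$(docker exec hadoop01 /hadoop/bin/hadoop classpath --glob)" zookeeper || echo "multiple zookeeper versions on classpath"}}. It only reports the condition; which jar wins still depends on classpath order.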


> Make Curator 4 run in soft-compatibility mode with ZooKeeper 3.4
> ----------------------------------------------------------------
>
>                 Key: HADOOP-16763
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16763
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Íñigo Goiri
>            Priority: Major
>
> HADOOP-16579 changed Curator to 4.2 and ZooKeeper to 3.5.
> This change relates to the client libraries used by the components.
> However, the ensemble in most deployments is 3.4 (default in Ubuntu for 
> example).
> To allow this mode, there is a soft-compatibility mode described in 
> http://curator.apache.org/zk-compatibility.html
> We should enable this soft-compatibility mode.


