Henrik created FLINK-12384:
------------------------------
Summary: Rolling the etcd servers causes "Connected to an old
server; r-o mode will be unavailable"
Key: FLINK-12384
URL: https://issues.apache.org/jira/browse/FLINK-12384
Project: Flink
Issue Type: Bug
Reporter: Henrik
{code:java}
[tm] 2019-05-01 13:30:53,316 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Initiating
client connection, connectString=analytics-zetcd:2181 sessionTimeout=60000
watcher=org.apache.flink.shaded.curator.org.apache.curator.ConnectionState@5c8eee0f
[tm] 2019-05-01 13:30:53,384 WARN
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - SASL
configuration failed: javax.security.auth.login.LoginException: No JAAS
configuration section named 'Client' was found in specified JAAS configuration
file: '/tmp/jaas-3674237213070587877.conf'. Will continue connection to
Zookeeper server without SASL authentication, if Zookeeper server allows it.
[tm] 2019-05-01 13:30:53,395 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Opening
socket connection to server
analytics-zetcd.default.svc.cluster.local/10.108.52.97:2181
[tm] 2019-05-01 13:30:53,395 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Using
configured hostname/address for TaskManager: 10.1.2.173.
[tm] 2019-05-01 13:30:53,401 ERROR
org.apache.flink.shaded.curator.org.apache.curator.ConnectionState -
Authentication failed
[tm] 2019-05-01 13:30:53,418 INFO
org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Trying to start
actor system at 10.1.2.173:0
[tm] 2019-05-01 13:30:53,420 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Socket
connection established to
analytics-zetcd.default.svc.cluster.local/10.108.52.97:2181, initiating session
[tm] 2019-05-01 13:30:53,500 WARN
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxnSocket -
Connected to an old server; r-o mode will be unavailable
[tm] 2019-05-01 13:30:53,500 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Session
establishment complete on server
analytics-zetcd.default.svc.cluster.local/10.108.52.97:2181, sessionid =
0xbf06a739001d446, negotiated timeout = 60000
[tm] 2019-05-01 13:30:53,525 INFO
org.apache.flink.shaded.curator.org.apache.curator.framework.state.ConnectionStateManager
- State change: CONNECTED{code}
Repro:
Start an etcd cluster with three members, e.g. with etcd-operator. Start zetcd
in front of it. Configure the Flink session cluster to use zetcd as its
ZooKeeper endpoint.
Ensure the job can start successfully.
Now kill the etcd pods one by one, letting the quorum re-establish in between,
so that the etcd cluster stays healthy throughout.
Now restart the job/tm pods. You'll end up in this no-man's-land.
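
To check whether a restarted pod has reached the state shown in the log excerpt, something like the following might help. It is only a sketch: the function name `scan_log` and the marker keys are made up for illustration, and the marker strings are copied from the log above. It reports which messages appear and does not say which of them is the actual cause.

```python
import re

# Marker strings copied from the log excerpt in this report.
# The dictionary keys are illustrative names, not Flink or ZooKeeper identifiers.
MARKERS = {
    "sasl_fallback": "SASL configuration failed",
    "auth_failed": "Authentication failed",
    "old_server": "Connected to an old server; r-o mode will be unavailable",
    "connected": "State change: CONNECTED",
}


def scan_log(text):
    """Return a dict telling which of the known messages occur in the log text."""
    # Log lines may be wrapped (as they are in this report),
    # so collapse all whitespace runs before matching.
    flat = re.sub(r"\s+", " ", text)
    return {name: marker in flat for name, marker in MARKERS.items()}


if __name__ == "__main__":
    import sys

    # Usage: kubectl logs <tm-pod> | python scan_log.py
    for name, found in scan_log(sys.stdin.read()).items():
        print(f"{name}: {'yes' if found else 'no'}")
```

Running it against the logs of a freshly started pod before the etcd roll and again after the roll would show whether the set of messages differs between the two cases.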
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)