[
https://issues.apache.org/jira/browse/KAFKA-12162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Badai Aqrandista updated KAFKA-12162:
-------------------------------------
Description:
We found that Kafka broker continued to run after it failed to create
"/brokers/ids/X" znode and still acted as a partition leader. This should've
been fixed in https://issues.apache.org/jira/browse/KAFKA-7165 but there might
be a corner case that was not handled here.
Here's the log snippet.
{code:java}
[2020-12-08 14:10:25,040] INFO Client successfully logged in.
(org.apache.zookeeper.Login)
[2020-12-08 14:10:25,040] INFO Client will use DIGEST-MD5 as SASL mechanism.
(org.apache.zookeeper.client.ZooKeeperSaslClient)
[2020-12-08 14:10:25,040] INFO Opening socket connection to server
kafkaprzk4.example.com/0.0.0.0:2181. Will attempt to SASL-authenticate using
Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,045] INFO Socket connection established, initiating
session, client: /0.0.0.0:36056, server: kafkaprzk4.example.com/0.0.0.0:2181
(org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,054] WARN Unable to reconnect to ZooKeeper service,
session 0x50000002ed5001b has expired (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,054] INFO Unable to reconnect to ZooKeeper service,
session 0x50000002ed5001b has expired, closing socket connection
(org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,055] INFO EventThread shut down for session:
0x50000002ed5001b (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,055] INFO [ZooKeeperClient Kafka server] Session expired.
(kafka.zookeeper.ZooKeeperClient)
[2020-12-08 14:10:25,059] INFO [ZooKeeperClient Kafka server] Initializing a
new session to
kafkaprzk1.example.com:2181,kafkaprzk2.example.com:2181,kafkaprzk3.example.com:2181,kafkaprzk4.example.com:2181,kafkaprzk5.example.com:2181/kafkaprod/01.
(kafka.zookeeper.ZooKeeperClient)
[2020-12-08 14:10:25,059] INFO Initiating client connection,
connectString=kafkaprzk1.example.com:2181,kafkaprzk2.example.com:2181,kafkaprzk3.example.com:2181,kafkaprzk4.example.com:2181,kafkaprzk5.example.com:2181/kafkaprod/01
sessionTimeout=22500
watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@1b11171f
(org.apache.zookeeper.ZooKeeper)
[2020-12-08 14:10:25,060] INFO jute.maxbuffer value is 4194304 Bytes
(org.apache.zookeeper.ClientCnxnSocket)
[2020-12-08 14:10:25,060] INFO zookeeper.request.timeout value is 0. feature
enabled= (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,061] INFO Client successfully logged in.
(org.apache.zookeeper.Login)
[2020-12-08 14:10:25,061] INFO Client will use DIGEST-MD5 as SASL mechanism.
(org.apache.zookeeper.client.ZooKeeperSaslClient)
[2020-12-08 14:10:25,061] INFO Opening socket connection to server
kafkaprzk4.example.com/0.0.0.0:2181. Will attempt to SASL-authenticate using
Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,065] INFO Socket connection established, initiating
session, client: /0.0.0.0:36058, server: kafkaprzk4.example.com/0.0.0.0:2181
(org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,070] INFO Creating /brokers/ids/2152 (is it secure? true)
(kafka.zk.KafkaZkClient)
[2020-12-08 14:10:25,081] INFO Session establishment complete on server
kafkaprzk4.example.com/0.0.0.0:2181, sessionid = 0x400645dad9d0001, negotiated
timeout = 22500 (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,112] ERROR Error while creating ephemeral at
/brokers/ids/2152, node already exists and owner '288459910057232437' does not
match current session '288340729659195393'
(kafka.zk.KafkaZkClient$CheckedEphemeral)
[2020-12-08 14:10:29,814] WARN [Producer clientId=confluent-metrics-reporter]
Got error produce response with correlation id 520478 on topic-partition
_confluent-metrics-9, retrying (9 attempts left). Error:
NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
[2020-12-08 14:10:29,814] WARN [Producer clientId=confluent-metrics-reporter]
Received invalid metadata error in produce request on partition
_confluent-metrics-9 due to
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is
not the leader for that topic-partition.. Going to request metadata update now
(org.apache.kafka.clients.producer.internals.Sender)
[2020-12-08 14:10:29,818] WARN [Producer clientId=confluent-metrics-reporter]
Got error produce response with correlation id 520480 on topic-partition
_confluent-metrics-4, retrying (9 attempts left). Error:
NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
{code}
was:
We found that Kafka broker continued to run after it failed to create
"/brokers/ids/X" znode and still acted as a partition leader. Here's the log
snippet.
{code:java}
[2020-12-08 14:10:25,040] INFO Client successfully logged in.
(org.apache.zookeeper.Login)
[2020-12-08 14:10:25,040] INFO Client will use DIGEST-MD5 as SASL mechanism.
(org.apache.zookeeper.client.ZooKeeperSaslClient)
[2020-12-08 14:10:25,040] INFO Opening socket connection to server
kafkaprzk4.example.com/0.0.0.0:2181. Will attempt to SASL-authenticate using
Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,045] INFO Socket connection established, initiating
session, client: /0.0.0.0:36056, server: kafkaprzk4.example.com/0.0.0.0:2181
(org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,054] WARN Unable to reconnect to ZooKeeper service,
session 0x50000002ed5001b has expired (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,054] INFO Unable to reconnect to ZooKeeper service,
session 0x50000002ed5001b has expired, closing socket connection
(org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,055] INFO EventThread shut down for session:
0x50000002ed5001b (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,055] INFO [ZooKeeperClient Kafka server] Session expired.
(kafka.zookeeper.ZooKeeperClient)
[2020-12-08 14:10:25,059] INFO [ZooKeeperClient Kafka server] Initializing a
new session to
kafkaprzk1.example.com:2181,kafkaprzk2.example.com:2181,kafkaprzk3.example.com:2181,kafkaprzk4.example.com:2181,kafkaprzk5.example.com:2181/kafkaprod/01.
(kafka.zookeeper.ZooKeeperClient)
[2020-12-08 14:10:25,059] INFO Initiating client connection,
connectString=kafkaprzk1.example.com:2181,kafkaprzk2.example.com:2181,kafkaprzk3.example.com:2181,kafkaprzk4.example.com:2181,kafkaprzk5.example.com:2181/kafkaprod/01
sessionTimeout=22500
watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@1b11171f
(org.apache.zookeeper.ZooKeeper)
[2020-12-08 14:10:25,060] INFO jute.maxbuffer value is 4194304 Bytes
(org.apache.zookeeper.ClientCnxnSocket)
[2020-12-08 14:10:25,060] INFO zookeeper.request.timeout value is 0. feature
enabled= (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,061] INFO Client successfully logged in.
(org.apache.zookeeper.Login)
[2020-12-08 14:10:25,061] INFO Client will use DIGEST-MD5 as SASL mechanism.
(org.apache.zookeeper.client.ZooKeeperSaslClient)
[2020-12-08 14:10:25,061] INFO Opening socket connection to server
kafkaprzk4.example.com/0.0.0.0:2181. Will attempt to SASL-authenticate using
Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,065] INFO Socket connection established, initiating
session, client: /0.0.0.0:36058, server: kafkaprzk4.example.com/0.0.0.0:2181
(org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,070] INFO Creating /brokers/ids/2152 (is it secure? true)
(kafka.zk.KafkaZkClient)
[2020-12-08 14:10:25,081] INFO Session establishment complete on server
kafkaprzk4.example.com/0.0.0.0:2181, sessionid = 0x400645dad9d0001, negotiated
timeout = 22500 (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,112] ERROR Error while creating ephemeral at
/brokers/ids/2152, node already exists and owner '288459910057232437' does not
match current session '288340729659195393'
(kafka.zk.KafkaZkClient$CheckedEphemeral)
[2020-12-08 14:10:29,814] WARN [Producer clientId=confluent-metrics-reporter]
Got error produce response with correlation id 520478 on topic-partition
_confluent-metrics-9, retrying (9 attempts left). Error:
NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
[2020-12-08 14:10:29,814] WARN [Producer clientId=confluent-metrics-reporter]
Received invalid metadata error in produce request on partition
_confluent-metrics-9 due to
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is
not the leader for that topic-partition.. Going to request metadata update now
(org.apache.kafka.clients.producer.internals.Sender)
[2020-12-08 14:10:29,818] WARN [Producer clientId=confluent-metrics-reporter]
Got error produce response with correlation id 520480 on topic-partition
_confluent-metrics-4, retrying (9 attempts left). Error:
NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
{code}
> Kafka broker continued to run after failing to create "/brokers/ids/X" znode.
> -----------------------------------------------------------------------------
>
> Key: KAFKA-12162
> URL: https://issues.apache.org/jira/browse/KAFKA-12162
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 2.4.1
> Reporter: Badai Aqrandista
> Priority: Major
>
> We found that Kafka broker continued to run after it failed to create
> "/brokers/ids/X" znode and still acted as a partition leader. This should've
> been fixed in https://issues.apache.org/jira/browse/KAFKA-7165 but there
> might be a corner case that was not handled here.
>
> Here's the log snippet.
>
>
> {code:java}
> [2020-12-08 14:10:25,040] INFO Client successfully logged in.
> (org.apache.zookeeper.Login)
> [2020-12-08 14:10:25,040] INFO Client will use DIGEST-MD5 as SASL mechanism.
> (org.apache.zookeeper.client.ZooKeeperSaslClient)
> [2020-12-08 14:10:25,040] INFO Opening socket connection to server
> kafkaprzk4.example.com/0.0.0.0:2181. Will attempt to SASL-authenticate using
> Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
> [2020-12-08 14:10:25,045] INFO Socket connection established, initiating
> session, client: /0.0.0.0:36056, server: kafkaprzk4.example.com/0.0.0.0:2181
> (org.apache.zookeeper.ClientCnxn)
> [2020-12-08 14:10:25,054] WARN Unable to reconnect to ZooKeeper service,
> session 0x50000002ed5001b has expired (org.apache.zookeeper.ClientCnxn)
> [2020-12-08 14:10:25,054] INFO Unable to reconnect to ZooKeeper service,
> session 0x50000002ed5001b has expired, closing socket connection
> (org.apache.zookeeper.ClientCnxn)
> [2020-12-08 14:10:25,055] INFO EventThread shut down for session:
> 0x50000002ed5001b (org.apache.zookeeper.ClientCnxn)
> [2020-12-08 14:10:25,055] INFO [ZooKeeperClient Kafka server] Session
> expired. (kafka.zookeeper.ZooKeeperClient)
> [2020-12-08 14:10:25,059] INFO [ZooKeeperClient Kafka server] Initializing a
> new session to
> kafkaprzk1.example.com:2181,kafkaprzk2.example.com:2181,kafkaprzk3.example.com:2181,kafkaprzk4.example.com:2181,kafkaprzk5.example.com:2181/kafkaprod/01.
> (kafka.zookeeper.ZooKeeperClient)
> [2020-12-08 14:10:25,059] INFO Initiating client connection,
> connectString=kafkaprzk1.example.com:2181,kafkaprzk2.example.com:2181,kafkaprzk3.example.com:2181,kafkaprzk4.example.com:2181,kafkaprzk5.example.com:2181/kafkaprod/01
> sessionTimeout=22500
> watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@1b11171f
> (org.apache.zookeeper.ZooKeeper)
> [2020-12-08 14:10:25,060] INFO jute.maxbuffer value is 4194304 Bytes
> (org.apache.zookeeper.ClientCnxnSocket)
> [2020-12-08 14:10:25,060] INFO zookeeper.request.timeout value is 0. feature
> enabled= (org.apache.zookeeper.ClientCnxn)
> [2020-12-08 14:10:25,061] INFO Client successfully logged in.
> (org.apache.zookeeper.Login)
> [2020-12-08 14:10:25,061] INFO Client will use DIGEST-MD5 as SASL mechanism.
> (org.apache.zookeeper.client.ZooKeeperSaslClient)
> [2020-12-08 14:10:25,061] INFO Opening socket connection to server
> kafkaprzk4.example.com/0.0.0.0:2181. Will attempt to SASL-authenticate using
> Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
> [2020-12-08 14:10:25,065] INFO Socket connection established, initiating
> session, client: /0.0.0.0:36058, server: kafkaprzk4.example.com/0.0.0.0:2181
> (org.apache.zookeeper.ClientCnxn)
> [2020-12-08 14:10:25,070] INFO Creating /brokers/ids/2152 (is it secure?
> true) (kafka.zk.KafkaZkClient)
> [2020-12-08 14:10:25,081] INFO Session establishment complete on server
> kafkaprzk4.example.com/0.0.0.0:2181, sessionid = 0x400645dad9d0001,
> negotiated timeout = 22500 (org.apache.zookeeper.ClientCnxn)
> [2020-12-08 14:10:25,112] ERROR Error while creating ephemeral at
> /brokers/ids/2152, node already exists and owner '288459910057232437' does
> not match current session '288340729659195393'
> (kafka.zk.KafkaZkClient$CheckedEphemeral)
> [2020-12-08 14:10:29,814] WARN [Producer clientId=confluent-metrics-reporter]
> Got error produce response with correlation id 520478 on topic-partition
> _confluent-metrics-9, retrying (9 attempts left). Error:
> NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
> [2020-12-08 14:10:29,814] WARN [Producer clientId=confluent-metrics-reporter]
> Received invalid metadata error in produce request on partition
> _confluent-metrics-9 due to
> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is
> not the leader for that topic-partition.. Going to request metadata update
> now (org.apache.kafka.clients.producer.internals.Sender)
> [2020-12-08 14:10:29,818] WARN [Producer clientId=confluent-metrics-reporter]
> Got error produce response with correlation id 520480 on topic-partition
> _confluent-metrics-4, retrying (9 attempts left). Error:
> NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
> {code}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)