[ 
https://issues.apache.org/jira/browse/KAFKA-12162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Badai Aqrandista updated KAFKA-12162:
-------------------------------------
    Description: 
We found that Kafka broker continued to run after it failed to create 
"/brokers/ids/X" znode and still acted as a partition leader. This should've 
been fixed in https://issues.apache.org/jira/browse/KAFKA-7165 but there might 
be a corner case that was not handled here.

 

Here's the log snippet.

 

 
{code:java}
[2020-12-08 14:10:25,040] INFO Client successfully logged in. 
(org.apache.zookeeper.Login)
[2020-12-08 14:10:25,040] INFO Client will use DIGEST-MD5 as SASL mechanism. 
(org.apache.zookeeper.client.ZooKeeperSaslClient)
[2020-12-08 14:10:25,040] INFO Opening socket connection to server 
kafkaprzk4.example.com/0.0.0.0:2181. Will attempt to SASL-authenticate using 
Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,045] INFO Socket connection established, initiating 
session, client: /0.0.0.0:36056, server: kafkaprzk4.example.com/0.0.0.0:2181 
(org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,054] WARN Unable to reconnect to ZooKeeper service, 
session 0x50000002ed5001b has expired (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,054] INFO Unable to reconnect to ZooKeeper service, 
session 0x50000002ed5001b has expired, closing socket connection 
(org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,055] INFO EventThread shut down for session: 
0x50000002ed5001b (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,055] INFO [ZooKeeperClient Kafka server] Session expired. 
(kafka.zookeeper.ZooKeeperClient)
[2020-12-08 14:10:25,059] INFO [ZooKeeperClient Kafka server] Initializing a 
new session to 
kafkaprzk1.example.com:2181,kafkaprzk2.example.com:2181,kafkaprzk3.example.com:2181,kafkaprzk4.example.com:2181,kafkaprzk5.example.com:2181/kafkaprod/01.
 (kafka.zookeeper.ZooKeeperClient)
[2020-12-08 14:10:25,059] INFO Initiating client connection, 
connectString=kafkaprzk1.example.com:2181,kafkaprzk2.example.com:2181,kafkaprzk3.example.com:2181,kafkaprzk4.example.com:2181,kafkaprzk5.example.com:2181/kafkaprod/01
 sessionTimeout=22500 
watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@1b11171f 
(org.apache.zookeeper.ZooKeeper)
[2020-12-08 14:10:25,060] INFO jute.maxbuffer value is 4194304 Bytes 
(org.apache.zookeeper.ClientCnxnSocket)
[2020-12-08 14:10:25,060] INFO zookeeper.request.timeout value is 0. feature 
enabled= (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,061] INFO Client successfully logged in. 
(org.apache.zookeeper.Login)
[2020-12-08 14:10:25,061] INFO Client will use DIGEST-MD5 as SASL mechanism. 
(org.apache.zookeeper.client.ZooKeeperSaslClient)
[2020-12-08 14:10:25,061] INFO Opening socket connection to server 
kafkaprzk4.example.com/0.0.0.0:2181. Will attempt to SASL-authenticate using 
Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,065] INFO Socket connection established, initiating 
session, client: /0.0.0.0:36058, server: kafkaprzk4.example.com/0.0.0.0:2181 
(org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,070] INFO Creating /brokers/ids/2152 (is it secure? true) 
(kafka.zk.KafkaZkClient)
[2020-12-08 14:10:25,081] INFO Session establishment complete on server 
kafkaprzk4.example.com/0.0.0.0:2181, sessionid = 0x400645dad9d0001, negotiated 
timeout = 22500 (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,112] ERROR Error while creating ephemeral at 
/brokers/ids/2152, node already exists and owner '288459910057232437' does not 
match current session '288340729659195393' 
(kafka.zk.KafkaZkClient$CheckedEphemeral)
[2020-12-08 14:10:29,814] WARN [Producer clientId=confluent-metrics-reporter] 
Got error produce response with correlation id 520478 on topic-partition 
_confluent-metrics-9, retrying (9 attempts left). Error: 
NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
[2020-12-08 14:10:29,814] WARN [Producer clientId=confluent-metrics-reporter] 
Received invalid metadata error in produce request on partition 
_confluent-metrics-9 due to 
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition.. Going to request metadata update now 
(org.apache.kafka.clients.producer.internals.Sender)
[2020-12-08 14:10:29,818] WARN [Producer clientId=confluent-metrics-reporter] 
Got error produce response with correlation id 520480 on topic-partition 
_confluent-metrics-4, retrying (9 attempts left). Error: 
NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
{code}
 

  was:
We found that Kafka broker continued to run after it failed to create 
"/brokers/ids/X" znode and still acted as a partition leader. Here's the log 
snippet.

 

 
{code:java}
[2020-12-08 14:10:25,040] INFO Client successfully logged in. 
(org.apache.zookeeper.Login)
[2020-12-08 14:10:25,040] INFO Client will use DIGEST-MD5 as SASL mechanism. 
(org.apache.zookeeper.client.ZooKeeperSaslClient)
[2020-12-08 14:10:25,040] INFO Opening socket connection to server 
kafkaprzk4.example.com/0.0.0.0:2181. Will attempt to SASL-authenticate using 
Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,045] INFO Socket connection established, initiating 
session, client: /0.0.0.0:36056, server: kafkaprzk4.example.com/0.0.0.0:2181 
(org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,054] WARN Unable to reconnect to ZooKeeper service, 
session 0x50000002ed5001b has expired (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,054] INFO Unable to reconnect to ZooKeeper service, 
session 0x50000002ed5001b has expired, closing socket connection 
(org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,055] INFO EventThread shut down for session: 
0x50000002ed5001b (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,055] INFO [ZooKeeperClient Kafka server] Session expired. 
(kafka.zookeeper.ZooKeeperClient)
[2020-12-08 14:10:25,059] INFO [ZooKeeperClient Kafka server] Initializing a 
new session to 
kafkaprzk1.example.com:2181,kafkaprzk2.example.com:2181,kafkaprzk3.example.com:2181,kafkaprzk4.example.com:2181,kafkaprzk5.example.com:2181/kafkaprod/01.
 (kafka.zookeeper.ZooKeeperClient)
[2020-12-08 14:10:25,059] INFO Initiating client connection, 
connectString=kafkaprzk1.example.com:2181,kafkaprzk2.example.com:2181,kafkaprzk3.example.com:2181,kafkaprzk4.example.com:2181,kafkaprzk5.example.com:2181/kafkaprod/01
 sessionTimeout=22500 
watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@1b11171f 
(org.apache.zookeeper.ZooKeeper)
[2020-12-08 14:10:25,060] INFO jute.maxbuffer value is 4194304 Bytes 
(org.apache.zookeeper.ClientCnxnSocket)
[2020-12-08 14:10:25,060] INFO zookeeper.request.timeout value is 0. feature 
enabled= (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,061] INFO Client successfully logged in. 
(org.apache.zookeeper.Login)
[2020-12-08 14:10:25,061] INFO Client will use DIGEST-MD5 as SASL mechanism. 
(org.apache.zookeeper.client.ZooKeeperSaslClient)
[2020-12-08 14:10:25,061] INFO Opening socket connection to server 
kafkaprzk4.example.com/0.0.0.0:2181. Will attempt to SASL-authenticate using 
Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,065] INFO Socket connection established, initiating 
session, client: /0.0.0.0:36058, server: kafkaprzk4.example.com/0.0.0.0:2181 
(org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,070] INFO Creating /brokers/ids/2152 (is it secure? true) 
(kafka.zk.KafkaZkClient)
[2020-12-08 14:10:25,081] INFO Session establishment complete on server 
kafkaprzk4.example.com/0.0.0.0:2181, sessionid = 0x400645dad9d0001, negotiated 
timeout = 22500 (org.apache.zookeeper.ClientCnxn)
[2020-12-08 14:10:25,112] ERROR Error while creating ephemeral at 
/brokers/ids/2152, node already exists and owner '288459910057232437' does not 
match current session '288340729659195393' 
(kafka.zk.KafkaZkClient$CheckedEphemeral)
[2020-12-08 14:10:29,814] WARN [Producer clientId=confluent-metrics-reporter] 
Got error produce response with correlation id 520478 on topic-partition 
_confluent-metrics-9, retrying (9 attempts left). Error: 
NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
[2020-12-08 14:10:29,814] WARN [Producer clientId=confluent-metrics-reporter] 
Received invalid metadata error in produce request on partition 
_confluent-metrics-9 due to 
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition.. Going to request metadata update now 
(org.apache.kafka.clients.producer.internals.Sender)
[2020-12-08 14:10:29,818] WARN [Producer clientId=confluent-metrics-reporter] 
Got error produce response with correlation id 520480 on topic-partition 
_confluent-metrics-4, retrying (9 attempts left). Error: 
NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
{code}
 


> Kafka broker continued to run after failing to create "/brokers/ids/X" znode.
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-12162
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12162
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.4.1
>            Reporter: Badai Aqrandista
>            Priority: Major
>
> We found that Kafka broker continued to run after it failed to create 
> "/brokers/ids/X" znode and still acted as a partition leader. This should've 
> been fixed in https://issues.apache.org/jira/browse/KAFKA-7165 but there 
> might be a corner case that was not handled here.
>  
> Here's the log snippet.
>  
>  
> {code:java}
> [2020-12-08 14:10:25,040] INFO Client successfully logged in. 
> (org.apache.zookeeper.Login)
> [2020-12-08 14:10:25,040] INFO Client will use DIGEST-MD5 as SASL mechanism. 
> (org.apache.zookeeper.client.ZooKeeperSaslClient)
> [2020-12-08 14:10:25,040] INFO Opening socket connection to server 
> kafkaprzk4.example.com/0.0.0.0:2181. Will attempt to SASL-authenticate using 
> Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
> [2020-12-08 14:10:25,045] INFO Socket connection established, initiating 
> session, client: /0.0.0.0:36056, server: kafkaprzk4.example.com/0.0.0.0:2181 
> (org.apache.zookeeper.ClientCnxn)
> [2020-12-08 14:10:25,054] WARN Unable to reconnect to ZooKeeper service, 
> session 0x50000002ed5001b has expired (org.apache.zookeeper.ClientCnxn)
> [2020-12-08 14:10:25,054] INFO Unable to reconnect to ZooKeeper service, 
> session 0x50000002ed5001b has expired, closing socket connection 
> (org.apache.zookeeper.ClientCnxn)
> [2020-12-08 14:10:25,055] INFO EventThread shut down for session: 
> 0x50000002ed5001b (org.apache.zookeeper.ClientCnxn)
> [2020-12-08 14:10:25,055] INFO [ZooKeeperClient Kafka server] Session 
> expired. (kafka.zookeeper.ZooKeeperClient)
> [2020-12-08 14:10:25,059] INFO [ZooKeeperClient Kafka server] Initializing a 
> new session to 
> kafkaprzk1.example.com:2181,kafkaprzk2.example.com:2181,kafkaprzk3.example.com:2181,kafkaprzk4.example.com:2181,kafkaprzk5.example.com:2181/kafkaprod/01.
>  (kafka.zookeeper.ZooKeeperClient)
> [2020-12-08 14:10:25,059] INFO Initiating client connection, 
> connectString=kafkaprzk1.example.com:2181,kafkaprzk2.example.com:2181,kafkaprzk3.example.com:2181,kafkaprzk4.example.com:2181,kafkaprzk5.example.com:2181/kafkaprod/01
>  sessionTimeout=22500 
> watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@1b11171f 
> (org.apache.zookeeper.ZooKeeper)
> [2020-12-08 14:10:25,060] INFO jute.maxbuffer value is 4194304 Bytes 
> (org.apache.zookeeper.ClientCnxnSocket)
> [2020-12-08 14:10:25,060] INFO zookeeper.request.timeout value is 0. feature 
> enabled= (org.apache.zookeeper.ClientCnxn)
> [2020-12-08 14:10:25,061] INFO Client successfully logged in. 
> (org.apache.zookeeper.Login)
> [2020-12-08 14:10:25,061] INFO Client will use DIGEST-MD5 as SASL mechanism. 
> (org.apache.zookeeper.client.ZooKeeperSaslClient)
> [2020-12-08 14:10:25,061] INFO Opening socket connection to server 
> kafkaprzk4.example.com/0.0.0.0:2181. Will attempt to SASL-authenticate using 
> Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
> [2020-12-08 14:10:25,065] INFO Socket connection established, initiating 
> session, client: /0.0.0.0:36058, server: kafkaprzk4.example.com/0.0.0.0:2181 
> (org.apache.zookeeper.ClientCnxn)
> [2020-12-08 14:10:25,070] INFO Creating /brokers/ids/2152 (is it secure? 
> true) (kafka.zk.KafkaZkClient)
> [2020-12-08 14:10:25,081] INFO Session establishment complete on server 
> kafkaprzk4.example.com/0.0.0.0:2181, sessionid = 0x400645dad9d0001, 
> negotiated timeout = 22500 (org.apache.zookeeper.ClientCnxn)
> [2020-12-08 14:10:25,112] ERROR Error while creating ephemeral at 
> /brokers/ids/2152, node already exists and owner '288459910057232437' does 
> not match current session '288340729659195393' 
> (kafka.zk.KafkaZkClient$CheckedEphemeral)
> [2020-12-08 14:10:29,814] WARN [Producer clientId=confluent-metrics-reporter] 
> Got error produce response with correlation id 520478 on topic-partition 
> _confluent-metrics-9, retrying (9 attempts left). Error: 
> NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
> [2020-12-08 14:10:29,814] WARN [Producer clientId=confluent-metrics-reporter] 
> Received invalid metadata error in produce request on partition 
> _confluent-metrics-9 due to 
> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
> not the leader for that topic-partition.. Going to request metadata update 
> now (org.apache.kafka.clients.producer.internals.Sender)
> [2020-12-08 14:10:29,818] WARN [Producer clientId=confluent-metrics-reporter] 
> Got error produce response with correlation id 520480 on topic-partition 
> _confluent-metrics-4, retrying (9 attempts left). Error: 
> NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to