[ https://issues.apache.org/jira/browse/KAFKA-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ron Dagostino resolved KAFKA-14890.
-----------------------------------
    Resolution: Duplicate

Duplicate of https://issues.apache.org/jira/browse/KAFKA-14887

> Kafka initiates shutdown due to connectivity problem with Zookeeper and
> FatalExitError from ChangeNotificationProcessorThread
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-14890
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14890
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.3.2
>            Reporter: Denis Razuvaev
>            Priority: Major
>
> Hello,
> We have faced a deadlock in Kafka several times; a similar issue is
> https://issues.apache.org/jira/browse/KAFKA-13544
> The question is: is it expected behavior that Kafka decides to shut down
> because of connectivity problems with ZooKeeper? It seems to be related to
> the inability to read data from the */feature* ZK node and the
> _ZooKeeperClientExpiredException_ thrown from the _ZooKeeperClient_ class.
> This exception is caught only in the catch block of the _doWork()_ method
> in _ChangeNotificationProcessorThread_, which leads to _FatalExitError_.
> The shutdown problem reproduces in newer versions of Kafka (which already
> include the deadlock fix from KAFKA-13544).
> It is hard to write a synthetic test that reproduces the problem, but it can
> be reproduced locally in debug mode with the following steps:
> 1) Start ZooKeeper and start Kafka in debug mode.
> 2) Emulate a connectivity problem between Kafka and ZooKeeper; for example,
> the connection can be closed via the Netcrusher library.
> 3) Put a breakpoint in the _updateLatestOrThrow()_ method of the
> _FeatureCacheUpdater_ class, before the
> _zkClient.getDataAndVersion(featureZkNodePath)_ line executes.
> 4) Restore the connection between Kafka and ZooKeeper after the session
> expires. Kafka execution should stop at the breakpoint.
> 5) Resume execution until Kafka starts to execute the line
> _zooKeeperClient.handleRequests(remainingRequests)_ in the
> _retryRequestsUntilConnected_ method of the _KafkaZkClient_ class.
> 6) Emulate the connectivity problem between Kafka and ZooKeeper again and
> wait until the session expires.
> 7) Restore the connection between Kafka and ZooKeeper.
> 8) Kafka begins the shutdown process, due to:
> _ERROR [feature-zk-node-event-process-thread]: Failed to process feature ZK
> node change event. The broker will eventually exit.
> (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)_
>
> In a real environment, this problem can be caused by network issues that
> repeatedly disconnect and reconnect the broker to ZooKeeper within a short
> period.
> I started a mail thread at
> [https://lists.apache.org/thread/gbk4scwd8g7mg2tfsokzj5tjgrjrb9dw] about
> this problem, but have received no answers.
> To me this looks like a defect that should be fixed, because Kafka initiates
> shutdown after the connection between Kafka and ZooKeeper has been restored.
> Thank you.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
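
For readers following the report, the failure shape it describes (a session-expiry exception that is caught only by a catch-all in the worker's `doWork()` and turned into a fatal error) might be sketched roughly as below. This is not Kafka's code: `FakeZkClient`, `SessionExpiredException`, and `FatalError` are hypothetical stand-ins for `ZooKeeperClient`, `ZooKeeperClientExpiredException`, and `FatalExitError`, and the real classes have different signatures.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class FatalOnExpirySketch {
    // Hypothetical stand-in for ZooKeeperClientExpiredException.
    static class SessionExpiredException extends RuntimeException {
        SessionExpiredException(String msg) { super(msg); }
    }

    // Hypothetical stand-in for FatalExitError.
    static class FatalError extends Error {
        FatalError(Throwable cause) { super(cause); }
    }

    // Hypothetical client: each read consumes one scripted outcome,
    // where true means "the session expired during this read".
    static class FakeZkClient {
        private final Queue<Boolean> expiredOnCall = new ArrayDeque<>();

        FakeZkClient(boolean... outcomes) {
            for (boolean b : outcomes) expiredOnCall.add(b);
        }

        String readFeatureNode() {
            Boolean expired = expiredOnCall.poll();
            if (expired != null && expired) {
                throw new SessionExpiredException("session expired during read");
            }
            return "features-v1";
        }
    }

    // Assumed shape of the worker step: any exception raised while
    // processing the change event is escalated to a fatal error.
    static String doWork(FakeZkClient client) {
        try {
            return client.readFeatureNode();
        } catch (Exception e) {
            throw new FatalError(e);
        }
    }

    public static void main(String[] args) {
        // Healthy session: the read succeeds.
        System.out.println(doWork(new FakeZkClient(false)));

        // Session expires mid-read: the catch-all escalates it.
        try {
            doWork(new FakeZkClient(true));
        } catch (FatalError e) {
            System.out.println("fatal: " + e.getCause().getMessage());
        }
    }
}
```

In this sketch the second call escalates even though a later read on a restored session could have succeeded, which is the behavior the reporter questions in steps 6) to 8).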