[
https://issues.apache.org/jira/browse/KAFKA-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566216#comment-17566216
]
Suriya Vijayaraghavan commented on KAFKA-12901:
-----------------------------------------------
[~junrao]
Not completely, but yes, I did see the error message
{code:java}
-> ERROR [ZooKeeperClient] Auth failed. (kafka.zookeeper.ZooKeeperClient){code}
This issue was resolved for us after adding the JVM argument
-Dzookeeper.sasl.client=false
We had enabled JAAS for client-broker and inter-broker communication, so we
had to disable SASL for the ZooKeeper client explicitly to avoid the
Auth failed error.
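For anyone hitting the same error, here is a minimal sketch of one way to pass
that argument. It assumes the broker is started with the stock Kafka scripts,
which hand the contents of KAFKA_OPTS to the JVM; the script and config paths
are illustrative and depend on your installation. Note that this turns off
SASL for the broker's ZooKeeper client, so it only fits setups where ZooKeeper
itself does not require SASL authentication.
{code:bash}
# Assumption: the broker is launched via bin/kafka-server-start.sh,
# which passes KAFKA_OPTS through to the JVM.
# Append the flag rather than overwrite, so any existing JAAS options are kept.
export KAFKA_OPTS="${KAFKA_OPTS:-} -Dzookeeper.sasl.client=false"

# Then start the broker as usual (paths are illustrative):
# bin/kafka-server-start.sh config/server.properties
{code}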
> Metadata not updated after broker restart.
> ------------------------------------------
>
> Key: KAFKA-12901
> URL: https://issues.apache.org/jira/browse/KAFKA-12901
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 2.8.0
> Reporter: Suriya Vijayaraghavan
> Priority: Major
>
> We upgraded from version 2.7 to 2.8. After monitoring for a few weeks, we
> upgraded our production setup as well (we went ahead because we had not
> enabled KRaft). A few weeks later, our clients in production started
> failing with TimeoutException. We tried to list all active brokers using the
> admin client API, and all brokers were listed properly. So we logged into
> that broker and tried to describe a topic with localhost as the
> bootstrap-server, but that timed out as well.
> When checking the logs, we noticed a shutdown message from the
> kafka-shutdown-hook thread (the ZooKeeper session had timed out and we had
> three retry failures). But the controlled shutdown failed (it got an unknown
> server error response from the controller), and the broker proceeded to an
> unclean shutdown. Still, the process did not exit, and it did not process
> any other operations either. The broker also remained in alive status for
> hours (we could still see it in the list of brokers), and our clients kept
> trying to contact it and failing with timeout exceptions. So we tried
> restarting the problematic broker, but after the restart our clients hit
> unknown topic or partition errors, which caused timeouts as well. We noticed
> that the metadata was not loaded, so we had to restart our controller. After
> restarting the controller, everything got back to normal.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)