[
https://issues.apache.org/jira/browse/KAFKA-20582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18081789#comment-18081789
]
Nilesh Kumar commented on KAFKA-20582:
--------------------------------------
I will look into this issue.
> KRaft restart/restore instability when ACL enabled and CQ uses
> allow.everyone.if.no.acl.found=true without super.users
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-20582
> URL: https://issues.apache.org/jira/browse/KAFKA-20582
> Project: Kafka
> Issue Type: Bug
> Components: kraft
> Environment: Kafka KRaft (controller quorum + brokers)
> ACL enabled (org.apache.kafka.metadata.authorizer.StandardAuthorizer)
> CONTROLLER listener secured (SASL_SSL/PLAIN in our case)
> Kubernetes restart/restore workflows
> Reporter: Swathi Mocharla
> Priority: Major
>
> In a KRaft deployment with {{StandardAuthorizer}} enabled, controller/broker
> restart or restore flows become unstable when the controller quorum (CQ) is
> configured with:
> * {{allow.everyone.if.no.acl.found=true}}
> * no {{super.users}}
> Observed symptoms:
> * repeated {{AuthorizerNotReadyException}} on CONTROLLER listener requests
> during startup
> * traffic broker init container loops and may stall in init state
> * metadata quorum check failures during init:
> ** {{UnsupportedVersionException: Direct-to-controller communication is not
> supported with the current MetadataVersion}}
> ** {{The remote node is not a CONTROLLER that supports the KIP-919
> DESCRIBE_CLUSTER api}}
> * traffic broker pod stuck in {{Init:1/4}} in affected runs
> In the same environment:
> * {{allow.everyone.if.no.acl.found=false}} + proper {{super.users}} is stable
> * ACL disabled is stable
> *Steps to Reproduce*
> # Deploy KRaft cluster with ACL enabled.
> # Configure CQ with {{allow.everyone.if.no.acl.found=true}} and no
> {{super.users}}.
> # Trigger restart/restore (controller and/or broker restart with init quorum
> check).
> # Observe CQ and broker-init logs.
> *Expected Result*
> Restart/restore should converge reliably without init deadlock.
> *Actual Result*
> Intermittent startup deadlock and repeated authorizer/quorum check errors;
> broker init may remain stuck.
> *Workaround*
> Use:
> * {{allow.everyone.if.no.acl.found=false}}
> * explicit {{super.users}} entries for the required CQ and traffic broker (TB) principals
> *Relevant Log Fragments*
> * {{org.apache.kafka.common.errors.AuthorizerNotReadyException}}
> * {{UnsupportedVersionException: Direct-to-controller communication is not
> supported with the current MetadataVersion}}
> * {{The remote node is not a CONTROLLER that supports the KIP-919
> DESCRIBE_CLUSTER api}}
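
The configuration combination described above could be flagged before a restart with a small preflight check. The following is a minimal sketch only: it assumes the controller configuration is a plain key=value properties file, and the function names parse_properties and risky_acl_settings are hypothetical helpers, not part of Kafka. It reads only the three settings named in the report (authorizer.class.name, allow.everyone.if.no.acl.found, super.users) and does not explain or fix the underlying instability.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values such as "User:CN=x" survive.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def risky_acl_settings(props):
    """Return warnings for the settings the report describes as unstable."""
    warnings = []
    if not props.get("authorizer.class.name", ""):
        # No authorizer configured: the report describes this case as stable.
        return warnings
    allow_all = props.get("allow.everyone.if.no.acl.found", "false").lower() == "true"
    # super.users is a semicolon-separated list of principals.
    super_users = [u.strip() for u in props.get("super.users", "").split(";") if u.strip()]
    if allow_all:
        warnings.append("allow.everyone.if.no.acl.found=true is set")
    if not super_users:
        warnings.append("super.users is empty or missing")
    return warnings
```

Calling risky_acl_settings(parse_properties(text)) on a controller's properties text returns an empty list for the workaround configuration (allow.everyone.if.no.acl.found=false plus explicit super.users) and one warning per reported risk factor otherwise.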
--
This message was sent by Atlassian Jira
(v8.20.10#820010)