[ 
https://issues.apache.org/jira/browse/KAFKA-20582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18081789#comment-18081789
 ] 

Nilesh Kumar commented on KAFKA-20582:
--------------------------------------

I will look into this issue.

> KRaft restart/restore instability when ACL enabled and CQ uses 
> allow.everyone.if.no.acl.found=true without super.users
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-20582
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20582
>             Project: Kafka
>          Issue Type: Bug
>          Components: kraft
>         Environment: Kafka KRaft (controller quorum + brokers)
> ACL enabled (org.apache.kafka.metadata.authorizer.StandardAuthorizer)
> CONTROLLER listener secured (SASL_SSL/PLAIN in our case)
> Kubernetes restart/restore workflows
>            Reporter: Swathi Mocharla
>            Priority: Major
>
> In a KRaft deployment with {{StandardAuthorizer}} enabled, controller/broker 
> restart or restore flows become unstable when the controller quorum (CQ) is 
> configured with:
>  * {{allow.everyone.if.no.acl.found=true}}
>  * no {{super.users}}
> Observed symptoms:
>  * repeated {{AuthorizerNotReadyException}} on CONTROLLER listener requests 
> during startup
>  * traffic broker init container loops and may stall in init state
>  * metadata quorum check failures during init:
>  ** {{UnsupportedVersionException: Direct-to-controller communication is not 
> supported with the current MetadataVersion}}
>  ** {{The remote node is not a CONTROLLER that supports the KIP-919 
> DESCRIBE_CLUSTER api}}
>  * traffic broker pod stuck in {{Init:1/4}} in affected runs
> In the same environment:
>  * {{allow.everyone.if.no.acl.found=false}} + proper {{super.users}} is stable
>  * ACL disabled is stable
> *Steps to Reproduce*
>  # Deploy KRaft cluster with ACL enabled.
>  # Configure CQ with {{allow.everyone.if.no.acl.found=true}} and no 
> {{super.users}}.
>  # Trigger restart/restore (controller and/or broker restart with init quorum 
> check).
>  # Observe CQ and broker-init logs.
> *Expected Result*
> Restart/restore should converge reliably without init deadlock.
> *Actual Result*
> Intermittent startup deadlock and repeated authorizer/quorum check errors; 
> broker init may remain stuck.
> *Workaround*
> Use:
>  * {{allow.everyone.if.no.acl.found=false}}
>  * explicit {{super.users}} for required CQ/TB principals
> *Relevant Log Fragments*
>  * {{org.apache.kafka.common.errors.AuthorizerNotReadyException}}
>  * {{UnsupportedVersionException: Direct-to-controller communication is not 
> supported with the current MetadataVersion}}
>  * {{The remote node is not a CONTROLLER that supports the KIP-919 
> DESCRIBE_CLUSTER api}}
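
For reference, the workaround described in the issue could be sketched as the following configuration fragment. This is an illustration only, not taken from the reporter's deployment: the principal names (User:controller, User:broker) are hypothetical placeholders for the CQ/TB principals, and the fragment assumes the same settings are applied to both controller and broker properties files.

```properties
# Authorizer named in the issue's Environment field
authorizer.class.name=org.apache.kafka.metadata.authorizer.StandardAuthorizer

# Workaround: deny by default when no ACL matches a resource
allow.everyone.if.no.acl.found=false

# Workaround: list the controller/broker principals explicitly.
# Entries are separated by semicolons. The names below are hypothetical;
# substitute the principals actually used on the CONTROLLER and broker listeners.
super.users=User:controller;User:broker
```

The configuration reported as unstable differs in two places: allow.everyone.if.no.acl.found is set to true and the super.users line is absent.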



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
