Swathi Mocharla created KAFKA-20582:
---------------------------------------

             Summary: KRaft restart/restore instability when ACL enabled and CQ 
uses allow.everyone.if.no.acl.found=true without super.users
                 Key: KAFKA-20582
                 URL: https://issues.apache.org/jira/browse/KAFKA-20582
             Project: Kafka
          Issue Type: Bug
          Components: kraft
         Environment: Kafka KRaft (controller quorum + brokers)
ACL enabled (org.apache.kafka.metadata.authorizer.StandardAuthorizer)
CONTROLLER listener secured (SASL_SSL/PLAIN in our case)
Kubernetes restart/restore workflows
            Reporter: Swathi Mocharla


In a KRaft deployment with {{StandardAuthorizer}} enabled, controller/broker 
restart or restore flows become unstable when the controller quorum (CQ) is 
configured with:
 * {{allow.everyone.if.no.acl.found=true}}
 * no {{super.users}}

Observed symptoms:
 * repeated {{AuthorizerNotReadyException}} on CONTROLLER listener requests 
during startup
 * the traffic broker (TB) init container loops and may stall in the init state
 * metadata quorum check failures during init:
 ** {{UnsupportedVersionException: Direct-to-controller communication is not 
supported with the current MetadataVersion}}
 ** {{The remote node is not a CONTROLLER that supports the KIP-919 
DESCRIBE_CLUSTER api}}
 * traffic broker pod stuck in {{Init:1/4}} in affected runs

In the same environment:
 * {{allow.everyone.if.no.acl.found=false}} + proper {{super.users}} is stable
 * ACL disabled is stable



*Steps to Reproduce*
 # Deploy KRaft cluster with ACL enabled.
 # Configure CQ with {{allow.everyone.if.no.acl.found=true}}, no 
{{super.users}}.
 # Trigger restart/restore (controller and/or broker restart with init quorum 
check).
 # Observe CQ and broker-init logs.

*Expected Result*
Restart/restore should converge reliably without init deadlock.

*Actual Result*
Intermittent startup deadlock and repeated authorizer/quorum check errors; 
broker init may remain stuck.

*Workaround*
Use:
 * {{allow.everyone.if.no.acl.found=false}}
 * explicit {{super.users}} for required CQ/TB principals
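
To check a properties file for the combination described above, something along these lines may help. It is a minimal sketch, not part of Kafka: the helper names {{parse_properties}} and {{risky_acl_combo}} are hypothetical, the principals {{User:controller}} and {{User:broker}} are placeholders, the parser handles only plain {{key=value}} lines rather than the full Java properties syntax, and it assumes the authorizer is enabled through the standard {{authorizer.class.name}} key.

```python
def parse_properties(text):
    """Parse simple key=value lines; skips blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def risky_acl_combo(props):
    """True when the combination reported as unstable in this issue is present:
    an authorizer is set, allow.everyone.if.no.acl.found=true, and no super.users."""
    authorizer_on = bool(props.get("authorizer.class.name", ""))
    allow_all = props.get("allow.everyone.if.no.acl.found", "false").lower() == "true"
    has_super_users = bool(props.get("super.users", ""))
    return authorizer_on and allow_all and not has_super_users


# Configuration shape reported as unstable.
UNSTABLE = """
authorizer.class.name=org.apache.kafka.metadata.authorizer.StandardAuthorizer
allow.everyone.if.no.acl.found=true
"""

# Workaround shape; the principal names are hypothetical placeholders.
WORKAROUND = """
authorizer.class.name=org.apache.kafka.metadata.authorizer.StandardAuthorizer
allow.everyone.if.no.acl.found=false
super.users=User:controller;User:broker
"""

print(risky_acl_combo(parse_properties(UNSTABLE)))
print(risky_acl_combo(parse_properties(WORKAROUND)))
```

The check only flags the configuration shape; it does not confirm that a given cluster will hit the startup deadlock.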

*Relevant Log Fragments*
 * {{org.apache.kafka.common.errors.AuthorizerNotReadyException}}
 * {{UnsupportedVersionException: Direct-to-controller communication is not 
supported with the current MetadataVersion}}
 * {{The remote node is not a CONTROLLER that supports the KIP-919 
DESCRIBE_CLUSTER api}}
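
When triaging CQ and broker-init logs, the fragments above can be counted with a small script. This is a rough sketch under assumptions: the {{SIGNATURES}} labels and the {{count_signatures}} helper are hypothetical names, matching is plain substring search, and the sample input is just the fragments quoted in this report rather than complete log lines.

```python
from collections import Counter

# Substrings taken from the fragments quoted in this report; the labels are illustrative.
SIGNATURES = {
    "authorizer_not_ready": "AuthorizerNotReadyException",
    "direct_to_controller_unsupported": "Direct-to-controller communication is not supported",
    "not_kip919_controller": "not a CONTROLLER that supports the KIP-919",
}


def count_signatures(lines):
    """Count how many lines contain each known error signature."""
    counts = Counter()
    for line in lines:
        for name, needle in SIGNATURES.items():
            if needle in line:
                counts[name] += 1
    return counts


# Sample input built only from the fragments in this report, plus one unrelated line.
sample = [
    "org.apache.kafka.common.errors.AuthorizerNotReadyException",
    "UnsupportedVersionException: Direct-to-controller communication is not "
    "supported with the current MetadataVersion",
    "The remote node is not a CONTROLLER that supports the KIP-919 DESCRIBE_CLUSTER api",
    "unrelated line",
]

counts = count_signatures(sample)
for name in SIGNATURES:
    print(name, counts[name])
```

For a real run, pass an open log file (an iterable of lines) instead of {{sample}}.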



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
