Swathi Mocharla created KAFKA-20582:
---------------------------------------
Summary: KRaft restart/restore instability when ACL enabled and CQ
uses allow.everyone.if.no.acl.found=true without super.users
Key: KAFKA-20582
URL: https://issues.apache.org/jira/browse/KAFKA-20582
Project: Kafka
Issue Type: Bug
Components: kraft
Environment: Kafka KRaft (controller quorum + brokers)
ACL enabled (org.apache.kafka.metadata.authorizer.StandardAuthorizer)
CONTROLLER listener secured (SASL_SSL/PLAIN in our case)
Kubernetes restart/restore workflows
Reporter: Swathi Mocharla
In a KRaft deployment with {{StandardAuthorizer}} enabled, controller/broker
restart or restore flows become unstable when the controller quorum (CQ) is
configured with:
* {{allow.everyone.if.no.acl.found=true}}
* no {{super.users}}
Observed symptoms:
* repeated {{AuthorizerNotReadyException}} on CONTROLLER listener requests
during startup
* the traffic broker (TB) init container loops and may stall in the init state
* metadata quorum check failures during init:
** {{UnsupportedVersionException: Direct-to-controller communication is not
supported with the current MetadataVersion}}
** {{The remote node is not a CONTROLLER that supports the KIP-919
DESCRIBE_CLUSTER api}}
* traffic broker pod stuck in {{Init:1/4}} in affected runs
In the same environment:
* {{allow.everyone.if.no.acl.found=false}} + proper {{super.users}} is stable
* ACL disabled is stable
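The unstable combination above can be checked for before a rollout. The following is a minimal Python sketch, not part of Kafka: the helper names {{parse_properties}} and {{is_risky_acl_config}} are hypothetical, and it assumes the controller configuration is a plain key=value properties file using the standard keys {{authorizer.class.name}}, {{allow.everyone.if.no.acl.found}}, and {{super.users}} (semicolon-separated).

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def is_risky_acl_config(props):
    """Return True for the combination described in this report.

    Risky means: an authorizer is configured,
    allow.everyone.if.no.acl.found is true, and super.users is
    empty or absent. This mirrors the report only; it is not a
    statement about the root cause.
    """
    authorizer = props.get("authorizer.class.name", "")
    allow_all = props.get("allow.everyone.if.no.acl.found", "false").lower() == "true"
    # super.users is a semicolon-separated list of principals.
    super_users = [u for u in props.get("super.users", "").split(";") if u.strip()]
    return bool(authorizer) and allow_all and not super_users
```

Such a check could run in a CI step or an init container against the rendered controller properties; it only flags the reported combination and does not validate the rest of the configuration.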
*Steps to Reproduce*
# Deploy a KRaft cluster with ACLs enabled.
# Configure CQ with {{allow.everyone.if.no.acl.found=true}} and no
{{super.users}}.
# Trigger restart/restore (controller and/or broker restart with init quorum
check).
# Observe CQ and broker-init logs.
*Expected Result*
Restart/restore should converge reliably without init deadlock.
*Actual Result*
Intermittent startup deadlock and repeated authorizer/quorum check errors;
broker init may remain stuck.
*Workaround*
Use:
* {{allow.everyone.if.no.acl.found=false}}
* explicit {{super.users}} for required CQ/TB principals
*Relevant Log Fragments*
* {{org.apache.kafka.common.errors.AuthorizerNotReadyException}}
* {{UnsupportedVersionException: Direct-to-controller communication is not
supported with the current MetadataVersion}}
* {{The remote node is not a CONTROLLER that supports the KIP-919
DESCRIBE_CLUSTER api}}
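To compare affected and unaffected runs, the fragments above can be counted in collected CQ and broker-init logs. This is a rough Python sketch under stated assumptions: the function name {{count_signatures}} and the signature labels are hypothetical, and matching uses short substrings of the messages quoted in this report because real log lines may wrap or be prefixed differently.

```python
# Substrings taken from the log fragments quoted in this report.
# The dictionary keys are illustrative labels, not Kafka identifiers.
SIGNATURES = {
    "authorizer_not_ready": "AuthorizerNotReadyException",
    "direct_to_controller_unsupported": "Direct-to-controller communication is not",
    "kip_919_unsupported": "not a CONTROLLER that supports the KIP-919",
}


def count_signatures(log_text):
    """Count how many log lines contain each known error signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in log_text.splitlines():
        for name, needle in SIGNATURES.items():
            if needle in line:
                counts[name] += 1
    return counts
```

Running this over logs from a stable configuration and from the failing one gives a quick view of which errors appear only in the failing case.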