[
https://issues.apache.org/jira/browse/KAFKA-20651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ibenchhida updated KAFKA-20651:
-------------------------------
Description:
In KRaft mode, StandardAuthorizerData.findResult() calls acl.kafkaPrincipal()
for every ACL visited during authorization. Despite being called repeatedly
for the same principal strings (e.g., "User:alice"), kafkaPrincipal() parses
the principal string from scratch on each invocation:
    public KafkaPrincipal kafkaPrincipal() {
        int colonIndex = principal.indexOf(":");
        String principalType = principal.substring(0, colonIndex);  // alloc 1
        String principalName = principal.substring(colonIndex + 1); // alloc 2
        return new KafkaPrincipal(principalType, principalName);    // alloc 3
    }
With a large number of ACLs and repeated authorization requests, this
generates millions of transient String + KafkaPrincipal allocations,
creating unnecessary CPU and GC pressure.
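One way the caching named in the issue title could look is sketched below. This is a minimal illustration, not the actual patch: SketchPrincipal and SketchAcl are hypothetical stand-ins for KafkaPrincipal and StandardAcl, reduced to the principal field so the example compiles on its own. The missing-colon check in the constructor is an assumption, since the snippet above does not show how that case is handled.

```java
import java.util.Objects;

// Hypothetical stand-in for KafkaPrincipal (only the two fields used here).
final class SketchPrincipal {
    private final String principalType;
    private final String name;

    SketchPrincipal(String principalType, String name) {
        this.principalType = Objects.requireNonNull(principalType);
        this.name = Objects.requireNonNull(name);
    }

    String getPrincipalType() { return principalType; }
    String getName() { return name; }
}

// Hypothetical stand-in for StandardAcl, reduced to the principal string.
final class SketchAcl {
    private final String principal;
    // Parsed once at construction time and reused by every kafkaPrincipal() call.
    private final SketchPrincipal cachedPrincipal;

    SketchAcl(String principal) {
        this.principal = Objects.requireNonNull(principal);
        int colonIndex = principal.indexOf(':');
        if (colonIndex == -1) {
            // Assumption: reject malformed principals up front.
            throw new IllegalArgumentException(
                "No colon separating principal type from name in: " + principal);
        }
        this.cachedPrincipal = new SketchPrincipal(
            principal.substring(0, colonIndex),
            principal.substring(colonIndex + 1));
    }

    String principal() { return principal; }

    // No parsing and no allocation on the authorization hot path.
    SketchPrincipal kafkaPrincipal() { return cachedPrincipal; }
}

class PrincipalCacheSketch {
    public static void main(String[] args) {
        SketchAcl acl = new SketchAcl("User:alice");
        SketchPrincipal first = acl.kafkaPrincipal();
        SketchPrincipal second = acl.kafkaPrincipal();
        System.out.println(first.getPrincipalType() + " / " + first.getName()); // prints "User / alice"
        System.out.println(first == second); // prints "true": same cached instance
    }
}
```

The trade-off in this sketch is one extra object reference per ACL held for the ACL's lifetime, in exchange for removing the two substring allocations and the principal allocation from each call.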
was:
*StandardAuthorizerData.checkSection()* can enter an infinite loop when
iterating over ACLs, causing request handler threads to spin at 100% CPU
indefinitely. The broker becomes unresponsive (metadata timeouts, file
descriptor leaks, memory growth) because the handler thread never returns.
*Root Cause*
The loop in checkSection iterates over a NavigableSet.tailSet(exemplar, true)
of sorted ACLs. It narrows the search range on each iteration by computing a
common prefix length (matchesUpTo) between the queried resource name and the
current ACL's resource name, then creating a new exemplar with that shortened
prefix.
_The bug:_ When matchesUpTo equals the length of exemplar.resourceName() (i.e.,
the queried resource is a prefix of the ACL resource name, e.g. queried
"foobar" vs ACL "foobar-A"), newPrefix = exemplar.resourceName().substring(0,
matchesUpTo) produces the same string as the original exemplar. The subsequent
tailSet(exemplar, true) restarts from the same exemplar, and the first ACL in
the iterator is the same one just processed — infinite loop.
> StandardAuthorizer: cache KafkaPrincipal in StandardAcl to eliminate
> allocation hotspot
> ---------------------------------------------------------------------------------------
>
> Key: KAFKA-20651
> URL: https://issues.apache.org/jira/browse/KAFKA-20651
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 3.9.2
> Environment: KRaft-based clusters using StandardAuthorizer (3.7.x,
> 3.8.x, 3.9.x, 4.0.x — all versions with the current checkSection
> implementation)
> Reporter: ibenchhida
> Priority: Critical
>
> In KRaft mode, StandardAuthorizerData.findResult() calls acl.kafkaPrincipal()
> for every ACL visited during authorization. Despite being called repeatedly
> for the same principal strings (e.g., "User:alice"), kafkaPrincipal() parses
> the principal string from scratch on each invocation:
>     public KafkaPrincipal kafkaPrincipal() {
>         int colonIndex = principal.indexOf(":");
>         String principalType = principal.substring(0, colonIndex);  // alloc 1
>         String principalName = principal.substring(colonIndex + 1); // alloc 2
>         return new KafkaPrincipal(principalType, principalName);    // alloc 3
>     }
> With a large number of ACLs and repeated authorization requests, this
> generates millions of transient String + KafkaPrincipal allocations,
> creating unnecessary CPU and GC pressure.
>