jeqo commented on code in PR #15324:
URL: https://github.com/apache/kafka/pull/15324#discussion_r1482430278
########## storage/src/main/java/org/apache/kafka/storage/internals/log/ProducerStateManager.java:
##########

```diff
@@ -177,7 +177,7 @@ private void addProducerId(long producerId, ProducerStateEntry entry) {
     }

     private void removeProducerIds(List<Long> keys) {
-        producers.keySet().removeAll(keys);
+        keys.forEach(producers.keySet()::remove);
```

Review Comment:
Interesting, this does seem to be related. Attaching the flamegraph of the high CPU utilization to spot the root method. Looking at the `AbstractSet#removeAll` implementation:

```java
public boolean removeAll(Collection<?> c) {
    Objects.requireNonNull(c);
    boolean modified = false;
    if (size() > c.size()) {
        for (Object e : c)
            modified |= remove(e);
    } else {
        for (Iterator<?> i = iterator(); i.hasNext(); ) {
            if (c.contains(i.next())) {
                i.remove();
                modified = true;
            }
        }
    }
    return modified;
}
```

It seems that in my case it is hitting the second branch, since the CPU is burning in `AbstractList#contains`. For the expiration removal to hit the second branch, the number of expired keys must equal the number of producers (it cannot be higher). This seems possible: we hit this issue even when no producers were running (so no new producer ids were being created) but while rebalancing the cluster (i.e. old producer id snapshots were being loaded).

In hindsight, the JDK implementation could have extended the first condition to also cover the `c.size() == size()` case, as that branch does not depend on the argument collection's `contains` implementation. On the other hand, if a `HashSet` were used for `keys` instead of the current `ArrayList` type, it would behave pretty much the same as the proposed fix.

btw, I will be simplifying the expression even further to:

```java
keys.forEach(producers::remove);
```

Both lead to the same `HashMap#remove` implementation in the end.

We could even consider: if the number of expired producer ids equals the total number of producer ids, clear everything instead of removing entry by entry, since the expired ids come from the same producers map.
Something like:

```java
if (keys.size() == producers.size()) {
    clearProducerIds();
} else {
    keys.forEach(producers::remove);
    producerIdCount = producers.size();
}
```

But performance-wise, execution time is pretty much the same as the fix (linear, de-referencing each key), and readability doesn't improve much.

PS: using JDK 17.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
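[Editor's note] The branch selection discussed above can be reproduced outside Kafka with a small standalone sketch. The class name, map contents, and sizes below are all made up for illustration; only the `removeAll`-vs-`forEach(remove)` comparison reflects the discussion. When `keys.size() == producers.size()`, `AbstractSet#removeAll` takes the iterator branch and calls `ArrayList#contains` once per entry (O(n²)), while the fix de-references each key directly against the `HashMap` (O(n)):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RemoveAllDemo {
    public static void main(String[] args) {
        int n = 10_000; // hypothetical size, small enough to run quickly
        Map<Long, String> producers = new HashMap<>();
        List<Long> keys = new ArrayList<>();
        for (long i = 0; i < n; i++) {
            producers.put(i, "entry-" + i);
            keys.add(i);
        }

        // keys.size() == size(), so AbstractSet#removeAll falls into the
        // iterator branch and probes ArrayList#contains n times: O(n^2).
        Map<Long, String> viaRemoveAll = new HashMap<>(producers);
        long t0 = System.nanoTime();
        viaRemoveAll.keySet().removeAll(keys);
        long removeAllNanos = System.nanoTime() - t0;

        // The fix removes each key directly from the HashMap: O(n).
        Map<Long, String> viaForEach = new HashMap<>(producers);
        long t1 = System.nanoTime();
        keys.forEach(viaForEach::remove);
        long forEachNanos = System.nanoTime() - t1;

        System.out.println("removeAll took " + removeAllNanos / 1_000_000 + " ms");
        System.out.println("forEach   took " + forEachNanos / 1_000_000 + " ms");
        System.out.println("both empty: " + (viaRemoveAll.isEmpty() && viaForEach.isEmpty()));
    }
}
```

Both variants end with an empty map; only the cost differs, and the gap widens quadratically with the number of expired producer ids.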