Shekhar Prasad Rajak created KAFKA-20691:
--------------------------------------------
Summary: PersisterStateBatchCombiner.combineStateBatches() is
O(n²) worst-case; replace TreeSet-based iterative merge with single-pass
sweep-line
Key: KAFKA-20691
URL: https://issues.apache.org/jira/browse/KAFKA-20691
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 4.2.1
Reporter: Shekhar Prasad Rajak
Assignee: Shekhar Prasad Rajak
Fix For: 4.3.1
combineStateBatches() merges batchesSoFar + newBatches into a sorted, disjoint,
(deliveryState, deliveryCount)-distinct cover clipped at SPSO. Pipeline:
pruneBatches drops/clips ranges below SPSO.
mergeBatches loads survivors into a TreeSet, repeatedly finds the first
overlapping or contiguous-same-state pair.
Improvement area :
Worst-case O(n ^ 2). Each merge step rescans the TreeSet from the start (O(n))
and does O(log n) remove/add. 10k contiguous same-state batches → 10k rescans
of a shrinking TreeSet. User-visible as non-linear WriteShareGroupState latency
when many small ack ranges accumulate per partition.
We can make it Time: O(n log n) (sort) + O(n) merge
How:
Replace the TreeSet + iterative findMergeCandidatePair loop with a single-pass
sweep-line over a sorted array:
1. Sort phase (O(n log n)): prune + sort into ArrayList
2. Sweep phase (O(n)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)