[
https://issues.apache.org/jira/browse/KAFKA-20691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bill Bejeck updated KAFKA-20691:
--------------------------------
Fix Version/s: 4.3.2
(was: 4.3.1)
> PersisterStateBatchCombiner.combineStateBatches() is O(n^2) worst-case;
> replace TreeSet-based iterative merge with single-pass sweep-line
> -----------------------------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-20691
> URL: https://issues.apache.org/jira/browse/KAFKA-20691
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 4.2.1
> Reporter: Shekhar Prasad Rajak
> Assignee: Shekhar Prasad Rajak
> Priority: Major
> Labels: queue, queues-for-kafka
> Fix For: 4.4.0, 4.3.2
>
>
> combineStateBatches() merges batchesSoFar + newBatches into a sorted,
> disjoint, (deliveryState, deliveryCount)-distinct cover clipped at SPSO.
> Pipeline:
> pruneBatches drops/clips ranges below SPSO.
> mergeBatches loads survivors into a TreeSet, repeatedly finds the first
> overlapping or contiguous-same-state pair.
> Improvement area :
> iterative pairwise TreeSet merging. Simple contiguous same-state input is
> roughly O(n log n), but overlap-heavy inputs can trigger repeated
> split/reinsert rounds. The number of merge iterations is data-dependent and
> can grow non-linearly, approaching O(n^2 log n) in pathological overlap cases.
>
> User-visible as non-linear WriteShareGroupState latency when many small ack
> ranges accumulate per partition.
> We can make it Time: O(n log n) (sort) + O ( n ) merge
> How:
> Replace the TreeSet + iterative findMergeCandidatePair loop with a
> single-pass sweep-line over a sorted array:
> 1. Sort phase (O(n log n)): prune + sort into ArrayList
> 2. Sweep phase (O( n )
--
This message was sent by Atlassian Jira
(v8.20.10#820010)