[ 
https://issues.apache.org/jira/browse/KAFKA-20691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck updated KAFKA-20691:
--------------------------------
    Fix Version/s: 4.3.2
                       (was: 4.3.1)

> PersisterStateBatchCombiner.combineStateBatches() is O(n^2) worst-case; 
> replace TreeSet-based iterative merge with single-pass sweep-line
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-20691
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20691
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 4.2.1
>            Reporter: Shekhar Prasad Rajak
>            Assignee: Shekhar Prasad Rajak
>            Priority: Major
>              Labels: queue, queues-for-kafka
>             Fix For: 4.4.0, 4.3.2
>
>
> combineStateBatches() merges batchesSoFar + newBatches into a sorted, 
> disjoint, (deliveryState, deliveryCount)-distinct cover clipped at SPSO. 
> Pipeline:
> pruneBatches drops/clips ranges below SPSO.
> mergeBatches loads survivors into a TreeSet, repeatedly finds the first 
> overlapping or contiguous-same-state pair.
> Improvement area :
>  iterative pairwise TreeSet merging. Simple contiguous same-state input is 
> roughly O(n log n), but overlap-heavy inputs can trigger repeated 
> split/reinsert rounds. The number of merge iterations is  data-dependent and 
> can grow non-linearly, approaching O(n^2 log n) in pathological overlap cases.
>  
> User-visible as non-linear WriteShareGroupState latency when many small ack 
> ranges accumulate per partition.
> We can make it Time: O(n log n) (sort) + O ( n ) merge 
> How:
> Replace the TreeSet + iterative findMergeCandidatePair loop with a 
> single-pass sweep-line over a sorted array:
> 1. Sort phase (O(n log n)): prune + sort into ArrayList
> 2. Sweep phase (O( n )



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to