[ 
https://issues.apache.org/jira/browse/KAFKA-20691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shekhar Prasad Rajak updated KAFKA-20691:
-----------------------------------------
    Summary: PersisterStateBatchCombiner.combineStateBatches() is O(n^2) 
worst-case; replace TreeSet-based iterative merge with single-pass sweep-line  
(was: PersisterStateBatchCombiner.combineStateBatches() is O(n²) worst-case; 
replace TreeSet-based iterative merge with single-pass sweep-line)

> PersisterStateBatchCombiner.combineStateBatches() is O(n^2) worst-case; 
> replace TreeSet-based iterative merge with single-pass sweep-line
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-20691
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20691
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 4.2.1
>            Reporter: Shekhar Prasad Rajak
>            Assignee: Shekhar Prasad Rajak
>            Priority: Major
>              Labels: queue, queues-for-kafka
>             Fix For: 4.3.1
>
>
> combineStateBatches() merges batchesSoFar + newBatches into a sorted, 
> disjoint, (deliveryState, deliveryCount)-distinct cover clipped at SPSO. 
> Pipeline:
> pruneBatches drops/clips ranges below SPSO.
> mergeBatches loads survivors into a TreeSet, repeatedly finds the first 
> overlapping or contiguous-same-state pair.
> Improvement area : 
> Worst-case O(n ^ 2). Each merge step rescans the TreeSet from the start 
> (O(n)) and does O(log n) remove/add. 10k contiguous same-state batches → 10k 
> rescans of a shrinking TreeSet. User-visible as non-linear 
> WriteShareGroupState latency when many small ack ranges accumulate per 
> partition.
> We can make it Time: O(n log n) (sort) + O(n) merge 
> How: 
> Replace the TreeSet + iterative findMergeCandidatePair loop with a 
> single-pass sweep-line over a sorted array:
> 1. Sort phase (O(n log n)): prune + sort into ArrayList
> 2. Sweep phase (O(n)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to