Shekhar Prasad Rajak created KAFKA-20691:
--------------------------------------------

             Summary: PersisterStateBatchCombiner.combineStateBatches() is 
O(n²) worst-case; replace TreeSet-based iterative merge with single-pass 
sweep-line
                 Key: KAFKA-20691
                 URL: https://issues.apache.org/jira/browse/KAFKA-20691
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 4.2.1
            Reporter: Shekhar Prasad Rajak
            Assignee: Shekhar Prasad Rajak
             Fix For: 4.3.1


combineStateBatches() merges batchesSoFar + newBatches into a sorted, disjoint, 
(deliveryState, deliveryCount)-distinct cover clipped at SPSO. Pipeline:

pruneBatches drops/clips ranges below SPSO.
mergeBatches loads survivors into a TreeSet, repeatedly finds the first 
overlapping or contiguous-same-state pair.

Improvement area : 

Worst-case O(n ^ 2). Each merge step rescans the TreeSet from the start (O(n)) 
and does O(log n) remove/add. 10k contiguous same-state batches → 10k rescans 
of a shrinking TreeSet. User-visible as non-linear WriteShareGroupState latency 
when many small ack ranges accumulate per partition.

We can make it Time: O(n log n) (sort) + O(n) merge 
How: 

Replace the TreeSet + iterative findMergeCandidatePair loop with a single-pass 
sweep-line over a sorted array:
1. Sort phase (O(n log n)): prune + sort into ArrayList
2. Sweep phase (O(n)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to