[
https://issues.apache.org/jira/browse/KAFKA-5337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027266#comment-16027266
]
Grant Neale commented on KAFKA-5337:
------------------------------------
PR raised: https://github.com/apache/kafka/pull/3158
> Partition assignment strategy that distributes lag evenly across consumers in
> each group
> ----------------------------------------------------------------------------------------
>
> Key: KAFKA-5337
> URL: https://issues.apache.org/jira/browse/KAFKA-5337
> Project: Kafka
> Issue Type: New Feature
> Components: consumer
> Affects Versions: 0.10.2.1
> Reporter: Grant Neale
> Priority: Minor
>
> Existing partition assignment strategies (RangeAssignor and
> RoundRobinAssignor) do not account for the current consumer group lag on each
> partition. This can result in sub-optimal assignments when the distribution
> of lags for a given topic and consumer group is skewed.
> The LagBasedAssignor operates on a per-topic basis, and attempts to assign
> partitions such that lag is distributed as evenly as possible across a
> consumer group.
> h4. Algorithm:
> For each topic, first obtain the lag on all partitions. Lag on a given
> partition is the difference between the end offset and the last offset
> committed by the consumer group. If no offsets have been committed for a
> partition, we determine the lag based on the auto.offset.reset property.
> If auto.offset.reset=latest, we assume a lag of 0. If
> auto.offset.reset=earliest (or any other value) we assume lag equal to the
> total number of messages currently available in that partition.
> Next, create a map storing the current total lag of all partitions assigned
> to each member of the consumer group. Partitions are assigned in decreasing
> order of lag, with each partition assigned to the consumer with the least total
> number of assigned partitions, breaking ties by assigning to the consumer
> with the least total currently assigned lag.
> Assigning partitions evenly across consumers by partition count ensures
> that the assignment is reasonably balanced when all
> partitions have a current lag of 0 or if the distribution of lags is heavily
> skewed. It also gives the consumer group the best possible chance of
> remaining balanced if the assignment is retained for a long period (assuming
> throughput is consistent across members of the consumer group).
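
For readers who want to see the steps above in code, here is a minimal sketch of
the lag calculation and the greedy assignment for a single topic. It is not the
code from the PR. The class and method names, the beginOffset parameter (used to
approximate "messages currently available"), and the rule that remaining ties go
to the consumer listed first are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Kafka or of the PR.
class LagBasedAssignmentSketch {

    // Lag for one partition. committedOffset is null when the group has not
    // committed an offset for the partition.
    static long computeLag(long beginOffset, long endOffset,
                           Long committedOffset, String autoOffsetReset) {
        if (committedOffset != null) {
            return endOffset - committedOffset;
        }
        // No committed offset and auto.offset.reset=latest: assume a lag of 0.
        if ("latest".equals(autoOffsetReset)) {
            return 0L;
        }
        // earliest (or any other value): assume everything currently in the
        // partition is lag. Using beginOffset for this is an assumption.
        return endOffset - beginOffset;
    }

    // Assigns the partitions of one topic. partitionLags maps partition id to lag.
    static Map<String, List<Integer>> assign(Map<Integer, Long> partitionLags,
                                             List<String> consumers) {
        Map<String, List<Integer>> assignment = new TreeMap<>();
        Map<String, Long> totalLag = new HashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
            totalLag.put(consumer, 0L);
        }
        if (consumers.isEmpty()) {
            return assignment;
        }

        // Process partitions in decreasing order of lag (partition id breaks
        // ties so that the result is deterministic; this is an assumption).
        List<Map.Entry<Integer, Long>> sorted = new ArrayList<>(partitionLags.entrySet());
        sorted.sort((a, b) -> {
            int byLag = Long.compare(b.getValue(), a.getValue());
            return byLag != 0 ? byLag : Integer.compare(a.getKey(), b.getKey());
        });

        for (Map.Entry<Integer, Long> partition : sorted) {
            // Pick the consumer with the fewest assigned partitions, breaking
            // ties by least total assigned lag, then by position in the list.
            String target = consumers.get(0);
            for (String candidate : consumers) {
                int cmp = Integer.compare(assignment.get(candidate).size(),
                                          assignment.get(target).size());
                if (cmp == 0) {
                    cmp = Long.compare(totalLag.get(candidate), totalLag.get(target));
                }
                if (cmp < 0) {
                    target = candidate;
                }
            }
            assignment.get(target).add(partition.getKey());
            totalLag.merge(target, partition.getValue(), Long::sum);
        }
        return assignment;
    }
}
```

With lags of 100, 50, 30 and 10 on partitions 0 to 3 and two consumers c1 and
c2, this sketch gives c1 partitions 0 and 3 (total lag 110) and c2 partitions 1
and 2 (total lag 80). Both consumers hold two partitions, which is the
count-first balancing described above.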