[ https://issues.apache.org/jira/browse/KAFKA-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839233#comment-16839233 ]
Braedon Vickers commented on KAFKA-4600:
----------------------------------------

Thanks [~guozhang], the section {{Rebalance Callback Error Handling}} looks good to me. It makes sense that if the user wants to retry after capturing the exception, they must make sure their listener implementation is OK being run again.

You mention:
{quote}Suppose the former succeeds but the latter failed with an error and user captured it in consumer.poll and retry, we just let the consumer to proceed as assign with \{1, 2, 3\} since the 3 is added successfully but 1 is revoked unsuccessfully{quote}
Does this mean that when the user retries `consumer.poll()`, the consumer may consume more messages from partition 1 before it retries the revocation?

If possible, it'd be nicer/safer to guarantee that this consumer consumes no more messages from partition 1 until the partition is successfully revoked. That way the user only needs to worry about whether their listener implementation itself can be re-run, not whether they are able to safely consume messages from a partially revoked partition.

> Consumer proceeds on when ConsumerRebalanceListener fails
> ---------------------------------------------------------
>
>                 Key: KAFKA-4600
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4600
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.10.1.1
>            Reporter: Braedon Vickers
>            Priority: Major
>
> One of the use cases for a ConsumerRebalanceListener is to load state
> necessary for processing a partition when it is assigned. However, when
> ConsumerRebalanceListener.onPartitionsAssigned() fails for some reason (i.e.
> the state isn't loaded), the error is logged and the consumer proceeds on as
> if nothing happened, happily consuming messages from the new partition. When
> the state is relied upon for correct processing, this can be very bad, e.g.
> data loss can occur.
> It would be better if the error was propagated up so it could be dealt with
> normally. At the very least the assignment should fail so the consumer
> doesn't see any messages from the new partitions, and the rebalance can be
> reattempted.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)