Lior Chaga created KAFKA-5430:
---------------------------------

             Summary: new consumers getting data for revoked partitions
                 Key: KAFKA-5430
                 URL: https://issues.apache.org/jira/browse/KAFKA-5430
             Project: Kafka
          Issue Type: Bug
          Components: clients, consumer
    Affects Versions: 0.10.2.0
            Reporter: Lior Chaga
         Attachments: kafka_trace.log.gz

Due to bad configuration applied to network components, we experienced issues 
with communication between kafka brokers (causing under replication) as well as 
producers/consumers not being able to work against kafka.



The symptoms on the consumer were many errors of the following form:
{code}
2017-06-04 04:27:35,200 ERROR [Kafka Topics Cosumer 
requestlegacy.consumer-11_session_parser_02] TaboolaKafkaConsumer [] - Failed 
committing to kafka topicPartitions 
[requestlegacy-2,requestlegacy-0,requestlegacy-1] 
org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit 
failed with a retriable exception. You should retry committing offsets.
Caused by: org.apache.kafka.common.errors.DisconnectException
{code}


So far so good. However, upon network recovery, there were several rebalance 
operations, which eventually resulted in only one consumer (#14) being assigned 
with all topic partitions (at this case we're talking about a consumer groups 
for which all consumers are running in same process):

{code}
2017-06-04 04:27:02,168 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - Revoking 
previously assigned partitions [requestlegacy-8, requestlegacy-9] for group 
session_parser_02

2017-06-04 04:27:04,208 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-15_session_parser_02] ConsumerCoordinator [] - Revoking 
previously assigned partitions [requestlegacy-10, requestlegacy-11] for group 
session_parser_02

2017-06-04 04:27:18,167 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-12_session_parser_02] ConsumerCoordinator [] - Revoking 
previously assigned partitions [requestlegacy-3, requestlegacy-4, 
requestlegacy-5] for group session_parser_02

2017-06-04 04:27:20,232 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-11_session_parser_02] ConsumerCoordinator [] - Revoking 
previously assigned partitions [requestlegacy-2, requestlegacy-0, 
requestlegacy-1] for group session_parser_02

2017-06-04 04:27:20,236 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-15_session_parser_02] ConsumerCoordinator [] - Setting 
newly assigned partitions [requestlegacy-9, requestlegacy-10, requestlegacy-11] 
for group session_parser_02

2017-06-04 04:27:20,237 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-12_session_parser_02] ConsumerCoordinator [] - Setting 
newly assigned partitions [requestlegacy-3, requestlegacy-4, requestlegacy-5] 
for group session_parser_02

2017-06-04 04:27:20,237 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - Setting 
newly assigned partitions [requestlegacy-6, requestlegacy-7, requestlegacy-8] 
for group session_parser_02

2017-06-04 04:27:20,332 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-11_session_parser_02] ConsumerCoordinator [] - Setting 
newly assigned partitions [requestlegacy-2, requestlegacy-0, requestlegacy-1] 
for group session_parser_02

2017-06-04 04:28:52,368 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-13_session_parser_02] ConsumerCoordinator [] - Revoking 
previously assigned partitions [requestlegacy-6, requestlegacy-7] for group 
session_parser_02

2017-06-04 04:29:15,201 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-11_session_parser_02] ConsumerCoordinator [] - Revoking 
previously assigned partitions [requestlegacy-2, requestlegacy-0, 
requestlegacy-1] for group session_parser_02

2017-06-04 04:30:22,379 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - Revoking 
previously assigned partitions [requestlegacy-6, requestlegacy-7, 
requestlegacy-8] for group session_parser_02

2017-06-04 04:30:24,431 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-15_session_parser_02] ConsumerCoordinator [] - Revoking 
previously assigned partitions [requestlegacy-9, requestlegacy-10, 
requestlegacy-11] for group session_parser_02

2017-06-04 04:30:38,229 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-12_session_parser_02] ConsumerCoordinator [] - Revoking 
previously assigned partitions [requestlegacy-3, requestlegacy-4, 
requestlegacy-5] for group session_parser_02

2017-06-04 04:30:39,156 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-13_session_parser_02] ConsumerCoordinator [] - Setting 
newly assigned partitions [requestlegacy-6, requestlegacy-7] for group 
session_parser_02

2017-06-04 04:30:39,156 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-15_session_parser_02] ConsumerCoordinator [] - Setting 
newly assigned partitions [requestlegacy-10, requestlegacy-11] for group 
session_parser_02

2017-06-04 04:30:39,156 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-11_session_parser_02] ConsumerCoordinator [] - Setting 
newly assigned partitions [requestlegacy-2, requestlegacy-0, requestlegacy-1] 
for group session_parser_02

2017-06-04 04:30:39,156 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-12_session_parser_02] ConsumerCoordinator [] - Setting 
newly assigned partitions [requestlegacy-3, requestlegacy-4, requestlegacy-5] 
for group session_parser_02

2017-06-04 04:30:39,157 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - Setting 
newly assigned partitions [requestlegacy-8, requestlegacy-9] for group 
session_parser_02

2017-06-04 04:49:51,118 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-13_session_parser_02] ConsumerCoordinator [] - Revoking 
previously assigned partitions [requestlegacy-6, requestlegacy-7] for group 
session_parser_02

2017-06-04 04:50:40,847 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-12_session_parser_02] ConsumerCoordinator [] - Revoking 
previously assigned partitions [requestlegacy-3, requestlegacy-4, 
requestlegacy-5] for group session_parser_02

2017-06-04 04:50:58,898 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-15_session_parser_02] ConsumerCoordinator [] - Revoking 
previously assigned partitions [requestlegacy-10, requestlegacy-11] for group 
session_parser_02

2017-06-04 04:50:59,010 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-11_session_parser_02] ConsumerCoordinator [] - Revoking 
previously assigned partitions [requestlegacy-2, requestlegacy-0, 
requestlegacy-1] for group session_parser_02


2017-06-04 04:52:39,732 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - Revoking 
previously assigned partitions [requestlegacy-8, requestlegacy-9] for group 
session_parser_02

2017-06-04 04:55:57,633 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - Setting 
newly assigned partitions [requestlegacy-8, requestlegacy-9] for group 
session_parser_02

2017-06-04 04:57:37,662 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - Revoking 
previously assigned partitions [requestlegacy-8, requestlegacy-9] for group 
session_parser_02

2017-06-04 05:00:57,601 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - Setting 
newly assigned partitions [requestlegacy-2, requestlegacy-3, requestlegacy-4, 
requestlegacy-5, requestlegacy-6, requestlegacy-7, requestlegacy-8, 
requestlegacy-9, requestlegacy-0, requestlegacy-1, requestlegacy-10, 
requestlegacy-11] for group session_parser_02

2017-06-04 05:02:37,605 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - Revoking 
previously assigned partitions [requestlegacy-2, requestlegacy-3, 
requestlegacy-4, requestlegacy-5, requestlegacy-6, requestlegacy-7, 
requestlegacy-8, requestlegacy-9, requestlegacy-0, requestlegacy-1, 
requestlegacy-10, requestlegacy-11] for group session_parser_02

2017-06-04 05:02:37,666 INFO  [Kafka Topics Cosumer 
requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - Setting 
newly assigned partitions [requestlegacy-2, requestlegacy-3, requestlegacy-4, 
requestlegacy-5, requestlegacy-6, requestlegacy-7, requestlegacy-8, 
requestlegacy-9, requestlegacy-0, requestlegacy-1, requestlegacy-10, 
requestlegacy-11] for group session_parser_02

{code}


The interesting part is that after the last rebalance, other consumers were 
still polling messages for hours, and failed upon commit (notice the revoke 
message on consumer-12, and the commit failure below on same consumer, 2 
minutes later):

{code}
2017-06-04 04:52:41,396 ERROR [Kafka Topics Cosumer 
requestlegacy.consumer-12_session_parser_02] TaboolaKafkaConsumer [] - Failed 
committing to kafka topicPartitions 
[requestlegacy-3,requestlegacy-4,requestlegacy-5] 
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
completed since the group has already rebalanced and assigned the partitions to 
another member. This means that the time between subsequent calls to poll() was 
longer than the configured max.poll.interval.ms, which typically implies that 
the poll loop is spending too much time message processing. You can address 
this either by increasing the session timeout or by reducing the maximum size 
of batches returned in poll() with max.poll.records.
{code}



To complete the picture from kafka side, kafka-consumer-group reports all 
partitions assigned to consumer-14:
{code}
requestlegacy                  0          9092336         9092337         1     
     consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28           
      consumer-14
requestlegacy                  1          9094923         9094923         0     
     consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28           
      consumer-14
requestlegacy                  2          9091382         9091382         0     
     consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28           
      consumer-14
requestlegacy                  3          9097052         9097052         0     
     consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28           
      consumer-14
requestlegacy                  4          9097542         9097542         0     
     consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28           
      consumer-14
requestlegacy                  5          9098915         9098915         0     
     consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28           
      consumer-14
requestlegacy                  6          9098308         9098309         1     
     consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28           
      consumer-14
requestlegacy                  7          9096984         9096984         0     
     consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28           
      consumer-14
requestlegacy                  8          9100548         9100548         0     
     consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28           
      consumer-14
requestlegacy                  9          9097615         9097615         0     
     consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28           
      consumer-14
requestlegacy                  10         9095553         9095553         0     
     consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28           
      consumer-14
requestlegacy                  11         9089520         9089520         0     
     consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28           
      consumer-14
{code}

We took a trace log (attached) of org.apache.kafka for our consuming service 
when we discovered the issue, prior to restarting the service (which resulted 
in full recovery). 
Also, this happened for multiple consumer groups, this is just one example.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to