[ 
https://issues.apache.org/jira/browse/KAFKA-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032091#comment-15032091
 ] 

Ben Stopford commented on KAFKA-2904:
-------------------------------------

This hasn't recurred since the consumer timeout was increased from 30s to 60s.
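
As a rough sanity check on why 60s is enough, the stall in the quoted timeline can be compared against both timeout values. This is only a sketch: the two timestamps are copied from the TIMELINE in the description, the helper name `survives` is made up for illustration, and it assumes the stall runs from "Consumer stops consuming" to the group being stabilized, which only approximates how the consumer actually measures its timeout.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"

# Timestamps copied from the TIMELINE in the issue description.
consumer_stopped = datetime.strptime("20:40:27", FMT)
group_stabilized = datetime.strptime("20:41:03", FMT)

# Approximate length of the period with no consumed messages.
stall = group_stabilized - consumer_stopped


def survives(stall: timedelta, timeout_s: int) -> bool:
    """Hypothetical helper: True if the stall fits inside the timeout."""
    return stall < timedelta(seconds=timeout_s)


print(f"stall: {stall.total_seconds():.0f}s")
print(f"fits in 30s timeout: {survives(stall, 30)}")
print(f"fits in 60s timeout: {survives(stall, 60)}")
```

On these numbers the stall is slightly longer than the old timeout and well inside the new one, which matches the failure no longer showing up.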

> Consumer Fails to Reconnect after 30s post restarts
> ---------------------------------------------------
>
>                 Key: KAFKA-2904
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2904
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Ben Stopford
>            Assignee: Ben Stopford
>         Attachments: 2015-11-27--001 (1).tar.gz
>
>
> This problem occurs in around 1 in 20 executions of the security rolling 
> upgrade test. 
> The test scenario is a rolling upgrade where each of the three servers is 
> restarted in turn whilst the producer and consumers run. A ten-second sleep 
> between the stop and start of each node has been added to ensure there is 
> time for failover to occur (re KAFKA-2827). 
> Failure results in no consumed messages after the failure point. 
> Occasionally the consumer does not reconnect within its 30s timeout. The 
> consumer’s log at this point is at the bottom of this jira.
> ISRs appear normal at the time of the failure.
> The producer is able to produce throughout this period. 
> *TIMELINE:*
> {quote}
> 20:39:23 - Test starts Consumer and Producer
> 20:39:27 - Consumer starts consuming produced messages
> 20:39:30 - Node 1 shutdown complete
> 20:39:45 - Node 1 restarts
> 20:39:59 - Node 2 shutdown complete
> 20:40:14 - Node 2 restarts 
> 20:40:27 - Consumer stops consuming
> 20:40:28 - Node 2 becomes controller
> 20:40:28 - Node 3 shutdown complete
> 20:40:34 - GroupCoordinator 2: Preparing to restabilize group 
> unique-test-group...
> 20:40:42 - Node 3 restarts
> *20:41:03 - Consumer times out*
> 20:41:03 - GroupCoordinator 2: Stabilized group unique-test-group...
> 20:41:03 - GroupCoordinator 2: Assignment received from leader for group 
> unique-test-group...
> 20:41:03 - GroupCoordinator 2: Preparing to restabilize group 
> unique-test-group...
> 20:41:03 - GroupCoordinator 2: Group unique-test-group... is dead and removed 
> 20:41:53 - Producer shuts down
> {quote}
> Consumer log at time of failure:
> {quote}
> [2015-11-27 20:40:27,268] INFO Current consumption count is 10100 
> (kafka.tools.ConsoleConsumer$)
> [2015-11-27 20:40:27,321] ERROR Error ILLEGAL_GENERATION occurred while 
> committing offsets for group unique-test-group-0.952644842527 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2015-11-27 20:40:27,321] WARN Auto offset commit failed: Commit cannot be 
> completed due to group rebalance 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2015-11-27 20:40:27,322] ERROR Error ILLEGAL_GENERATION occurred while 
> committing offsets for group unique-test-group-0.952644842527 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2015-11-27 20:40:27,322] WARN Auto offset commit failed:  
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2015-11-27 20:40:27,329] INFO Attempt to join group 
> unique-test-group-0.952644842527 failed due to unknown member id, resetting 
> and retrying. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:27,347] INFO SyncGroup for group 
> unique-test-group-0.952644842527 failed due to UNKNOWN_MEMBER_ID, rejoining 
> the group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:27,357] INFO SyncGroup for group 
> unique-test-group-0.952644842527 failed due to NOT_COORDINATOR_FOR_GROUP, 
> will find new coordinator and rejoin 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:27,357] INFO Marking the coordinator 2147483644 dead. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:28,097] INFO Attempt to join group 
> unique-test-group-0.952644842527 failed due to unknown member id, resetting 
> and retrying. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:33,627] INFO Marking the coordinator 2147483646 dead. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:33,627] INFO Attempt to join group 
> unique-test-group-0.952644842527 failed due to obsolete coordinator 
> information, retrying. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:41:03,704] ERROR Error processing message, terminating 
> consumer process:  (kafka.tools.ConsoleConsumer$)
> kafka.consumer.ConsumerTimeoutException
>       at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:59)
>       at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:112)
>       at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
>       at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:47)
>       at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
> [2015-11-27 20:41:03,737] WARN TGT renewal thread has been interrupted and 
> will exit. (org.apache.kafka.common.security.kerberos.Login)
> {quote}
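
The coordinator ids in the consumer log (2147483644 and 2147483646) do not look like broker ids. My understanding, which should be treated as an assumption rather than something confirmed in this issue, is that the Java client gives the coordinator connection an id of Integer.MAX_VALUE minus the broker id so that it uses a separate connection. A minimal sketch of the reverse mapping, with a made-up function name:

```python
JAVA_INT_MAX = 2**31 - 1  # Integer.MAX_VALUE in Java


def broker_id_from_coordinator_id(coordinator_id: int) -> int:
    """Assumed mapping: coordinator id = Integer.MAX_VALUE - broker id."""
    return JAVA_INT_MAX - coordinator_id


# Ids copied from the "Marking the coordinator ... dead" log lines.
for cid in (2147483644, 2147483646):
    print(cid, "-> broker", broker_id_from_coordinator_id(cid))
```

If the assumption holds, the consumer gave up on broker 3 at 20:40:27 and on broker 1 at 20:40:33, which would be consistent with GroupCoordinator 2 handling the group in the timeline.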



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
