[ 
https://issues.apache.org/jira/browse/KAFKA-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-5397.
----------------------------------
    Resolution: Fixed

> streams are not recovering from LockException during rebalancing
> ----------------------------------------------------------------
>
>                 Key: KAFKA-5397
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5397
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: streams
>    Affects Versions: 0.10.2.1, 0.11.0.0
>         Environment: one node setup, confluent kafka broker v3.2.0, 
> kafka-clients 0.11.0.0-SNAPSHOT, 5 threads for kafka-streams
>            Reporter: Jozef Koval
>             Fix For: 1.0.0
>
>
> Probably continuation of #KAFKA-5167. Portions of log:
> {code}
> 2017-06-07 01:17:52,435 WARN  
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-5] StreamTask              
>    - task [2_0] Failed offset commits 
> {browser-aggregation-KSTREAM-MAP-0000000039-repartition-0=OffsetAndMetadata{offset=4725597,
>  metadata=''}, 
> browser-aggregation-KSTREAM-MAP-0000000052-repartition-0=OffsetAndMetadata{offset=4968164,
>  metadata=''}, 
> browser-aggregation-KSTREAM-MAP-0000000026-repartition-0=OffsetAndMetadata{offset=2490506,
>  metadata=''}, 
> browser-aggregation-KSTREAM-MAP-0000000065-repartition-0=OffsetAndMetadata{offset=7457795,
>  metadata=''}, 
> browser-aggregation-KSTREAM-MAP-0000000013-repartition-0=OffsetAndMetadata{offset=530888,
>  metadata=''}} due to Commit cannot be completed since the group has already 
> rebalanced and assigned the partitions to another member. This means that the 
> time between subsequent calls to poll() was longer than the configured 
> max.poll.interval.ms, which typically implies that the poll loop is spending 
> too much time message processing. You can address this either by increasing 
> the session timeout or by reducing the maximum size of batches returned in 
> poll() with max.poll.records.
> 2017-06-07 01:17:52,436 WARN  
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask              
>    - task [7_0] Failed offset commits 
> {browser-aggregation-Aggregate-Counts-repartition-0=OffsetAndMetadata{offset=13275085,
>  metadata=''}} due to Commit cannot be completed since the group has already 
> rebalanced and assigned the partitions to another member. This means that the 
> time between subsequent calls to poll() was longer than the configured 
> max.poll.interval.ms, which typically implies that the poll loop is spending 
> too much time message processing. You can address this either by increasing 
> the session timeout or by reducing the maximum size of batches returned in 
> poll() with max.poll.records.
> 2017-06-07 01:17:52,488 WARN  
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread            
>    - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] 
> Failed to commit StreamTask 7_0 state: 
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
>         at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:792)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:738)
>         at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:798)
>         at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:778)
>         at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
>         at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
>         at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
>         at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:488)
>         at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:348)
>         at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
>         at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:208)
>         at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:184)
>         at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:605)
>         at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1146)
>         at 
> org.apache.kafka.streams.processor.internals.StreamTask.commitOffsets(StreamTask.java:307)
>         at 
> org.apache.kafka.streams.processor.internals.StreamTask.access$000(StreamTask.java:49)
>         at 
> org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:268)
>         at 
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:187)
>         at 
> org.apache.kafka.streams.processor.internals.StreamTask.commitImpl(StreamTask.java:259)
>         at 
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:253)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.commitOne(StreamThread.java:813)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$2800(StreamThread.java:73)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:795)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.performOnStreamTasks(StreamThread.java:1442)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:787)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:776)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:565)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:525)
> 2017-06-07 01:17:52,747 WARN  
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask              
>    - task [7_0] Failed offset commits 
> {browser-aggregation-Aggregate-Counts-repartition-0=OffsetAndMetadata{offset=13275085,
>  metadata=''}} due to Commit cannot be completed since the group has already 
> rebalanced and assigned the partitions to another member. This means that the 
> time between subsequent calls to poll() was longer than the configured 
> max.poll.interval.ms, which typically implies that the poll loop is spending 
> too much time message processing. You can address this either by increasing 
> the session timeout or by reducing the maximum size of batches returned in 
> poll() with max.pol
> l.records.
> 2017-06-07 01:17:52,776 ERROR 
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread            
>    - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] 
> Failed to suspend stream task 7_0 due to: 
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
> 2017-06-07 01:17:52,781 WARN  
> [73e81b0b-5801-40ab-b02d-79afede6cc6-StreamThread-2] StreamTask               
>   - task [6_3] Failed offset commits 
> {browser-aggregation-Aggregate-Texts-repartition-3=OffsetAndMetadata{offset=13489738,
>  metadata=''}} due to Commit cannot be completed since the group has already 
> rebalanced and assigned the partitions to another member. This means that the 
> time between subsequent calls to poll() was longer than the configured 
> max.poll.interval.ms, which typically implies that the poll loop is spending 
> too much time message processing. You can address this either by increasing 
> the session timeout or by reducing the maximum size of batches returned in 
> poll() with max.poll.records.
> 2017-06-07 01:17:52,781 ERROR 
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread            
>    - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] 
> Failed to suspend stream task 6_3 due to: 
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
> 2017-06-07 01:17:52,782 ERROR 
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] ConsumerCoordinator     
>    - User provided listener 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener 
> for group browser-aggregation failed on partition revocation
> org.apache.kafka.streams.errors.StreamsException: stream-thread 
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] failed to suspend 
> stream tasks
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.suspendTasksAndState(StreamThread.java:1134)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$1800(StreamThread.java:73)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsRevoked(StreamThread.java:218)
>         at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare(ConsumerCoordinator.java:422)
>         at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:353)
>         at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
>         at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
>         at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1051)
>         at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1016)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:580)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:525)
> //
> 2017-06-07 01:18:15,739 WARN  
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread            
>    - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] 
> Could not create task 6_2. Will retry. 
> org.apache.kafka.streams.errors.LockException: task [6_2] Failed to lock the 
> state directory for task 6_2
> 2017-06-07 01:18:16,741 WARN  
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread            
>    - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] 
> Could not create task 7_2. Will retry. 
> org.apache.kafka.streams.errors.LockException: task [7_2] Failed to lock the 
> state directory for task 7_2
> 2017-06-07 01:18:17,745 WARN  
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread            
>    - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] 
> Could not create task 7_3. Will retry. 
> org.apache.kafka.streams.errors.LockException: task [7_3] Failed to lock the 
> state directory for task 7_3
> 2017-06-07 01:18:17,795 WARN  
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread            
>    - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] 
> Still retrying to create tasks: [0_0, 1_0, 2_0, 0_2, 3_0, 0_3, 4_0, 3_1, 2_2, 
> 5_0, 4_1, 3_2, 5_1, 6_0, 3_3, 5_2, 4_3, 6_1, 7_1, 5_3, 6_2, 7_2, 7_3]
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to