[ https://issues.apache.org/jira/browse/KAFKA-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guozhang Wang resolved KAFKA-5397. ---------------------------------- Resolution: Fixed > streams are not recovering from LockException during rebalancing > ---------------------------------------------------------------- > > Key: KAFKA-5397 > URL: https://issues.apache.org/jira/browse/KAFKA-5397 > Project: Kafka > Issue Type: Sub-task > Components: streams > Affects Versions: 0.10.2.1, 0.11.0.0 > Environment: one node setup, confluent kafka broker v3.2.0, > kafka-clients 0.11.0.0-SNAPSHOT, 5 threads for kafka-streams > Reporter: Jozef Koval > Fix For: 1.0.0 > > > Probably continuation of #KAFKA-5167. Portions of log: > {code} > 2017-06-07 01:17:52,435 WARN > [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-5] StreamTask > - task [2_0] Failed offset commits > {browser-aggregation-KSTREAM-MAP-0000000039-repartition-0=OffsetAndMetadata{offset=4725597, > metadata=''}, > browser-aggregation-KSTREAM-MAP-0000000052-repartition-0=OffsetAndMetadata{offset=4968164, > metadata=''}, > browser-aggregation-KSTREAM-MAP-0000000026-repartition-0=OffsetAndMetadata{offset=2490506, > metadata=''}, > browser-aggregation-KSTREAM-MAP-0000000065-repartition-0=OffsetAndMetadata{offset=7457795, > metadata=''}, > browser-aggregation-KSTREAM-MAP-0000000013-repartition-0=OffsetAndMetadata{offset=530888, > metadata=''}} due to Commit cannot be completed since the group has already > rebalanced and assigned the partitions to another member. This means that the > time between subsequent calls to poll() was longer than the configured > max.poll.interval.ms, which typically implies that the poll loop is spending > too much time message processing. You can address this either by increasing > the session timeout or by reducing the maximum size of batches returned in > poll() with max.poll.records. > 2017-06-07 01:17:52,436 WARN > [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask > - task [7_0] Failed offset commits > {browser-aggregation-Aggregate-Counts-repartition-0=OffsetAndMetadata{offset=13275085, > metadata=''}} due to Commit cannot be completed since the group has already > rebalanced and assigned the partitions to another member. This means that the > time between subsequent calls to poll() was longer than the configured > max.poll.interval.ms, which typically implies that the poll loop is spending > too much time message processing. You can address this either by increasing > the session timeout or by reducing the maximum size of batches returned in > poll() with max.poll.records. > 2017-06-07 01:17:52,488 WARN > [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread > - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] > Failed to commit StreamTask 7_0 state: > org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be > completed since the group has already rebalanced and assigned the partitions > to another member. This means that the time between subsequent calls to > poll() was longer than the configured max.poll.interval.ms, which typically > implies that the poll loop is spending too much time message processing. You > can address this either by increasing the session timeout or by reducing the > maximum size of batches returned in poll() with max.poll.records. > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:792) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:738) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:798) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:778) > at > org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:488) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:348) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:208) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:184) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:605) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1146) > at > org.apache.kafka.streams.processor.internals.StreamTask.commitOffsets(StreamTask.java:307) > at > org.apache.kafka.streams.processor.internals.StreamTask.access$000(StreamTask.java:49) > at > org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:268) > at > org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:187) > at > org.apache.kafka.streams.processor.internals.StreamTask.commitImpl(StreamTask.java:259) > at > org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:253) > at > org.apache.kafka.streams.processor.internals.StreamThread.commitOne(StreamThread.java:813) > at > org.apache.kafka.streams.processor.internals.StreamThread.access$2800(StreamThread.java:73) > at > org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:795) > at > org.apache.kafka.streams.processor.internals.StreamThread.performOnStreamTasks(StreamThread.java:1442) > at > org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:787) > at > org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:776) > at > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:565) > at > org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:525) > 2017-06-07 01:17:52,747 WARN > [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask > - task [7_0] Failed offset commits > {browser-aggregation-Aggregate-Counts-repartition-0=OffsetAndMetadata{offset=13275085, > metadata=''}} due to Commit cannot be completed since the group has already > rebalanced and assigned the partitions to another member. This means that the > time between subsequent calls to poll() was longer than the configured > max.poll.interval.ms, which typically implies that the poll loop is spending > too much time message processing. You can address this either by increasing > the session timeout or by reducing the maximum size of batches returned in > poll() with max.pol > l.records. > 2017-06-07 01:17:52,776 ERROR > [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread > - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] > Failed to suspend stream task 7_0 due to: > org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be > completed since the group has already rebalanced and assigned the partitions > to another member. This means that the time between subsequent calls to > poll() was longer than the configured max.poll.interval.ms, which typically > implies that the poll loop is spending too much time message processing. You > can address this either by increasing the session timeout or by reducing the > maximum size of batches returned in poll() with max.poll.records. > 2017-06-07 01:17:52,781 WARN > [73e81b0b-5801-40ab-b02d-79afede6cc6-StreamThread-2] StreamTask > - task [6_3] Failed offset commits > {browser-aggregation-Aggregate-Texts-repartition-3=OffsetAndMetadata{offset=13489738, > metadata=''}} due to Commit cannot be completed since the group has already > rebalanced and assigned the partitions to another member. This means that the > time between subsequent calls to poll() was longer than the configured > max.poll.interval.ms, which typically implies that the poll loop is spending > too much time message processing. You can address this either by increasing > the session timeout or by reducing the maximum size of batches returned in > poll() with max.poll.records. > 2017-06-07 01:17:52,781 ERROR > [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread > - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] > Failed to suspend stream task 6_3 due to: > org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be > completed since the group has already rebalanced and assigned the partitions > to another member. This means that the time between subsequent calls to > poll() was longer than the configured max.poll.interval.ms, which typically > implies that the poll loop is spending too much time message processing. You > can address this either by increasing the session timeout or by reducing the > maximum size of batches returned in poll() with max.poll.records. > 2017-06-07 01:17:52,782 ERROR > [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] ConsumerCoordinator > - User provided listener > org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener > for group browser-aggregation failed on partition revocation > org.apache.kafka.streams.errors.StreamsException: stream-thread > [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] failed to suspend > stream tasks > at > org.apache.kafka.streams.processor.internals.StreamThread.suspendTasksAndState(StreamThread.java:1134) > at > org.apache.kafka.streams.processor.internals.StreamThread.access$1800(StreamThread.java:73) > at > org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsRevoked(StreamThread.java:218) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare(ConsumerCoordinator.java:422) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:353) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1051) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1016) > at > org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:580) > at > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551) > at > org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:525) > // > 2017-06-07 01:18:15,739 WARN > [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread > - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] > Could not create task 6_2. Will retry. > org.apache.kafka.streams.errors.LockException: task [6_2] Failed to lock the > state directory for task 6_2 > 2017-06-07 01:18:16,741 WARN > [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread > - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] > Could not create task 7_2. Will retry. > org.apache.kafka.streams.errors.LockException: task [7_2] Failed to lock the > state directory for task 7_2 > 2017-06-07 01:18:17,745 WARN > [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread > - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] > Could not create task 7_3. Will retry. > org.apache.kafka.streams.errors.LockException: task [7_3] Failed to lock the > state directory for task 7_3 > 2017-06-07 01:18:17,795 WARN > [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread > - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] > Still retrying to create tasks: [0_0, 1_0, 2_0, 0_2, 3_0, 0_3, 4_0, 3_1, 2_2, > 5_0, 4_1, 3_2, 5_1, 6_0, 3_3, 5_2, 4_3, 6_1, 7_1, 5_3, 6_2, 7_2, 7_3] > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)