[
https://issues.apache.org/jira/browse/KAFKA-6848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746084#comment-16746084
]
zhenyu jiang commented on KAFKA-6848:
-------------------------------------
This description is very similar to the problem I encountered. My situation is
as follows:
*The application log looks like this:*
{code:java}
2019-01-18 03:30:29.873 INFO [dmp-notifier,,]
[org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1]
o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-1,
groupId=notifier] Discovered group coordinator 10.211.6.56:9092 (id: 2147483645
rack: null)
2019-01-18 03:30:32.699 ERROR [dmp-notifier,,]
[org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1]
o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-1,
groupId=notifier] Offset commit failed on partition dmp.notifier.notice-0 at
offset 294: This is not the correct coordinator.
2019-01-18 03:30:32.699 INFO [dmp-notifier,,]
[org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1]
o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-1,
groupId=notifier] Marking the coordinator 10.211.6.56:9092 (id: 2147483645
rack: null) dead
2019-01-18 03:30:32.699 WARN [dmp-notifier,,]
[org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1]
o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-1,
groupId=notifier] Asynchronous auto-commit of offsets
{dmp.notifier.notice-0=OffsetAndMetadata{offset=294, metadata=''},
dmp.notifier.notice-2=OffsetAndMetadata{offset=438, metadata=''},
dmp.notifier.notice-1=OffsetAndMetadata{offset=45, metadata=''},
dmp.notifier.notice-4=OffsetAndMetadata{offset=35, metadata=''},
dmp.notifier.notice-3=OffsetAndMetadata{offset=1242, metadata=''}} failed:
Offset commit failed with a retriable exception. You should retry committing
the latest consumed offsets.{code}
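
The coordinator id 2147483645 in the log above is not a real broker id. As far as I understand the Java consumer, it registers the group coordinator under a synthetic node id of Integer.MAX_VALUE minus the broker id, so that coordinator traffic gets its own connection. A minimal sketch of the reverse mapping, under that assumption (the class and method names are hypothetical, for illustration only):

```java
// Sketch: recover the broker id from the coordinator node id printed by the consumer.
// Assumes the consumer derives the id as Integer.MAX_VALUE - brokerId.
final class CoordinatorNodeId {

    /** Inverts the assumed mapping coordinatorNodeId = Integer.MAX_VALUE - brokerId. */
    static int brokerIdOf(int coordinatorNodeId) {
        return Integer.MAX_VALUE - coordinatorNodeId;
    }

    public static void main(String[] args) {
        int idFromLog = 2147483645; // "id: 2147483645" in the application log
        System.out.println("coordinator broker id = " + brokerIdOf(idFromLog)); // prints 2
    }
}
```

If that mapping holds, the coordinator at 10.211.6.56 is broker 2.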
*state-change.log.2019-01-18-03 on 10.211.6.56 looks like this:*
{code:java}
[2019-01-18 03:30:32,697] TRACE Controller 2 epoch 17 started leader election
for partition [dmp.notifier.notice,2] (state.change.logger)
[2019-01-18 03:30:32,710] TRACE Controller 2 epoch 17 elected leader 3 for
Offline partition [dmp.notifier.notice,2] (state.change.logger)
[2019-01-18 03:30:32,748] TRACE Controller 2 epoch 17 changed partition
[dmp.notifier.notice,2] from OfflinePartition to OnlinePartition with leader 3
(state.change.logger){code}
*server.log.2019-01-18-03 on another Kafka node in the same cluster looks like this:*
{code:java}
[2019-01-18 03:30:26,609] INFO Updated PartitionLeaderEpoch. New: {epoch:32,
offset:24117087}, Current: {epoch:31, offset24116582} for Partition:
__consumer_offsets-16. Cache now contains 29 entries.
(kafka.server.epoch.LeaderEpochFileCache)
[2019-01-18 03:30:42,140] WARN Client session timed out, have not heard from
server in 4090ms for sessionid 0x2684b1cd93e0003
(org.apache.zookeeper.ClientCnxn)
[2019-01-18 03:30:42,140] INFO Client session timed out, have not heard from
server in 4090ms for sessionid 0x2684b1cd93e0003, closing socket connection and
attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-01-18 03:30:42,240] INFO zookeeper state changed (Disconnected)
(org.I0Itec.zkclient.ZkClient)
[2019-01-18 03:30:42,450] INFO Opening socket connection to server
prod-dmp3.fengdai.org/10.211.6.57:2181. Will not attempt to authenticate using
SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-01-18 03:30:42,451] INFO Socket connection established to
prod-dmp3.fengdai.org/10.211.6.57:2181, initiating session
(org.apache.zookeeper.ClientCnxn)
[2019-01-18 03:30:42,452] INFO Session establishment complete on server
prod-dmp3.fengdai.org/10.211.6.57:2181, sessionid = 0x2684b1cd93e0003,
negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2019-01-18 03:30:42,452] INFO zookeeper state changed (SyncConnected)
(org.I0Itec.zkclient.ZkClient){code}
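
The first server log entry concerns __consumer_offsets-16. To check whether that partition is the one holding the offsets of the notifier group, the mapping from group id to offsets partition can be reproduced. This is a sketch under two assumptions: that the broker picks the partition as the absolute value of the group id's String hash modulo the partition count, and that offsets.topic.num.partitions is at its default of 50 (the logs do not show the configured value). The class and method names are hypothetical.

```java
// Sketch: which __consumer_offsets partition a consumer group maps to.
// Assumes partition = abs(groupId.hashCode()) % partitionCount on the broker side.
final class OffsetsPartition {

    static int partitionFor(String groupId, int partitionCount) {
        int h = groupId.hashCode();
        // Math.abs(Integer.MIN_VALUE) is still negative, so treat that case as 0.
        int nonNegative = (h == Integer.MIN_VALUE) ? 0 : Math.abs(h);
        return nonNegative % partitionCount;
    }

    public static void main(String[] args) {
        int assumedPartitionCount = 50; // default offsets.topic.num.partitions, assumed here
        System.out.println(partitionFor("notifier", assumedPartitionCount)); // prints 16
    }
}
```

With 50 partitions the notifier group maps to partition 16, so the leader epoch change on __consumer_offsets-16 at 03:30:26 would be a coordinator move for this group, a few seconds before the failed commit.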
The service logs on 10.211.6.56 show no exceptions at this time, but there are
many exceptions before it (for other topics); please see the attachment.
[^kafka_service-log.log]
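
On the ZooKeeper warning in the server log above ("have not heard from server in 4090ms" with a negotiated timeout of 6000): to my knowledge the ZooKeeper client gives up on a connection once it has been idle for two thirds of the negotiated session timeout. The sketch below works through that arithmetic; the two-thirds rule is stated from memory of the client code rather than from these logs, and the class and method names are hypothetical.

```java
// Sketch: the ZooKeeper client-side read timeout, assumed to be 2/3 of the session timeout.
final class ZkReadTimeout {

    static int readTimeoutMs(int negotiatedSessionTimeoutMs) {
        return negotiatedSessionTimeoutMs * 2 / 3;
    }

    /** True when the connection has been idle for at least the read timeout. */
    static boolean timedOut(int idleMs, int negotiatedSessionTimeoutMs) {
        return idleMs >= readTimeoutMs(negotiatedSessionTimeoutMs);
    }

    public static void main(String[] args) {
        int negotiated = 6000; // "negotiated timeout = 6000" in the server log
        int idle = 4090;       // "have not heard from server in 4090ms"
        System.out.println(readTimeoutMs(negotiated));  // prints 4000
        System.out.println(timedOut(idle, negotiated)); // prints true
    }
}
```

Under that assumption the 4090 ms of silence exceeds the 4000 ms read timeout, which matches the disconnect and immediate reconnect with the same session id.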
> Kafka consumer failed to get correct offset after commit
> --------------------------------------------------------
>
> Key: KAFKA-6848
> URL: https://issues.apache.org/jira/browse/KAFKA-6848
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.11.0.0
> Reporter: YY.Roy
> Priority: Major
> Attachments: kafka_service-log.log
>
>
> I use the Kafka consumer Java API to poll messages from the broker, and here
> is the code:
> Consumer<String, String> consumer = new KafkaConsumer<>(props); // key/value types assumed
> consumer.assign(topicPartitions);
> long nextOffset = consumer.position(topicPartition);
> consumer.poll(1000); // timeout in ms; the original call omitted it, value assumed
> consumer.commitSync();
>
> The above code is called by a Quartz scheduler every minute, and the group.id
> is always the same. It ran properly for the past several days until today
> around 8:20:35 am, when the position API started returning the older offset
> committed two days ago, not the latest one, which was committed around
> 8:20:33 am. It seems the Kafka offset of this group.id just went backward.
>
> I polled the offset messages from the Kafka internal topic __consumer_offsets
> and saw that the latest message was correct. It looks like this:
> [eb89887c591b4d2a98c7,my-topic-eb89887c591b4d2a98c7,0]::[OffsetMetadata[447648316,NO_METADATA],CommitTime
> 1525220421173,ExpirationTime 1526430021173]
> The commitTime showed it was indeed the last successful commit.
> But then the position API returned a wrong offset, which is the first message
> of the corresponding partition of __consumer_offsets. It is as if the Kafka
> broker regards this older committed offset as the correct offset of this
> group.id, but the correct one should have been the last message in
> __consumer_offsets.
> Then I checked the broker server log and found some connection errors at
> exactly the time position was called:
> 08:20:33,261 WARN Attempting to send response via channel for which there is
> no open connection, connection id 2 (kafka.network.Processor)
> Some other consumers were trying to call position at this time, and the
> leader of their topics is this broker too. After that they all got wrong
> offsets, which were older commits in __consumer_offsets.
>
>
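
The CommitTime and ExpirationTime fields in the quoted __consumer_offsets record are epoch milliseconds. A small sketch, using only the two values from that record, converts the commit time to UTC and derives the retention window the broker applied (the class and method names are hypothetical):

```java
import java.time.Duration;
import java.time.Instant;

// Sketch: decode the timestamps of the quoted __consumer_offsets record.
final class OffsetRecordTimes {

    /** Time between commit and expiration, i.e. the effective offset retention. */
    static Duration retention(long commitTimeMs, long expirationTimeMs) {
        return Duration.ofMillis(expirationTimeMs - commitTimeMs);
    }

    public static void main(String[] args) {
        long commitTime = 1525220421173L;     // "CommitTime" from the record
        long expirationTime = 1526430021173L; // "ExpirationTime" from the record
        System.out.println(Instant.ofEpochMilli(commitTime));                // 2018-05-02T00:20:21.173Z
        System.out.println(retention(commitTime, expirationTime).toDays()); // 14
    }
}
```

The record therefore expires 14 days after the commit, so the latest commit was not expired when the older offset was returned.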
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)