[ 
https://issues.apache.org/jira/browse/KAFKA-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Stopford updated KAFKA-2908:
--------------------------------
    Comment: was deleted

(was: Was bug in the way the test defines min.insync.replicas as found by 
[~rsivaram])

> Another, possibly different, Gap in Consumption after Restart
> -------------------------------------------------------------
>
>                 Key: KAFKA-2908
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2908
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: consumer
>            Reporter: Ben Stopford
>            Assignee: Jason Gustafson
>         Attachments: 2015-11-28--001.tar.gz
>
>
> *Context:*
> Instance of the rolling upgrade test. 10s sleeps have been put around node 
> restarts to ensure stability when subsequent nodes go down. Consumer timeout 
> has been set to 60s. Proudcer has been throttled to 100 messages per second. 
> Failure is rare: It occurred once in 60 executions (6 executions per run #276 
> -> #285 of the system_test_branch_builder)
> *Reported Failure:*
> At least one acked message did not appear in the consumed messages. 
> acked_minus_consumed: 16385, 16388, 16391, 16394, 16397, 16400, 16403, 16406, 
> 16409, 16412, 16415, 16418, 16421, 16424, 16427, 16430, 16433, 16436, 16439, 
> 16442, ...plus 1669 more
> *Immediate Observations:*
> * The list of messages not consumed are all in partition 1.
> * Production and Consumption continues throughout the test (there is no 
> complete write or read failure as we have seen elsewhere)
> * The messages ARE present in the data files:
> e.g. 
> {quote}
> Examining missing value 16385. This was written to p1,offset:5453
> => there is an entry in all data files for partition 1 for this (presumably 
> meaning it was replicated correctly)
> kafka/bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files 
> worker10/kafka-data-logs/test_topic-1/00000000000000000000.log | grep 
> 'offset: 5453'
> worker10,9,8:
> offset: 5453 position: 165346 isvalid: true payloadsize: 5 magic: 0 
> compresscodec: NoCompressionCodec crc: 1502953075 payload: 16385
> => ID 16385 is definitely present in the Kafka logs suggesting a problem 
> consumer-side
> {quote}
> These entries definitely do not appear in the consumer stdout file either. 
> *Timeline:*
> {quote}
> 15:27:02,232 - Producer sends first message
> ..servers 1, 2 are restarted (clean shutdown)
> 15:29:42,718 - Server 3 shutdown complete
> 15:29:42,712 - (Controller fails over): Broker 2 starting become controller 
> state transition (kafka.controller.KafkaController)
> 15:29:42,743 - New leasder is 2 (LeaderChangeListner)
> 15:29:43,239 - WARN Broker 2 ignoring LeaderAndIsr request from controller 2 
> with correlation id 0 epoch 7 for partition (test_topic,1) since its 
> associated leader epoch 8 is old. Current leader epoch is 8 
> (state.change.logger)
> 15:29:45,642 - Producer starts writing messages that are never consumed
> 15:30:10,804 - Last message sent by producer
> 15:31:10,983 - Consumer times out after 60s wait without messages
> {quote}
> Logs for this run are attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to