Kay created KAFKA-10019:
---------------------------

             Summary: MirrorMaker 2 did not function properly after restart 
(message lost, messages arriving slowly)
                 Key: KAFKA-10019
                 URL: https://issues.apache.org/jira/browse/KAFKA-10019
             Project: Kafka
          Issue Type: Bug
          Components: mirrormaker
    Affects Versions: 2.4.1
         Environment: Amazon Linux 2
MSK clusters: kafka.m5.large, 3 AZ, 3 brokers
MM2 instances: c5.2xlarge
Producer/Consumer instances: c5.2xlarge
            Reporter: Kay
         Attachments: 2a-consumer.log, 2a-producer.log

MM2 did not function properly after a running MM2 process was stopped and 
then started again. The consumer did not receive all messages (including 
messages sent after MM2 restarted), and messages no longer arrived at the 
consumer at the rate specified by "--messages" and "--timeout".

To reproduce the issue
 # Environment:
 ## Region 1: one Kafka cluster, two MM2 instances, 1 producer instance
 ## Region 2: one Kafka cluster, two MM2 instances, 1 consumer instance
 # Producer (in region 1) started sending 1000 messages.
 ## ./bin/kafka-producer-perf-test.sh --producer.config 
config/producer.properties --topic topic1 --record-size 4800000 --num-records 
1000 --throughput 17
 # Consumer (in region 2) started receiving messages.
 ## while true; do ./bin/kafka-consumer-perf-test.sh --threads 60 --timeout 
5000 --consumer.config config/consumer.properties --topic region1.topic1 
--messages 250 --group region2-consume-region1topic1 --broker-list 
$KAFKA_BROKERS; done > consumer.log &
 # Consumer received the first 500 messages (250, 250), as specified by 
"--messages".
 # Killed the MM2 process on one of the two MM2 instances in each region.
 # Consumer started receiving the remaining messages at a much slower "rate" 
(160, 29, 19, 11, 9, 6, 5, 5, 0,.. 3, 0,... 2, 0,... 1).
 # Restarted the MM2 processes that had been killed.
 # Producer sent another 1000 messages.
 # Messages still did not arrive at the "--messages" rate (250 * N); the 
counts were, e.g., 37, 30, 23, 13, 9, 0, 1, 3...
 # The consumer also did not receive all of the 1000 new messages sent after 
MM2 restarted.
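
For scale, the producer flags in the reproduction steps imply the following 
load. This is plain arithmetic on the values given in the producer command, 
not a measurement; it assumes "--throughput" is an upper bound in records per 
second and says nothing about the rate MM2 actually achieved.

```shell
# Values copied from the kafka-producer-perf-test.sh command in the steps.
record_size=4800000   # bytes per record, from --record-size
num_records=1000      # records per producer run, from --num-records
throughput=17         # records per second cap, from --throughput

# Peak bytes per second the producer may send.
bytes_per_sec=$(( record_size * throughput ))

# Total bytes MM2 has to replicate for one producer run.
total_bytes=$(( record_size * num_records ))

# Shortest possible duration of one producer run, rounded up to whole seconds.
min_seconds=$(( (num_records + throughput - 1) / throughput ))

echo "peak rate: ${bytes_per_sec} bytes/s"
echo "total:     ${total_bytes} bytes"
echo "duration:  at least ${min_seconds} s"
```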

Please see the producer and consumer log files attached.

In the consumer log file, you can see that after the first two consecutive 
runs of 250 messages, the number of messages received per run changed.
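
The per-run counts can be tallied from consumer.log with a short awk wrapper. 
This is a sketch and not part of the original report: tally_consumed is a 
hypothetical helper name, and it assumes each kafka-consumer-perf-test.sh run 
prints one comma-separated result line that starts with a timestamp and 
carries the message count (data.consumed.in.nMsg) in the fifth field. Check 
the header line in the attached log before relying on that column position.

```shell
# Print the message count of every consumer run and the overall total.
# Assumption: result lines begin with a yyyy-MM-dd timestamp and hold the
# consumed message count in field 5; header and warning lines are skipped.
tally_consumed() {
  awk -F', *' '
    $1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/ && NF >= 5 {
      printf "run %d: %d messages\n", ++runs, $5
      total += $5
    }
    END { printf "total: %d messages in %d runs\n", total, runs }
  ' "$1"
}
```

Running "tally_consumed consumer.log" and comparing the total with the number 
of records the producer sent gives the number of missing messages.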

*Issue Summary*
 # MM2 does not recover after its process is restarted.
 # After an MM2 process was killed on an MM2 EC2 instance, the consumer no 
longer received messages at the rate set by "--messages" and "--timeout".
 # The consumer did not receive all messages, even those published after the 
MM2 process restarted.
 # The consumer still did not receive messages at the rate set by 
"--messages" and "--timeout" after the MM2 process restarted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
