Kay created KAFKA-10019:
---------------------------

             Summary: MirrorMaker 2 did not function properly after restart (message lost, messages arriving slowly)
                 Key: KAFKA-10019
                 URL: https://issues.apache.org/jira/browse/KAFKA-10019
             Project: Kafka
          Issue Type: Bug
          Components: mirrormaker
    Affects Versions: 2.4.1
         Environment: Amazon Linux 2
                      MSK clusters: kafka.m5.large, 3 AZ, 3 brokers
                      MM2 instances: c5.2xlarge
                      Producer/Consumer instances: c5.2xlarge
            Reporter: Kay
         Attachments: 2a-consumer.log, 2a-producer.log

MM2 did not function properly after a running MM2 process was stopped and then started again. The consumer did not receive all messages (even messages sent after MM2 was restarted). Messages no longer arrived at the consumer at the rate specified by "--messages" and "--timeout".

To reproduce the issue:
# Environment:
## Region 1: one Kafka cluster, two MM2 instances, 1 producer instance
## Region 2: one Kafka cluster, two MM2 instances, 1 consumer instance
# Producer (in region 1) started sending 1000 messages.
## ./bin/kafka-producer-perf-test.sh --producer.config config/producer.properties --topic topic1 --record-size 4800000 --num-records 1000 --throughput 17
# Consumer (in region 2) started receiving messages.
## while true; do ./bin/kafka-consumer-perf-test.sh --threads 60 --timeout 5000 --consumer.config config/consumer.properties --topic region1.topic1 --messages 250 --group region2-consume-region1topic1 --broker-list $KAFKA_BROKERS; done > consumer.log &
# Consumer received the first 500 messages (250, 250), as "--messages" specified.
# Killed the MM2 process on one of the two instances in each region.
# Consumer started receiving the remaining messages at a much slower "rate" (160, 29, 19, 11, 9, 6, 5, 5, 0,.. 3, 0,... 2, 0,... 1).
# Restarted the MM2 processes that had been killed.
# Producer sent another 1000 messages.
# Still, messages no longer arrived at the "--messages" rate (250 * N), but in counts such as 37, 30, 23, 13, 9, 0, 1, 3...
# And the consumer did not receive all of the 1000 new messages sent after MM2 was restarted.

Please see the attached producer and consumer log files. In the consumer log file, you can see that after the first two consecutive batches of 250 messages arrived, messages arrived in a different pattern.

*Issue Summary*
# MM2 does not recover after its process is restarted.
# After an MM2 process was killed on the MM2 EC2 instance, the consumer no longer received messages at the rate specified by "--messages" and "--timeout".
# The consumer did not receive all messages, even those published after the MM2 process was restarted.
# The consumer no longer received messages at the rate specified by "--messages" and "--timeout", even after the MM2 process was restarted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)