[ https://issues.apache.org/jira/browse/KAFKA-372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jun Rao updated KAFKA-372:
--------------------------

    Attachment: kafka-372_v1.patch

There were several issues that caused the problem.

1. Log.nextAppendOffset() calls flush each time. Since this method is called for every produce request, we force a disk flush for every produce request, independent of the flush interval in the broker config. This makes producers very slow.

2. The default value for MaxFetchWaitMs in the consumer config is 3 secs, which is too long.

3. The script runs the console consumer in the background and only waits for 20 secs, which is too short. What we should do instead is run the console consumer in the foreground and wait until it finishes (since it has a consumer timeout).

Attaching patch v1, which fixes items 1 and 2. The test now passes. However, we still need to address item 3 in the script.
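To make item 1 concrete, here is a minimal Scala sketch of the flush policy it describes: sync to disk only after a configured number of unflushed writes, instead of on every produce request or offset lookup. The names used (FlushPolicyExample, flushInterval, unflushed) are illustrative assumptions; this is not the actual Kafka Log implementation.

```scala
import java.io.RandomAccessFile
import java.util.concurrent.atomic.AtomicInteger

// Illustrative sketch only, not the Kafka Log class: flush to disk once per
// `flushInterval` appended messages rather than on every call.
class FlushPolicyExample(file: RandomAccessFile, flushInterval: Int) {
  private val unflushed = new AtomicInteger(0)

  def append(bytes: Array[Byte]): Unit = {
    file.write(bytes)
    if (unflushed.incrementAndGet() >= flushInterval)
      flush()
  }

  // Determining the next append offset does not force a flush; the file
  // length already reflects every completed write.
  def nextAppendOffset: Long = file.length()

  def flush(): Unit = {
    file.getFD.sync() // fsync whatever has been written so far
    unflushed.set(0)
  }
}
```

With a policy like this, the flush interval in the broker config, rather than every produce request, governs how often the log is synced to disk.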
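For items 2 and 3, a small illustrative snippet of the consumer-side settings involved. The property keys shown (fetch.wait.max.ms, consumer.timeout.ms) are the ones documented for released 0.8 builds, and the connection/group values are placeholders, so treat the exact key names at this revision of the branch as an assumption.

```scala
import java.util.Properties
import kafka.consumer.ConsumerConfig

object ConsumerSettingsSketch {
  // Illustrative settings for the console consumer used by the test script;
  // the values here are examples, not the defaults changed by the patch.
  val props = new Properties()
  props.put("zookeeper.connect", "localhost:2181")  // placeholder
  props.put("group.id", "broker-failure-test")      // placeholder
  props.put("fetch.wait.max.ms", "100")     // item 2: cap how long the broker may hold a fetch request
  props.put("consumer.timeout.ms", "10000") // item 3: the consumer gives up after 10 s of idleness,
                                            //         so the script can run it in the foreground
  val config = new ConsumerConfig(props)
}
```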
> Consumer doesn't receive all data if there are multiple segment files
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-372
>                 URL: https://issues.apache.org/jira/browse/KAFKA-372
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: John Fung
>         Attachments: kafka-372_v1.patch, multi_seg_files_data_loss_debug.patch
>
> This issue happens inconsistently but can be reproduced by following the steps below (repeat step 4 a few times to reproduce it):
> 1. Check out the 0.8 branch (currently reproducible with rev. 1352634).
> 2. Apply kafka-306-v4.patch.
> 3. Please note that log.file.size is set to 10000000 in system_test/broker_failure/config/server_*.properties (small enough to trigger multiple segment files).
> 4. Under the directory <kafka home>/system_test/broker_failure, execute the command:
>    $ bin/run-test.sh 20 0
> 5. After the test is completed, the result will probably look like the following:
>    ========================================================
>    no. of messages published          : 14000
>    producer unique msg rec'd          : 14000
>    source consumer msg rec'd          : 7271
>    source consumer unique msg rec'd   : 7271
>    mirror consumer msg rec'd          : 6960
>    mirror consumer unique msg rec'd   : 6960
>    total source/mirror duplicate msg  : 0
>    source/mirror uniq msg count diff  : 311
>    ========================================================
> 6. Checking the kafka log files shows that the sum of the sizes of the source cluster segment files equals that of the target cluster:
>    [/tmp] $ find kafka* -name *.kafka -ls
>    18620155  9860 -rw-r--r-- 1 jfung eng 10096535 Jun 21 11:09 kafka-source3-logs/test01-0/00000000000000000000.kafka
>    18620161  9772 -rw-r--r-- 1 jfung eng 10004418 Jun 21 11:11 kafka-source3-logs/test01-0/00000000000020105286.kafka
>    18620160  9776 -rw-r--r-- 1 jfung eng 10008751 Jun 21 11:10 kafka-source3-logs/test01-0/00000000000010096535.kafka
>    18620162  4708 -rw-r--r-- 1 jfung eng  4819067 Jun 21 11:11 kafka-source3-logs/test01-0/00000000000030109704.kafka
>    19406431  9920 -rw-r--r-- 1 jfung eng 10157685 Jun 21 11:10 kafka-target2-logs/test01-0/00000000000010335039.kafka
>    19406429 10096 -rw-r--r-- 1 jfung eng 10335039 Jun 21 11:09 kafka-target2-logs/test01-0/00000000000000000000.kafka
>    19406432 10300 -rw-r--r-- 1 jfung eng 10544850 Jun 21 11:11 kafka-target2-logs/test01-0/00000000000020492724.kafka
>    19406433  3800 -rw-r--r-- 1 jfung eng  3891197 Jun 21 11:12 kafka-target2-logs/test01-0/00000000000031037574.kafka
> 7. If log.file.size in the target cluster is configured to a very large value such that there is only one data file, the result would look like this instead:
>    ========================================================
>    no. of messages published          : 14000
>    producer unique msg rec'd          : 14000
>    source consumer msg rec'd          : 7302
>    source consumer unique msg rec'd   : 7302
>    mirror consumer msg rec'd          : 13750
>    mirror consumer unique msg rec'd   : 13750
>    total source/mirror duplicate msg  : 0
>    source/mirror uniq msg count diff  : -6448
>    ========================================================
> 8. The log files then look like this:
>    [/tmp] $ find kafka* -name *.kafka -ls
>    18620160  9840 -rw-r--r-- 1 jfung eng 10075058 Jun 21 11:24 kafka-source2-logs/test01-0/00000000000010083679.kafka
>    18620155  9848 -rw-r--r-- 1 jfung eng 10083679 Jun 21 11:23 kafka-source2-logs/test01-0/00000000000000000000.kafka
>    18620162  4484 -rw-r--r-- 1 jfung eng  4589474 Jun 21 11:26 kafka-source2-logs/test01-0/00000000000030269045.kafka
>    18620161  9876 -rw-r--r-- 1 jfung eng 10110308 Jun 21 11:25 kafka-source2-logs/test01-0/00000000000020158737.kafka
>    19406429 34048 -rw-r--r-- 1 jfung eng 34858519 Jun 21 11:26 kafka-target3-logs/test01-0/00000000000000000000.kafka