Alex Dunayevsky created KAFKA-6743:
--------------------------------------

             Summary: ConsumerPerformance fails to consume all messages on 
topics with large number of partitions
                 Key: KAFKA-6743
                 URL: https://issues.apache.org/jira/browse/KAFKA-6743
             Project: Kafka
          Issue Type: Bug
          Components: core, tools
    Affects Versions: 0.11.0.2
            Reporter: Alex Dunayevsky


ConsumerPerformance fails to consume all messages on topics with a large number 
of partitions because of a relatively short default polling loop timeout 
(1000 ms) that the end user can neither access nor modify.

Demo: Create a topic with 10 000 partitions, send 50 000 000 records of 100 
bytes each using kafka-producer-perf-test, and consume them using 
kafka-consumer-perf-test (ConsumerPerformance). You will likely notice that the 
number of records returned by kafka-consumer-perf-test is many times smaller 
than the expected 50 000 000. This is caused by the ConsumerPerformance 
implementation: in some cases it takes long enough to process/iterate through 
the records polled in batches that the elapsed time exceeds the default 
hardcoded polling loop timeout, so the loop exits early. This is probably not 
what we want from this utility.
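
To illustrate the failure mode described above, here is a minimal simulation 
of a poll loop guarded by an idle timeout. It is a sketch, not the actual 
ConsumerPerformance source: the class name PollLoopSketch, the simulated clock, 
the assumed 100 ms cost per poll and the batch sizes are all made up for 
illustration. It only shows how slow processing of one batch can trip the 
timeout check before the target record count is reached.

```java
// Simplified model of a poll loop with an idle timeout. NOT the real
// ConsumerPerformance code: every name and number here is an assumption.
final class PollLoopSketch {
    static final long POLL_MS = 100; // assumed cost of one poll() call, in ms

    /**
     * @param batches   one entry per poll: {recordsReturned, processingMs}
     * @param target    number of records we want to read
     * @param timeoutMs give up once this much time has passed since the
     *                  last non-empty poll
     * @return number of records actually read
     */
    static long consume(long[][] batches, long target, long timeoutMs) {
        long now = 0;            // simulated clock, in ms
        long lastConsumed = now; // time of the last non-empty poll
        long read = 0;
        int next = 0;
        while (read < target && now - lastConsumed <= timeoutMs) {
            now += POLL_MS;
            // Once the prepared batches run out, every poll returns nothing.
            long[] batch = next < batches.length ? batches[next++] : new long[] {0, 0};
            if (batch[0] > 0) {
                lastConsumed = now;
            }
            read += batch[0];
            now += batch[1]; // time spent iterating over the polled records
        }
        return read;
    }

    public static void main(String[] args) {
        // First batch takes 1500 ms to process, longer than a 1000 ms timeout.
        long[][] batches = { {500, 1500}, {500, 10} };
        System.out.println(consume(batches, 1000, 1000)); // prints 500
        System.out.println(consume(batches, 1000, 5000)); // prints 1000
    }
}
```

With the 1000 ms timeout the simulated loop stops after the first batch and 
reports only half of the records; with a larger timeout it reads all of them.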

We have two options: 
1) Increase the polling loop timeout in the ConsumerPerformance implementation. 
It defaults to 1000 ms and is hardcoded, so it cannot be changed, but we could 
expose it as an OPTIONAL kafka-consumer-perf-test parameter so that the end 
user can set it at the script level.
2) Decrease max.poll.records at the Consumer config level. This is not a good 
option, though, since we do not want to touch the default settings.
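
For reference, option 2 amounts to overriding a single consumer property. The 
sketch below builds such a configuration with java.util.Properties; 
max.poll.records is a real consumer setting, while the class name, the group 
id and the broker address are hypothetical placeholders.

```java
import java.util.Properties;

// Sketch only: shows the shape of the override, not a recommended value.
final class ConsumerPropsSketch {
    static Properties withMaxPollRecords(int maxPollRecords) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("group.id", "perf-test-sketch");        // hypothetical group id
        // Smaller batches mean less processing time between two polls.
        props.setProperty("max.poll.records", Integer.toString(maxPollRecords));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(withMaxPollRecords(100).getProperty("max.poll.records"));
    }
}
```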



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
