Tests were performed using:
- Kafka cluster (trunk) with 6 brokers (default configuration)
- 1 topic with 6 partitions (1 partition on each broker) filled with 100k messages

Running a consumer without a memory pool:
- 300MB heap (`-Xmx300M -Xms300M`)
- `fetch.max.bytes=32428800`
- `max.partition.fetch.bytes=15428800`

It runs out of memory (OOM) within 15 seconds.

Running a consumer with a memory pool (30MB):
- 300MB heap (`-Xmx300M -Xms300M`)
- `fetch.max.bytes=32428800`
- `max.partition.fetch.bytes=15428800`
- `buffer.memory=32428801`

It runs without any issues, using less than 200MB of memory.

<img width="875" alt="screen shot 2018-09-20 at 15 37 08" src="https://user-images.githubusercontent.com/903615/45822411-b3856380-bceb-11e8-9952-f351f635ca53.png">

Both consumers run exactly the same logic: a call to `poll()` in a loop. In both cases, each broker returns around 15MB of data (150 messages).
- Without the pool, each `poll()` returns messages from 3 or more partitions, quickly using up all memory.
- With the pool, the first 2 FetchResponses exhaust the pool, and the other responses stay queued in the network buffer. Calls to `poll()` only return messages from 1 or 2 partitions. The remaining responses are read only after those records have been returned.

So the consumer runs more slowly, but it uses significantly less memory.

[ Full content available at: https://github.com/apache/kafka/pull/4934 ]
This message was relayed via gitbox.apache.org for [email protected]
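The pacing effect of the pool in the test above can be illustrated with a minimal Java sketch. This is not the PR's implementation. It models the pool as a plain byte budget that refuses an allocation instead of blocking. It also assumes every broker's FetchResponse is exactly `max.partition.fetch.bytes` (15428800 bytes), whereas the test only observed "around 15MB". `BoundedPool`, `tryAllocate`, `release` and `drain` are hypothetical names chosen for illustration.

With `buffer.memory=32428801`, two ~15MB responses fit (30857600 bytes), and the third is refused. Four responses therefore stay queued until the first two are released.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of a bounded fetch memory pool; names are illustrative only. */
public class FetchPoolSketch {

    /** Fixed byte budget: allocation fails (returns false) instead of blocking when exhausted. */
    static final class BoundedPool {
        private final long capacity;
        private long available;

        BoundedPool(long capacity) {
            this.capacity = capacity;
            this.available = capacity;
        }

        /** Reserves sizeBytes if enough budget remains; returns false otherwise. */
        boolean tryAllocate(long sizeBytes) {
            if (sizeBytes > available) {
                return false;
            }
            available -= sizeBytes;
            return true;
        }

        /** Returns previously reserved bytes to the pool. */
        void release(long sizeBytes) {
            available = Math.min(capacity, available + sizeBytes);
        }

        long available() {
            return available;
        }
    }

    /**
     * Reads queued responses (sizes in bytes) in order until the pool refuses one.
     * The refused response and everything behind it stay queued, standing in for
     * data left unread in the network buffer. Returns the sizes that were read.
     */
    static List<Long> drain(Deque<Long> queued, BoundedPool pool) {
        List<Long> read = new ArrayList<>();
        while (!queued.isEmpty() && pool.tryAllocate(queued.peekFirst())) {
            read.add(queued.pollFirst());
        }
        return read;
    }

    public static void main(String[] args) {
        long responseBytes = 15_428_800L;                // assumed size of each broker's response
        BoundedPool pool = new BoundedPool(32_428_801L); // buffer.memory used in the test

        // One response per broker: 6 brokers, 1 partition each.
        Deque<Long> queued = new ArrayDeque<>();
        for (int broker = 0; broker < 6; broker++) {
            queued.add(responseBytes);
        }

        List<Long> firstRound = drain(queued, pool);
        // prints: read 2, still queued 4
        System.out.println("read " + firstRound.size() + ", still queued " + queued.size());

        // Once poll() has handed these records to the application, their memory goes back to the pool.
        for (long size : firstRound) {
            pool.release(size);
        }
        List<Long> secondRound = drain(queued, pool);
        // prints: read 2, still queued 2
        System.out.println("read " + secondRound.size() + ", still queued " + queued.size());
    }
}
```

In this model, at most about 30MB of response data is held at once, however many brokers have data ready. Without the budget, all six ~15MB responses (roughly 90MB) could be read in the same round.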
