Tests were performed using:
- Kafka cluster (trunk) with 6 brokers (default configuration)
- 1 topic with 6 partitions (1 partition on each broker) filled with 100k 
messages

Running a consumer without a memory pool:
- 300MB heap (`-Xmx300M -Xms300M`)
- `fetch.max.bytes=32428800`
- `max.partition.fetch.bytes=15428800`
The consumer OOMs within 15 seconds.

Running a consumer with a memory pool (30MB):
- 300MB heap (`-Xmx300M -Xms300M`)
- `fetch.max.bytes=32428800`
- `max.partition.fetch.bytes=15428800`
- `buffer.memory=32428801`
It runs without any issues, using less than 200MB of memory.
<img width="875" alt="screen shot 2018-09-20 at 15 37 08" 
src="https://user-images.githubusercontent.com/903615/45822411-b3856380-bceb-11e8-9952-f351f635ca53.png">
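For reference, the two consumer configurations can be written out side by side. This is only a sketch using plain `java.util.Properties`: the class and method names are hypothetical, and `buffer.memory` is taken here to be the consumer setting under test in this PR, so it may not be recognized by released clients. The only difference between the two runs is that one extra property, whose value is one byte larger than `fetch.max.bytes`.

```java
import java.util.Properties;

// Hypothetical helper, not part of the PR: builds the two property sets used in the tests.
public class ConsumerConfigSketch {

    // Settings shared by both test runs (values copied from the description above).
    static Properties withoutPool() {
        Properties props = new Properties();
        props.setProperty("fetch.max.bytes", "32428800");
        props.setProperty("max.partition.fetch.bytes", "15428800");
        return props;
    }

    // Same settings plus the pool size; "buffer.memory" is assumed to be the
    // consumer-side setting exercised by this PR.
    static Properties withPool() {
        Properties props = withoutPool();
        props.setProperty("buffer.memory", "32428801");
        return props;
    }

    public static void main(String[] args) {
        long fetchMax = Long.parseLong(withPool().getProperty("fetch.max.bytes"));
        long pool = Long.parseLong(withPool().getProperty("buffer.memory"));
        System.out.println("pool - fetch.max.bytes = " + (pool - fetchMax) + " byte(s)");
    }
}
```

Both runs also use the same JVM flags (`-Xmx300M -Xms300M`), which are passed on the command line rather than through these properties.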

Both consumers have the exact same logic: a call to `poll()` in a loop.

In both cases, each broker returns around 15MB of data (150 messages). 
- Without the pool, each `poll()` returns messages from 3 or more partitions, 
quickly using up all memory.
- With the pool, the first two FetchResponses exhaust the pool, keeping the other 
responses queued in the network buffer. Calls to `poll()` only return messages 
from 1 or 2 partitions, and the other responses are read only once those records 
have been returned. So the pace of the consumer is slowed down, but it is able 
to use significantly less memory.
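The arithmetic behind this behaviour can be illustrated with a toy simulation. It is a rough sketch, not the pool implementation from this PR: `SimplePool` and `simulate` are hypothetical names, each response is assumed to be exactly 15,000,000 bytes, and all buffers are assumed to be released as soon as a poll round returns its records. With a 32,428,801 byte pool, two such responses fit (30,000,000 bytes) and a third does not, so the six responses are consumed two at a time.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only; the real consumer and pool are more involved.
public class MemoryPoolSketch {

    // Hypothetical fixed-size pool: an allocation fails when not enough bytes are free.
    static final class SimplePool {
        private long available;

        SimplePool(long capacity) {
            this.available = capacity;
        }

        boolean tryAllocate(long bytes) {
            if (bytes > available) {
                return false;
            }
            available -= bytes;
            return true;
        }

        void release(long bytes) {
            available += bytes;
        }
    }

    // Returns how many queued responses were read into memory in each poll round.
    static List<Integer> simulate(long poolBytes, long responseBytes, int responses) {
        SimplePool pool = new SimplePool(poolBytes);
        Deque<Long> pending = new ArrayDeque<>();
        for (int i = 0; i < responses; i++) {
            pending.add(responseBytes);
        }
        List<Integer> readPerPoll = new ArrayList<>();
        while (!pending.isEmpty()) {
            int read = 0;
            // Read responses off the (simulated) network until the pool is exhausted.
            while (!pending.isEmpty() && pool.tryAllocate(pending.peek())) {
                pending.poll();
                read++;
            }
            if (read == 0) {
                throw new IllegalArgumentException("a single response does not fit in the pool");
            }
            // Records are handed to the application; their buffers return to the pool.
            pool.release(read * responseBytes);
            readPerPoll.add(read);
        }
        return readPerPoll;
    }

    public static void main(String[] args) {
        // Pool size from the test above, six responses of an assumed 15,000,000 bytes each.
        System.out.println(simulate(32_428_801L, 15_000_000L, 6)); // prints [2, 2, 2]
    }
}
```

Passing a very large pool size (for example `Long.MAX_VALUE`) models the run without a pool: all six responses are read in a single round, which is the situation that exhausts the 300MB heap.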

[ Full content available at: https://github.com/apache/kafka/pull/4934 ]