Sean Glover created KAFKA-8814:
----------------------------------
Summary: Consumer benchmark test for paused partitions
Key: KAFKA-8814
URL: https://issues.apache.org/jira/browse/KAFKA-8814
Project: Kafka
Issue Type: New Feature
Components: consumer, system tests, tools
Reporter: Sean Glover
Assignee: Sean Glover
This issue adds a new performance benchmark, and a corresponding addition to
the {{ConsumerPerformance}} tool, to demonstrate the paused partition
performance improvement implemented in KAFKA-7548. Before the fix, when the
user polled while completed fetched records were buffered for paused
partitions, the consumer threw that data away because the partitions were no
longer fetchable. If a partition was later resumed, the data had to be fetched
again. The fix caches completed fetched records for paused partitions
indefinitely so that they can be returned once the partition is resumed.
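To make the difference concrete, here is a minimal toy model of the behaviour
described above. It is not the real consumer or fetcher code: {{ToyConsumer}},
{{fetch}}, {{brokerFetches}} and {{resumeScenario}} are hypothetical names
invented for illustration, and each partition holds at most one buffered batch.

{code:scala}
object PausedFetchSketch {
  // Toy stand-in for a consumer; cachePaused = true models the KAFKA-7548 fix.
  final class ToyConsumer(cachePaused: Boolean) {
    private var buffered = Map.empty[Int, String] // partition -> completed fetch
    private var paused = Set.empty[Int]
    var brokerFetches = 0

    // Ask the broker for data only if the partition is fetchable and nothing is buffered.
    def fetch(p: Int): Unit =
      if (!paused(p) && !buffered.contains(p)) {
        brokerFetches += 1
        buffered += p -> s"batch-$p"
      }

    def pause(p: Int): Unit = paused += p
    def resume(p: Int): Unit = paused -= p

    // Return batches for unpaused partitions; keep or drop the paused ones.
    def poll(): Seq[String] = {
      val (held, ready) = buffered.partition { case (p, _) => paused(p) }
      buffered = if (cachePaused) held else Map.empty
      ready.toSeq.sortBy(_._1).map(_._2)
    }
  }

  // Fetch, pause, poll, resume, poll again; returns (records, broker fetch count).
  def resumeScenario(cachePaused: Boolean): (Seq[String], Int) = {
    val c = new ToyConsumer(cachePaused)
    c.fetch(0)
    c.pause(0)
    c.poll()    // partition 0 is paused, so nothing is returned
    c.resume(0)
    c.fetch(0)  // reaches the broker only if the earlier batch was discarded
    (c.poll(), c.brokerFetches)
  }
}
{code}

In this model {{resumeScenario(false)}} needs two broker fetches to deliver the
batch, while {{resumeScenario(true)}} needs one.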
KAFKA-7548 includes several informal test results for a number of different
paused partition scenarios, but it was suggested that a test in the benchmarks
test suite would be the ideal way to demonstrate the performance improvement.
To implement this benchmark we need a new feature in {{ConsumerPerformance}},
which is used by both the benchmark test suite and the
{{kafka-consumer-perf-test.sh}} bin script, that pauses partitions. I added the
following parameter:
{code:scala}
val pausedPartitionsOpt = parser.accepts("paused-partitions-percent",
    "The percentage [0-1] of subscribed " +
    "partitions to pause each poll.")
  .withOptionalArg()
  .describedAs("percent")
  // Matches 0, 0.x or 1.0; the group applies both anchors to each alternative.
  .withValuesConvertedBy(regex("^(0(\\.\\d+)?|1\\.0)$"))
  .ofType(classOf[Float])
  .defaultsTo(0F)
{code}
This allows the user to specify a percentage (represented as a floating point
value from {{0..1}}) of partitions to pause each poll interval. When the value
is greater than {{0}}, the tool takes the next _n_ partitions to pause on each
poll. I ran
{{kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput}}
on {{trunk}} and, after rebasing, on the {{2.3.0}} tag; the test summaries
follow. The test rotates through pausing {{80%}} of assigned partitions (5/6)
each poll interval. I ran this on my laptop.
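The rotation described above could be sketched roughly as follows. This is an
illustration, not the code from the pull request: {{pauseCount}} and
{{pausedFor}} are hypothetical helpers, and rounding to the nearest whole
partition is an assumption chosen only because it agrees with 5 of 6 partitions
at {{0.8}}.

{code:scala}
object PauseRotationSketch {
  // Number of partitions to pause for a fraction in [0, 1].
  // Assumption: round to nearest, so 6 partitions at 0.8 gives 5.
  def pauseCount(assigned: Int, fraction: Double): Int =
    math.round(assigned * fraction).toInt

  // Partitions to pause on poll number pollIndex (0-based, non-negative):
  // take the next n after the previous window, wrapping around the assignment.
  def pausedFor[A](assigned: IndexedSeq[A], fraction: Double, pollIndex: Int): Seq[A] = {
    val n = pauseCount(assigned.size, fraction)
    if (n == 0 || assigned.isEmpty) Seq.empty
    else {
      val start = (pollIndex.toLong * n % assigned.size).toInt
      (0 until n).map(i => assigned((start + i) % assigned.size))
    }
  }
}
{code}

With six partitions numbered {{0}} to {{5}} and a fraction of {{0.8}}, the
sketch pauses partitions 0-4 on the first poll and 5, 0, 1, 2, 3 on the second.
In a real consumer the selected partitions would be passed to
{{KafkaConsumer.pause}} and the remaining ones to {{KafkaConsumer.resume}}
before calling {{poll}}.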
{{trunk}} ({{aa4ba8eee8e6f52a9d80a98fb2530b5bcc1b9a11}})
{code}
================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.7.5
session_id: 2019-08-18--010
run time: 2 minutes 29.104 seconds
tests run: 1
passed: 1
failed: 0
ignored: 0
================================================================================
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.paused_partitions_percent=0.8
status: PASS
run time: 2 minutes 29.048 seconds
{"records_per_sec": 450207.0953, "mb_per_sec": 42.9351}
--------------------------------------------------------------------------------
{code}
{{2.3.0}}
{code}
================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.7.5
session_id: 2019-08-18--011
run time: 2 minutes 41.228 seconds
tests run: 1
passed: 1
failed: 0
ignored: 0
================================================================================
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.paused_partitions_percent=0.8
status: PASS
run time: 2 minutes 41.168 seconds
{"records_per_sec": 246574.6024, "mb_per_sec": 23.5152}
--------------------------------------------------------------------------------
{code}
The increase in record and data throughput is significant: in these runs
{{trunk}} delivers roughly 1.8x the records/sec and MB/sec of {{2.3.0}} (about
83% higher). Based on other consumer fetch metrics there are also improvements
to fetch rate. Depending on how often partitions are paused and resumed, it's
also possible to save a lot of data transfer between the consumer and broker.
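For reference, the relative improvement can be worked out directly from the two
session reports above. The snippet below only does that arithmetic; the object
and method names are made up for this example.

{code:scala}
object ThroughputComparison {
  // Figures copied from the trunk and 2.3.0 session reports.
  val trunkRecordsPerSec = 450207.0953
  val v230RecordsPerSec = 246574.6024
  val trunkMbPerSec = 42.9351
  val v230MbPerSec = 23.5152

  // How many times larger `after` is than `before`.
  def ratio(after: Double, before: Double): Double = after / before

  // Relative increase of `after` over `before`, in percent.
  def percentIncrease(after: Double, before: Double): Double =
    (after / before - 1.0) * 100.0

  def main(args: Array[String]): Unit = {
    val rec = ratio(trunkRecordsPerSec, v230RecordsPerSec)
    val mb = ratio(trunkMbPerSec, v230MbPerSec)
    println(f"records/sec: $rec%.2fx, MB/sec: $mb%.2fx")
    println(f"increase: ${percentIncrease(trunkRecordsPerSec, v230RecordsPerSec)}%.1f%%")
  }
}
{code}

Both ratios come out at a little over 1.8, which is a single laptop run per
branch rather than a rigorous measurement.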
Please see the pull request for the associated changes. I was unsure whether I
needed to create a KIP because, while I technically added a new public API to
the {{ConsumerPerformance}} tool, it was only to enable this benchmark to run.
If you feel that a KIP is necessary, I'll create one.