Hey Guys,

I have been benchmarking Pulsar to gather end-to-end (E2E) latency data,
specifically trying to stress it so that we see E2E latency climb gradually,
say from 5ms to 50ms.

We have encountered some strange behavior along the way, and I was wondering
if you had any insights on how to generate the data we are trying to gather.
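
For context, by E2E latency I mean roughly the consumer-side receive time
minus the message publish time. A minimal sketch of that measurement with the
Pulsar Java client (service URL and topic name are placeholders; it assumes
producer and consumer clocks are synchronized, e.g. co-located workers):

import org.apache.pulsar.client.api.*;

public class E2ELatencyProbe {
    public static void main(String[] args) throws PulsarClientException {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder broker URL
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("non-persistent://public/default/bench-topic") // placeholder
                .subscriptionName("bench-sub")
                .subscribe();

        while (true) {
            Message<byte[]> msg = consumer.receive();
            // getPublishTime() is stamped by the producer client, so this
            // difference is only meaningful with synchronized clocks.
            long e2eMillis = System.currentTimeMillis() - msg.getPublishTime();
            System.out.println("E2E latency: " + e2eMillis + " ms");
            consumer.acknowledge(msg);
        }
    }
}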

Weird behavior we are seeing:

1. The clients actually run out of resources before the broker does for a
non-persistent workload.

Using the OpenMessaging benchmark, we run workloads varying from 3 to 243
non-persistent topics at up to 600,000 msg/s against a single broker. With
fewer than 6 local workers acting as producers and consumers, we find that
the clients usually run out of resources first, skewing our E2E latency
statistics. Is this in line with what you see? Is it normal for clients to
run out of resources before the broker starts showing latency growth?

2. If you raise the rate on a single topic, it shows higher end-to-end
latency. But if you add another topic at a lower rate, that topic's
end-to-end latency does not increase until resources become constrained.

We ran the following experiment (1 broker, non-persistent topics):

Two local workers run a workload on 50 topics at a 200,000 msg/s aggregate
message rate, while two other local workers run a workload on a single topic
at 10,000 msg/s, all against the one broker. The single topic at 10,000 msg/s
sees latency around 2ms, while the "background" workload of 50 topics at
200,000 msg/s sees latency in the 20ms range. Do you have any idea why we
would see this behavior?
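
For concreteness, here is a rough sketch of the foreground producer side of
that experiment, paced with Guava's RateLimiter (service URL, topic name, and
payload size are placeholders for our actual settings):

import com.google.common.util.concurrent.RateLimiter;
import org.apache.pulsar.client.api.*;

public class ForegroundProducer {
    public static void main(String[] args) throws PulsarClientException {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder broker URL
                .build();

        Producer<byte[]> producer = client.newProducer()
                .topic("non-persistent://public/default/foreground-topic") // placeholder
                .create();

        RateLimiter limiter = RateLimiter.create(10_000); // 10,000 msg/s target rate
        byte[] payload = new byte[1024]; // placeholder payload size

        while (true) {
            limiter.acquire();
            // Async send so the rate limiter, not per-send latency, paces the loop.
            producer.sendAsync(payload);
        }
    }
}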



Overall, we are looking to stress the brokers, without stressing the clients
first, to see how topic count and message rate affect Pulsar as a whole.
Rather than a linear or otherwise explainable increase in latency, we have
been seeing a fairly flat latency curve (2-5ms) followed by a huge spike (to
somewhere around 100ms) at some workload level. Do you know of a way to
produce a gradual latency increase that is not due to resource exhaustion
(our first guess as to why we see such sharp spikes in latency)?
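
In case the aggregation matters: here is a sketch of how one could summarize
the per-message latencies into the percentiles quoted above using
HdrHistogram (which, as far as I know, the OpenMessaging benchmark also uses
internally; the bounds here are illustrative):

import org.HdrHistogram.Histogram;

public class LatencySummary {
    // Track values up to 60 s with 3 significant digits (illustrative bounds).
    private static final Histogram HISTOGRAM = new Histogram(60_000_000L, 3);

    // Called once per received message with the measured E2E latency.
    public static void record(long latencyMicros) {
        HISTOGRAM.recordValue(latencyMicros);
    }

    public static void report() {
        System.out.printf("p50=%d us p99=%d us p99.9=%d us max=%d us%n",
                HISTOGRAM.getValueAtPercentile(50.0),
                HISTOGRAM.getValueAtPercentile(99.0),
                HISTOGRAM.getValueAtPercentile(99.9),
                HISTOGRAM.getMaxValue());
    }
}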

Would this be better suited to the users list? I wasn't sure which one to
send it to, but the devs may have better insight into some of the behavior
we are seeing.

Thanks,
Tyler Landle
