Some discussion of performance testing prompted me to write this; I'm not sure where to put it.
Performance testing messaging systems (with or without queues) requires a bit of care. There are two parameters: throughput and latency. They are *not independent* and *cannot realistically be measured separately*.

The naive approach (at least my first naive approach) is to stuff messages into the system as fast as possible and take them out as fast as possible. Do this for N messages, divide the total time to receive them all by N to get throughput, and average the time each message takes to arrive to get latency.

The first problem with this is that if messages go in faster than they come out, buffers and queues are growing throughout the test. You may not notice in an N-message test, but you will in production when brokers run out of memory/disk. Even if this is not the case, if you fill queues while nobody is receiving and then drain them while nobody is sending, you get different performance characteristics than under continuous send/receive load. If you don't run your test for long enough, your results can be dominated by fill/drain behavior rather than continuous-load behavior.

This naive approach gives you great throughput because all of the components are going at full throttle all the time. It is not realistic because the system will eventually fall over if queues keep growing. It gives bad latency because as queues get longer, messages spend longer sitting on them, and latency rises with queue depth.

To make this realistic you can:

a. Configure your broker/intermediary to flow-control senders so that its internal queues and buffers stay within a fixed limit (not every intermediary offers this feature). Measure fill/drain times up to that limit. Make sure you run your test much longer than the fill/drain time so your results are dominated by continuous flow-controlled behavior.

b. Rate-limit your test to a rate (which you have to find by experiment) where the queue/buffer sizes stay within limits by themselves, then proceed as per a. You may well find that latency rises as you increase throughput, because the system is buffering/queuing more data in an attempt to be more efficient, so messages sit longer in buffers & queues. That is why TCP_NODELAY famously improves latency under low throughput (it tells TCP *not* to try to fill buffers but to get messages out as fast as possible) BUT check what happens when you drive up the rate... (Sketches of a rate-limited sender and of setting TCP_NODELAY follow below.)

c. Rate-limit your test to the rate you expect in production and measure whether there is long-term growth of queues/buffers. If there is, you have a problem.

There are systems where you expect low throughput and require low latency - in that case option c. and TCP_NODELAY may fit. Other systems require high throughput and low latency is "nice to have" - in that case go with a. if your intermediary supports it and b. if not; TCP_NODELAY may be a bad idea.

Producer flow control (a.) is preferable to rate-limiting (b.) because it protects against unexpected but sustained overload conditions. If you have a system that doesn't have producer flow control then you will need flow-control feedback built into the application itself to handle *sustained* overload. A queuing system will *probably* be able to handle minor load spikes even without producer flow control. Often such feedback exists anyway and can be harnessed: for example, in a request-response system the responses themselves can be used as feedback, and you can limit the number of unresponded-to requests by waiting for responses before sending more (see the last sketch below).
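For illustration, here is a minimal sketch (Python) of the rate-limited test in options b. and c. The send/receive callables and the message format are hypothetical stand-ins for whatever client library you use; the point is only the pacing loop and measuring latency from timestamps carried in the messages (which assumes sender and receiver clocks are on the same host or closely synchronised).

    import time

    def send_rate_limited(send, messages_per_sec, duration_sec):
        """Send at a fixed rate; 'send' is any callable that transmits one message.

        Each message carries its send timestamp so the receiver can compute
        per-message latency.
        """
        interval = 1.0 / messages_per_sec
        end = time.monotonic() + duration_sec
        next_send = time.monotonic()
        sent = 0
        while time.monotonic() < end:
            send({"seq": sent, "sent_at": time.time()})
            sent += 1
            next_send += interval
            delay = next_send - time.monotonic()
            if delay > 0:
                time.sleep(delay)   # pace the sender instead of going full throttle
        return sent

    def receive_and_measure(receive, expected):
        """Receive 'expected' messages; return (throughput, average latency)."""
        latencies = []
        start = time.monotonic()
        for _ in range(expected):
            msg = receive()                          # blocking receive, hypothetical
            latencies.append(time.time() - msg["sent_at"])
        elapsed = time.monotonic() - start
        return expected / elapsed, sum(latencies) / len(latencies)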
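For reference, TCP_NODELAY is set per socket with the standard socket option; for example in Python:

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Disable Nagle's algorithm: send small writes immediately instead of
    # coalescing them into larger segments. Helps latency at low throughput,
    # but re-check behavior as you drive up the rate.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)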
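And a sketch of the request-response feedback idea: cap the number of unresponded-to requests with a simple credit window. The send_request/on_response names are hypothetical; only the windowing logic matters.

    import threading

    class RequestWindow:
        """Cap the number of unresponded-to requests (a credit window).

        Call acquire() before sending a request and release() when its
        response arrives; senders block once 'limit' requests are in flight.
        """
        def __init__(self, limit):
            self._credits = threading.Semaphore(limit)

        def acquire(self):
            self._credits.acquire()   # blocks while the window is full

        def release(self):
            self._credits.release()

    # Usage sketch:
    #   window = RequestWindow(100)
    #   window.acquire(); send_request(req)
    #   ...
    #   def on_response(resp): window.release()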
The AMQP 1.0 protocol provides everything an intermediary needs to support producer flow control without any client-side changes. The 0-10 protocol made it very difficult, although not impossible. qpidd supports it for both (thanks to heroic work by kgiusti.)

Cheers,
Alan.
