Some discussion of performance testing prompted me to write this; I'm not sure where to put it.
Performance testing messaging systems (with or without queues) requires a bit of care. There are two parameters: throughput and latency. They are *not independent* and *cannot realistically be measured separately*.

The naive approach (at least my first naive approach) is to stuff messages into the system as fast as possible and take them out as fast as possible. Do this for N messages, divide the total time to receive them all by N to get throughput, and average the time each message takes to arrive to get latency.

The first problem with this is that if messages go in faster than they come out, buffers and queues are growing throughout the test. You may not notice in an N-message test, but you will in production when brokers run out of memory/disk. Even if this is not the case, if you fill queues while nobody is receiving and then drain them while nobody is sending, you get different performance characteristics than under continuous send/receive load. If you don't run your test for long enough, your results can be dominated by fill/drain behavior rather than continuous-load behavior.

This naive approach gives you great throughput because all of the components are going at full throttle all the time. It is not realistic because the system will eventually fall over if queues keep growing. It gives bad latency because as queues get longer, messages spend longer sitting on them, and latency rises with queue depth.

To make this realistic you can:

a. Configure your broker/intermediary to flow-control senders so that its internal queues and buffers stay within a fixed limit (not every intermediary offers this feature). Measure fill/drain times up to that limit. Make sure you run your test much longer than the fill/drain time so your results are dominated by continuous flow-controlled behavior.

b. Rate-limit your test to a rate (which you have to find by experiment) where the queue/buffer sizes stay within limits by themselves, then proceed as per a. You may well find that latency rises as you increase throughput, because the system is buffering/queuing more data in an attempt to be more efficient, so messages sit longer in buffers & queues. That is why TCP_NODELAY famously improves latency under low throughput (it tells TCP *not* to try to fill buffers but to get messages out as fast as possible) BUT check what happens when you drive up the rate... (Sketches of a rate-limited sender and of setting TCP_NODELAY follow below.)

c. Rate-limit your test to the rate you expect in production and measure whether there is long-term growth of queues/buffers. If there is, you have a problem.

There are systems where you expect low throughput and require low latency - in that case option c. and TCP_NODELAY may fit. Other systems require high throughput and low latency is "nice to have" - in that case go with a. if your intermediary supports it and b. if not; TCP_NODELAY may be a bad idea.

Producer flow control (a.) is preferable to rate-limiting (b.) because it protects against unexpected but sustained overload conditions. If you have a system that doesn't have producer flow control then you will need flow-control feedback built into the application itself to handle *sustained* overload. A queuing system will *probably* be able to handle minor load spikes even without producer flow control. Often such feedback exists anyway and can be harnessed: for example, in a request-response system the responses themselves can be used as feedback, and you can limit the number of unresponded-to requests by waiting for responses before sending more (see the last sketch below).
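For illustration, here is a minimal sketch (Python) of the rate-limited test in options b. and c. The send/receive callables and the message format are hypothetical stand-ins for whatever client library you use; the point is only the pacing loop and measuring latency from timestamps carried in the messages (which assumes sender and receiver clocks are on the same host or closely synchronised).

    import time

    def send_rate_limited(send, messages_per_sec, duration_sec):
        """Send at a fixed rate; 'send' is any callable that transmits one message.

        Each message carries its send timestamp so the receiver can compute
        per-message latency.
        """
        interval = 1.0 / messages_per_sec
        end = time.monotonic() + duration_sec
        next_send = time.monotonic()
        sent = 0
        while time.monotonic() < end:
            send({"seq": sent, "sent_at": time.time()})
            sent += 1
            next_send += interval
            delay = next_send - time.monotonic()
            if delay > 0:
                time.sleep(delay)   # pace the sender instead of going full throttle
        return sent

    def receive_and_measure(receive, expected):
        """Receive 'expected' messages; return (throughput, average latency)."""
        latencies = []
        start = time.monotonic()
        for _ in range(expected):
            msg = receive()                          # blocking receive, hypothetical
            latencies.append(time.time() - msg["sent_at"])
        elapsed = time.monotonic() - start
        return expected / elapsed, sum(latencies) / len(latencies)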
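For reference, TCP_NODELAY is set per socket with the standard socket option; for example in Python:

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Disable Nagle's algorithm: send small writes immediately instead of
    # coalescing them into larger segments. Helps latency at low throughput,
    # but re-check behavior as you drive up the rate.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)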
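And a sketch of the request-response feedback idea: cap the number of unresponded-to requests with a simple credit window. The send_request/on_response names are hypothetical; only the windowing logic matters.

    import threading

    class RequestWindow:
        """Cap the number of unresponded-to requests (a credit window).

        Call acquire() before sending a request and release() when its
        response arrives; senders block once 'limit' requests are in flight.
        """
        def __init__(self, limit):
            self._credits = threading.Semaphore(limit)

        def acquire(self):
            self._credits.acquire()   # blocks while the window is full

        def release(self):
            self._credits.release()

    # Usage sketch:
    #   window = RequestWindow(100)
    #   window.acquire(); send_request(req)
    #   ...
    #   def on_response(resp): window.release()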
The AMQP 1.0 protocol provides everything an intermediary needs to support producer flow control without any client-side changes. The 0-10 protocol made it very difficult, although not impossible. qpidd supports it for both (thanks to heroic work by kgiusti.)

Cheers,
Alan.
