Storm does not have backpressure by default, and because Storm supports
loops in a topology, the message queues can grow unbounded. We have put a
number of fixes into newer versions of Storm, including on the messaging
side of things, but the simplest way to avoid this is to enable acking and
set max spout pending to a reasonable number.
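A minimal sketch of that setup (the acker count and pending limit below are
illustrative, not tuned recommendations):

    import backtype.storm.Config;

    public class BoundedConfig {
        public static Config make() {
            Config conf = new Config();
            // Acking needs at least one acker executor, and the spout must
            // emit tuples with a message ID for them to be tracked.
            conf.setNumAckers(1);
            // Caps un-acked tuples in flight per spout task, which in turn
            // bounds how far the downstream queues can grow.
            conf.setMaxSpoutPending(1000);
            return conf;
        }
    }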
Queue growth like this is typically caused by one of the executors in your
worker not being able to keep up with the incoming load. There is also the
possibility that a single thread cannot keep up with the incoming message
load. In the former case you should see the capacity metric go very high on
some of the executors. In the latter case you will not see that, and you may
need to add more workers to your topology, as sketched below.
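For that latter case, a rough sketch of spreading the load (the component
names, the MetricsConsumerBolt class, and the counts here are hypothetical,
not taken from this thread):

    import backtype.storm.Config;
    import backtype.storm.topology.TopologyBuilder;

    TopologyBuilder builder = new TopologyBuilder();
    // "metrics-bolt" / MetricsConsumerBolt are stand-ins for the hot bolt;
    // 8 executors raise the per-component parallelism.
    builder.setBolt("metrics-bolt", new MetricsConsumerBolt(), 8)
           .shuffleGrouping("some-spout");

    Config conf = new Config();
    conf.setNumWorkers(4);  // spread those executors across more worker JVMs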
- Bobby
On Thursday, December 22, 2016 10:01 PM, Erik Weathers
<[email protected]> wrote:
We're debugging a topology's unbounded memory growth for a worker process
that is running a metrics consumer bolt, and we just noticed that the netty
Server.java's message_queue
<https://github.com/apache/storm/blob/v0.9.6/storm-core/src/jvm/backtype/storm/messaging/netty/Server.java#L97>
is growing forever (at least, it grows to ~5GB before it hits the heap limit
and triggers heavy GCing). (We found this by running the Eclipse Memory
Analyzer (MAT) on a heap dump obtained via jmap.)
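(For reference, a dump like that can be taken with something like

    jmap -dump:live,format=b,file=worker.hprof <worker-pid>

and then loaded into MAT.)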
We're running storm-0.9.6, and this is happening with a topology that is
processing 200K+ tuples per second and producing a lot of metrics.
I'm a bit surprised that this queue would grow forever; I assumed there
would be some sort of limit. I'm pretty naive about how netty's message
receiving system ties into the Storm executors at this point, though. I'm
guessing the behavior could be a result of backpressure / slowness from our
downstream monitoring system, but Storm provides no visibility into what's
happening with these messages in the netty queues (that I have been able to
ferret out, at least!).
Thanks for any input you might be able to provide!
- Erik