We're debugging unbounded memory growth in a worker process that is running
a metrics consumer bolt, and we just noticed that the Netty Server.java's
message_queue
<https://github.com/apache/storm/blob/v0.9.6/storm-core/src/jvm/backtype/storm/messaging/netty/Server.java#L97>
grows without bound (it reaches ~5 GB before it hits the heap limit and
causes heavy GCing).  (We found this by running Eclipse Memory Analyzer
(MAT) on a heap dump obtained via jmap.)
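
(For reference, the dump was taken with the standard jmap invocation,
something along the lines of

    jmap -dump:live,format=b,file=worker.hprof <worker pid>

and then loaded into MAT.)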

We're running storm-0.9.6, and this is happening with a topology that is
processing 200K+ tuples per second and producing a lot of metrics.

I'm a bit surprised that this queue would grow forever; I assumed there
would be some sort of limit.  I'm still pretty naive about how Netty's
message-receiving path ties into the Storm executors, though.  My working
assumption is that the growth is a result of backpressure / slowness from
our downstream monitoring system, but Storm provides no visibility into
what's happening with these messages in the Netty queues (none that I've
been able to ferret out, at least!).
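
To illustrate what I suspect is happening (this is just a toy sketch under
my assumptions, not Storm's actual code): if the receive queue is an
unbounded LinkedBlockingQueue and the consumer side drains it more slowly
than messages arrive, nothing ever pushes back on the producer and the
queue simply grows until the heap is exhausted:

    import java.util.concurrent.LinkedBlockingQueue;

    public class UnboundedQueueDemo {
        public static void main(String[] args) throws InterruptedException {
            // No capacity argument => capacity is Integer.MAX_VALUE,
            // i.e. effectively unbounded.
            LinkedBlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();

            // Producer: enqueues "messages" as fast as it can; offer()
            // never blocks on an unbounded queue, so there is no backpressure.
            Thread producer = new Thread(() -> {
                while (true) {
                    queue.offer(new byte[1024]);
                }
            });

            // Consumer: simulates a slow downstream (e.g. the metrics
            // consumer bolt); it falls behind, so the queue keeps growing.
            Thread consumer = new Thread(() -> {
                try {
                    while (true) {
                        queue.take();
                        Thread.sleep(1);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            producer.start();
            consumer.start();
            producer.join();
        }
    }

If the real message_queue behaves like that, a bounded queue (or some form
of throttling when the downstream consumer is slow) would presumably keep
the worker's heap in check, at the cost of blocking or dropping on the
receive side.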

Thanks for any input you might be able to provide!

- Erik
