We're debugging unbounded memory growth in a worker process that runs a metrics consumer bolt, and we just noticed that the netty Server.java's message_queue <https://github.com/apache/storm/blob/v0.9.6/storm-core/src/jvm/backtype/storm/messaging/netty/Server.java#L97> grows without bound (it reaches ~5GB before hitting the heap limit and triggering heavy GC). We found this by running the Eclipse Memory Analyzer (MAT) against a heap dump taken with jmap.
We're running storm-0.9.6, and this is happening with a topology that processes 200K+ tuples per second and produces a lot of metrics. I'm a bit surprised that this queue can grow forever; I assumed there would be some sort of limit. I'm still pretty naive about how netty's message receiving ties into the Storm executors, though. My working assumption is that the growth is caused by backpressure / slowness in our downstream monitoring system, but Storm doesn't seem to expose any visibility into what's happening to these messages in the netty queues (none that I've been able to ferret out, at least!). Thanks for any input you might be able to provide! - Erik
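
P.S. For what it's worth, here's a minimal, self-contained sketch of the failure mode I suspect. The class and field names are mine, not Storm's actual code, and I'm only assuming the message_queue behaves like an unbounded LinkedBlockingQueue: when the producer (the netty I/O side) outruns the consumer (the executor draining into our slow monitoring system), nothing pushes back and the queue just keeps growing, which matches what we see in the heap dump.

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy model of an unbounded receive queue: the producer enqueues faster than
// the consumer drains, so the queue -- and the heap -- grow without bound.
public class UnboundedQueueGrowth {
    public static void main(String[] args) throws InterruptedException {
        // No capacity argument => capacity is Integer.MAX_VALUE, so put() never blocks.
        LinkedBlockingQueue<byte[]> messageQueue = new LinkedBlockingQueue<>();

        Thread producer = new Thread(() -> {
            try {
                while (true) {
                    for (int i = 0; i < 10; i++) {
                        messageQueue.put(new byte[1024]); // ~10 messages per ms
                    }
                    TimeUnit.MILLISECONDS.sleep(1);
                }
            } catch (InterruptedException ignored) { }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    messageQueue.take();            // ~1 message per ms
                    TimeUnit.MILLISECONDS.sleep(1); // simulated downstream slowness
                }
            } catch (InterruptedException ignored) { }
        });

        producer.start();
        consumer.start();

        // Queue depth climbs steadily because nothing applies backpressure to the producer.
        for (int i = 0; i < 10; i++) {
            TimeUnit.SECONDS.sleep(1);
            System.out.println("queue size: " + messageQueue.size());
        }
        producer.interrupt();
        consumer.interrupt();
    }
}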
