You may already know this, but please also note that the number of metrics
tuples is linear in the overall task count: every task reports its own metrics,
so higher parallelism puts more pressure on the metrics consumer bolt.

I believe Taylor and Alessandro have been working on metrics v2. Until metrics
v2 is finished, you can reduce the load with the metrics whitelist / blacklist
and the asynchronous metrics consumer bolt coming in Storm 1.1.0.
(Before that, you might want to try migrating to 1.x, say 1.0.2, for now.)
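
For reference, a rough sketch of registering a filtered consumer through the
topology config could look like the following; the "whitelist" key and the
regex values are assumptions based on the 1.1.0 metrics documentation, so
double-check them against the release you end up on:

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.storm.Config;  // 1.x package; backtype.storm.Config on 0.9.x

public class MetricsConsumerRegistration {
    public static Config buildConf() {
        Config conf = new Config();

        // One registration entry per consumer, mirroring what
        // Config.registerMetricsConsumer() puts into the topology config.
        Map<String, Object> consumer = new HashMap<>();
        consumer.put("class", "org.apache.storm.metric.LoggingMetricsConsumer");
        consumer.put("parallelism.hint", 2);

        // Assumed 1.1.0 filtering key: forward only metric names matching
        // these example regexes (adjust to the metrics you actually need).
        consumer.put("whitelist",
                     Arrays.asList("^__execute-latency.*", "^__complete-latency.*"));

        conf.put(Config.TOPOLOGY_METRICS_CONSUMER_REGISTER,
                 Collections.singletonList(consumer));
        return conf;
    }
}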

- Jungtaek Lim (HeartSaVioR)

On Thursday, January 5, 2017 at 12:42 AM, Bobby Evans <[email protected]> wrote:

> Yes, you are right; that will not help.  The best you can do now is to
> increase the number of MetricsConsumer instances that you have.  You can do
> this when you register the metrics consumer:
> conf.registerMetricsConsumer(NoOpMetricsConsumer.class, 3);
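> // The second argument is the parallelism hint: run 3 metrics consumer
> // instances instead of the default single instance.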
> The default is 1, but we have seen that with very large topologies, or ones
> that output a lot of metrics, the consumer can sometimes get bogged down.
> You could also try profiling that worker to see what is taking so long.
> If a NoOp is also showing the same signs, it would be interesting to see
> why.  It could be the number of events coming in, or it could be the size
> of the metrics being sent, making deserialization costly. - Bobby
>
>     On Tuesday, January 3, 2017 2:05 PM, Erik Weathers
> <[email protected]> wrote:
>
>
>  Thanks for the response, Bobby!
>
> I think I might have failed to sufficiently emphasize & explain something
> in my earlier description of the issue:  this is happening *only* in a
> worker process that is hosting a bolt that implements the *IMetricsConsumer*
> interface.  The other 24 worker processes are working just fine; their
> netty queues do not grow forever.  The same number and type of executors
> are on every worker process, except that one worker that is hosting the
> metrics consumer bolt.
>
> So the netty queue is growing unbounded because of an influx of metrics.
> The acking and max spout pending configs wouldn't seem to directly
> influence the filling of the netty queue with custom metrics.
>
> Notably, this "choking" behavior happens even with a "NoOpMetricsConsumer"
> bolt, which is the same as Storm's LoggingMetricsConsumer but with
> handleDataPoints() doing *nothing*.  Interesting, right?
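>
> For reference, a minimal no-op consumer along those lines could look roughly
> like this (sketched against the 0.9.x backtype.storm.metric.api.IMetricsConsumer
> interface; illustrative only, not the exact class in use):
>
> import java.util.Collection;
> import java.util.Map;
>
> import backtype.storm.metric.api.IMetricsConsumer;
> import backtype.storm.task.IErrorReporter;
> import backtype.storm.task.TopologyContext;
>
> public class NoOpMetricsConsumer implements IMetricsConsumer {
>     @Override
>     public void prepare(Map stormConf, Object registrationArgument,
>                         TopologyContext context, IErrorReporter errorReporter) {
>         // nothing to set up
>     }
>
>     @Override
>     public void handleDataPoints(TaskInfo taskInfo, Collection<DataPoint> dataPoints) {
>         // intentionally a no-op: metrics tuples are still routed to, queued at, and
>         // deserialized by this consumer's worker even though they are dropped here
>     }
>
>     @Override
>     public void cleanup() {
>         // nothing to clean up
>     }
> }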
>
> - Erik
>
> On Tue, Jan 3, 2017 at 7:06 AM, Bobby Evans <[email protected]>
> wrote:
>
> > Storm does not have back pressure by default.  Also, because Storm supports
> > loops in a topology, the message queues can grow unbounded.  We have put in
> > a number of fixes in newer versions of Storm, also for the messaging side
> > of things.  But the simplest way to avoid this is to have acking enabled
> > and have max spout pending set to a reasonable number.  This will typically
> > be caused by one of the executors in your worker not being able to keep up
> > with the load coming in.  There is also the possibility that a single
> > thread cannot keep up with the incoming message load.  In the former case
> > you should be able to see the capacity go very high on some of the
> > executors.  In the latter case you will not see that, and may need to add
> > more workers to your topology.  - Bobby
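> >
> > As a rough illustration of that acking / max spout pending tuning (these
> > are the standard Config helpers; the values are placeholders, not
> > recommendations):
> >
> > import backtype.storm.Config;   // org.apache.storm.Config on 1.x
> >
> > Config conf = new Config();
> > conf.setNumAckers(4);           // a non-zero acker count enables tuple tracking
> > conf.setMaxSpoutPending(1000);  // at most 1000 un-acked tuples in flight per spout task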
> >
> >    On Thursday, December 22, 2016 10:01 PM, Erik Weathers
> > <[email protected]> wrote:
> >
> >
> >  We're debugging a topology's infinite memory growth for a worker process
> > that is running a metrics consumer bolt, and we just noticed that the netty
> > Server.java's message_queue
> > <https://github.com/apache/storm/blob/v0.9.6/storm-core/src/jvm/backtype/storm/messaging/netty/Server.java#L97>
> > is growing forever (at least it goes up to ~5GB before it hits heap limits
> > and leads to heavy GCing).  (We found this by using Eclipse's Memory
> > Analysis Tool on a heap dump obtained via jmap.)
> >
> > We're running storm-0.9.6, and this is happening with a topology that is
> > processing 200K+ tuples per second, and producing a lot of metrics.
> >
> > I'm a bit surprised that this queue would grow forever; I assumed there
> > would be some sort of limit.  I'm pretty naive about how netty's message
> > receiving system ties into the Storm executors at this point, though.  I'm
> > kind of assuming the behavior could be a result of backpressure / slowness
> > from our downstream monitoring system, but there's no visibility provided
> > by Storm into what's happening with these messages in the netty queues
> > (that I have been able to ferret out, at least!).
> >
> > Thanks for any input you might be able to provide!
> >
> > - Erik
