[ https://issues.apache.org/jira/browse/STORM-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231874#comment-15231874 ]
ASF GitHub Bot commented on STORM-1698:
---------------------------------------
Github user HeartSaVioR commented on the pull request:
https://github.com/apache/storm/pull/1322#issuecomment-207319131
You got me. :)
I was also looking into aggregating metrics on the worker side, but after
struggling with it I gave up.
The reason I gave up is that IMetric.getValueAndReset() is not thread-safe.
Currently metrics are updated on the executor's receive queue handler thread,
and that same thread also handles the metrics tick tuple, so there's no race
condition.
Any approach to in-process aggregation would probably have to resolve this
issue first.
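To illustrate the thread-safety concern, here is a minimal Java sketch of a counter whose read-and-reset is a single atomic step, so it could in principle be updated from several threads while another thread drains it. `IMetric` below is a local stand-in with the same shape as Storm's interface (`Object getValueAndReset()`), and `AtomicCountMetric` is a hypothetical class, not part of Storm. It only covers a simple counter; metrics with composite state (maps, averages) would need locking or a similar scheme.

```java
import java.util.concurrent.atomic.AtomicLong;

// Local stand-in with the same shape as Storm's IMetric, so this sketch
// compiles without storm-core on the classpath.
interface IMetric {
    Object getValueAndReset();
}

// Hypothetical thread-safe counter (not Storm's CountMetric): getAndSet(0)
// reads and resets atomically, so no increment can slip in between the
// read and the reset and get lost.
class AtomicCountMetric implements IMetric {
    private final AtomicLong value = new AtomicLong();

    void incrBy(long delta) {
        value.addAndGet(delta);
    }

    @Override
    public Object getValueAndReset() {
        return value.getAndSet(0L);
    }
}

class AtomicCountMetricDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicCountMetric metric = new AtomicCountMetric();
        int threads = 4;
        int incrementsPerThread = 100_000;

        // Several "executor" threads update the same metric concurrently.
        Thread[] writers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            writers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    metric.incrBy(1);
                }
            });
            writers[i].start();
        }

        // The main thread plays the "tick" role: it drains while writers run.
        long reported = 0;
        boolean anyAlive = true;
        while (anyAlive) {
            reported += (Long) metric.getValueAndReset();
            anyAlive = false;
            for (Thread t : writers) {
                anyAlive |= t.isAlive();
            }
        }
        for (Thread t : writers) {
            t.join();
        }
        // Final drain picks up anything added after the last in-loop reset.
        reported += (Long) metric.getValueAndReset();
        System.out.println(reported); // prints 400000
    }
}
```

With a single thread doing both the updates and the tick handling, as in the current design, a plain non-atomic counter is already safe; the atomic variant only matters once updates and the reset happen on different threads.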
> Asynchronous MetricsConsumerBolt
> --------------------------------
>
> Key: STORM-1698
> URL: https://issues.apache.org/jira/browse/STORM-1698
> Project: Apache Storm
> Issue Type: Sub-task
> Components: storm-core
> Affects Versions: 1.0.0, 2.0.0
> Reporter: Jungtaek Lim
> Assignee: Jungtaek Lim
>
> Currently MetricsConsumerBolt delegates the handling of data points to
> MetricsConsumer synchronously.
> When MetricsConsumer cannot keep up, backpressure is triggered once (queue
> size + overflow buffer size) reaches the high watermark, which slows down
> the topology as a result.
> Slowing down itself is not a problem, because that’s what backpressure is for.
> The actual problem is that backpressure only throttles spouts, not metrics.
> If MetricsConsumerBolt cannot keep up with incoming tuples, backpressure never
> ends and the topology just hangs. If we turn off backpressure, the queue is
> unbounded and the worker could eventually throw an OOME.
> Making MetricsConsumerBolt asynchronous can resolve this issue. One downside
> of making it async is that it becomes hard to tell whether MetricsConsumerBolt
> is keeping up (its capacity will always be around 0).
> I don't have a solution for that yet, but I think it's still better than the
> current behavior.
> Before reaching consensus on a major overhaul of metrics, I'd love to improve
> the current metrics in a backward-compatible manner. That way the change could
> be applied to 1.x-branch, and even 0.10.x-branch.
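The asynchronous approach proposed in the issue description can be sketched roughly as follows: the executor thread only enqueues data points, and a dedicated thread hands them to the slow consumer. This is plain Java under stated assumptions, not Storm code: `AsyncMetricsHandler`, the bounded queue, and the drop-when-full policy are hypothetical choices made here to avoid both blocking the executor (and thus backpressure) and the unbounded-queue OOME risk.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical helper, not part of Storm: decouples the executor thread
// (which only enqueues) from a slow consumer running on its own thread.
class AsyncMetricsHandler<T> implements AutoCloseable {
    private final BlockingQueue<T> queue;
    private final Consumer<T> consumer;
    private final AtomicLong dropped = new AtomicLong();
    private final Thread worker;
    private volatile boolean running = true;

    AsyncMetricsHandler(int capacity, Consumer<T> consumer) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.consumer = consumer;
        this.worker = new Thread(this::drainLoop, "async-metrics-consumer");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    // Called on the executor thread. offer() never blocks: if the bounded
    // queue is full, the data point is dropped and counted instead.
    public boolean submit(T dataPoint) {
        boolean accepted = queue.offer(dataPoint);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    public long droppedCount() {
        return dropped.get();
    }

    // Runs on the worker thread; keeps draining until closed and empty.
    private void drainLoop() {
        try {
            while (running || !queue.isEmpty()) {
                T item = queue.poll(100, TimeUnit.MILLISECONDS);
                if (item != null) {
                    consumer.accept(item);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Call after the last submit(): stops the loop once everything already
    // queued has been handed to the consumer, then waits for the worker.
    @Override
    public void close() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Dropping on a full queue trades completeness for liveness. Exposing a drop count like `droppedCount()` is one possible way to recover some of the "is it keeping up?" visibility that is lost when capacity stays around 0.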
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)