[
https://issues.apache.org/jira/browse/STORM-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231736#comment-15231736
]
ASF GitHub Bot commented on STORM-1698:
---------------------------------------
Github user HeartSaVioR commented on the pull request:
https://github.com/apache/storm/pull/1322#issuecomment-207251473
@abhishekagarwal87
Yes, I also have two ideas to avoid or minimize those issues.
1) discard policy
2) filter metrics
About 1), _taskQueue will become a bounded queue. When it's full, we
discard some enqueued tasks according to a policy. The policy could be drop
oldest, drop latest, or drop randomly, or we could even provide an interface
so users can plug in their own.
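To make idea 1) concrete, here is a rough sketch of a bounded queue with a
selectable discard policy. BoundedTaskQueue, DiscardPolicy and the method
names are all hypothetical, not existing Storm classes, and "drop latest" is
assumed here to mean rejecting the incoming task rather than evicting the
newest enqueued one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical bounded task queue; a sketch only, not part of Storm. */
class BoundedTaskQueue<T> {
    /** Hypothetical policy names mirroring the options in the comment. */
    enum DiscardPolicy { DROP_OLDEST, DROP_LATEST, DROP_RANDOM }

    private final List<T> tasks = new ArrayList<>();
    private final int capacity;
    private final DiscardPolicy policy;
    private long dropped;

    BoundedTaskQueue(int capacity, DiscardPolicy policy) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.policy = policy;
    }

    /** Enqueues a task; returns false if the incoming task was discarded. */
    synchronized boolean offer(T task) {
        if (tasks.size() >= capacity) {
            dropped++;
            switch (policy) {
                case DROP_LATEST:
                    // Assumption: "latest" is the task that is arriving now.
                    return false;
                case DROP_OLDEST:
                    tasks.remove(0);
                    break;
                case DROP_RANDOM:
                    tasks.remove(ThreadLocalRandom.current().nextInt(tasks.size()));
                    break;
            }
        }
        tasks.add(task);
        return true;
    }

    /** Removes and returns the oldest task, or null if the queue is empty. */
    synchronized T poll() {
        return tasks.isEmpty() ? null : tasks.remove(0);
    }

    synchronized int size() {
        return tasks.size();
    }

    /** Number of tasks discarded so far; could be exposed as a metric. */
    synchronized long droppedCount() {
        return dropped;
    }
}
```

The dropped counter is there because a discard policy hides overload; exposing
it would give at least some signal that the consumer is not keeping up.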
About 2), there are actually many metrics provided by Storm by default.
Furthermore, some modules like storm-kafka provide their own metrics, too. It
would be great if users could set a pattern for the metrics they want to
subscribe to or filter out.
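For idea 2), a minimal sketch of pattern-based filtering on metric names
could look like the following. MetricNameFilter and its include/exclude
parameters are hypothetical, not an existing Storm API or config key, and the
metric names in the usage note are only illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical metric-name filter; a sketch only, not part of Storm. */
class MetricNameFilter {
    private final Pattern include; // null means "accept every name"
    private final Pattern exclude; // null means "reject nothing"

    MetricNameFilter(String includeRegex, String excludeRegex) {
        this.include = includeRegex == null ? null : Pattern.compile(includeRegex);
        this.exclude = excludeRegex == null ? null : Pattern.compile(excludeRegex);
    }

    /** A name passes if it is not excluded and matches the include pattern. */
    boolean accepts(String metricName) {
        if (exclude != null && exclude.matcher(metricName).matches()) {
            return false;
        }
        return include == null || include.matcher(metricName).matches();
    }

    /** Returns only the names that pass the filter, preserving order. */
    List<String> filter(List<String> metricNames) {
        List<String> kept = new ArrayList<>();
        for (String name : metricNames) {
            if (accepts(name)) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

For example, new MetricNameFilter(null, "__.*") would filter out every name
starting with two underscores and pass a name like "kafkaOffset" through.
Filtering before enqueueing would also reduce the load on the consumer.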
I'm going to create an umbrella issue for metrics improvements and add
STORM-1698 as a subtask.
> Asynchronous MetricsConsumerBolt
> --------------------------------
>
> Key: STORM-1698
> URL: https://issues.apache.org/jira/browse/STORM-1698
> Project: Apache Storm
> Issue Type: Improvement
> Components: storm-core
> Affects Versions: 1.0.0, 2.0.0
> Reporter: Jungtaek Lim
> Assignee: Jungtaek Lim
>
> Currently MetricsConsumerBolt delegates to MetricsConsumer to handle data
> points synchronously.
> When MetricsConsumer cannot keep up, it triggers backpressure once (queue
> size + overflow buffer size) reaches the high watermark, which slows down
> the topology as a result.
> Slowing down itself is not a problem because that’s what backpressure is for.
> The actual problem is that backpressure only throttles spouts, not metrics.
> If MetricsConsumerBolt cannot keep up with incoming tuples, backpressure
> never ends and the topology just hangs. If we turn off backpressure, we have
> an unbounded queue and the worker could eventually throw an OOME.
> Making MetricsConsumerBolt asynchronous can resolve this issue. One downside
> of making it async is that it's hard to see whether MetricsConsumerBolt is
> keeping up. (capacity will always be around 0)
> I don't have an idea for that now, but I think it's still better than the
> current behavior.
> Before reaching consensus on a major change to metrics, I'd love to improve
> the current metrics without breaking backward compatibility. It could be
> applied to 1.x-branch, and even 0.10.x-branch.