I like the idea of both metrics, and it would be great to include them.

Prometheus can aggregate metrics downstream by component-id, source-task, etc.
It is a nice tool.

cheers
/karthik

On Wed, Apr 18, 2018 at 8:32 PM, Fu Maosong <[email protected]> wrote:

> One concern is that it will significantly increase the number of metrics,
> potentially leading to performance problems.
>
> 2018-04-18 18:58 GMT-07:00 Thomas Cooper <[email protected]>:
>
> > Hi All,
> >
> > This started out as a quick Slack post, then a reasonably sized email,
> > and now it has headings!
> >
> > *Introduction*
> >
> > I am working on a performance modeling system for Heron. Hopefully this
> > system will be useful for checking that proposed plans will meet
> > performance targets, and for checking whether currently running physical
> > plans will have back pressure issues at higher traffic rates.
> >
> > To do this I need to know what proportion of tuples are routed from each
> > upstream instance to its downstream instances, which is a metric that
> > Heron does not provide by default.
> >
> > *Proposal*
> >
> > I have implemented a custom metric to do what I need in my test
> > topologies. It is a simple multi-count metric called "__receive-count"
> > where the key now includes the "sourceTaskId" value (which you can get
> > from the tuple instance) as well as the source component name and
> > incoming stream name.
> >
> > This is basically the same as the default "__execute-count" metric but
> > the metric name format is
> > "__receive-count/<source-component>/<source-task-ID>/<incoming-stream>"
> > instead of "__execute-count/<source-component>/<incoming-stream>".
> >
> > So I see two options:
> >
> >    1. Create a new "__receive-count" metric and leave the
> >    "__execute-count" alone.
> >    2. Alter "__execute-count" to include the source task ID.
> >
> > *Questions*
> >
> > My first question is whether the metric name is parsed anywhere further
> > down the line, such as when aggregating component metrics in the metrics
> > manager. If so, would changing the name break things?
> >
> > My second is: if we do change "__execute-count", should we also add the
> > source task ID to other bolt metrics like "__execute-latency"? It would
> > be nice to see how latency changes by source instance. This is a
> > particular issue with two consecutive fields-grouped components, as
> > instances will receive very different key distributions, which could
> > lead to very different processing latencies.
> >
> > *Implementation*
> >
> > Adding this to the default metrics (or changing "__execute-count") seems
> > like it would be reasonably straightforward (famous last words). We
> > would need to modify the `FullBoltMetric` class to include the new
> > metrics (if required) and edit the `FullBoltMetric.executeTuple` method
> > to accept the "sourceTaskId" (which is already available in the
> > "BoltInstance.readTuplesAndExecute" method) as a 4th argument.
> >
> > Obviously, we will need to do the same with the Python implementation.
> > Will this also need to be changed in the Storm compatibility layer?
> >
> > *Conclusion*
> >
> > Having information on where tuples are flowing is really important if
> > we want to be able to do more intelligent routing and adaptive
> > auto-scaling in the future, and hopefully this one small change/extra
> > metric won't add any significant processing overhead.
> >
> > I look forward to hearing what you think.
> >
> > Cheers,
> >
> > Tom Cooper
> > W: www.tomcooper.org.uk  | Twitter: @tomncooper
> > <https://twitter.com/tomncooper>
> >
>
>
>
> --
> With my best Regards
> ------------------
> Fu Maosong
> Twitter Inc.
> Mobile: +001-415-244-7520
>
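
For readers following the thread, here is a rough sketch of the key format discussed above and of how the per-source-task counts relate to the existing metric. This is not Heron code: the class and function names below (MultiCountMetric, receive_count_name, collapse_to_execute_counts, routing_proportions) are hypothetical stand-ins, and the parsing assumes component and stream names contain no "/". Only the two name formats are taken from the proposal.

```python
from collections import defaultdict


class MultiCountMetric:
    """Hypothetical stand-in for a multi-count metric: one counter per name."""

    def __init__(self):
        self._counts = defaultdict(int)

    def incr(self, name, by=1):
        self._counts[name] += by

    def snapshot(self):
        return dict(self._counts)


def receive_count_name(source_component, source_task_id, stream):
    # Name format quoted in the proposal for the new metric.
    return f"__receive-count/{source_component}/{source_task_id}/{stream}"


def collapse_to_execute_counts(receive_counts):
    """Sum per-source-task counts back into the "__execute-count" shape.

    Assumes no "/" inside component or stream names (not guaranteed by Heron).
    """
    collapsed = defaultdict(int)
    for name, count in receive_counts.items():
        _prefix, source_component, _task_id, stream = name.split("/")
        collapsed[f"__execute-count/{source_component}/{stream}"] += count
    return dict(collapsed)


def routing_proportions(counts_by_downstream_task):
    """For each (receive-count name, downstream task), return the fraction of
    that upstream instance's tuples on that stream which the task received.

    counts_by_downstream_task maps a downstream task ID to its snapshot of
    "__receive-count" values.
    """
    totals = defaultdict(int)
    for counts in counts_by_downstream_task.values():
        for name, count in counts.items():
            totals[name] += count

    proportions = {}
    for task_id, counts in counts_by_downstream_task.items():
        for name, count in counts.items():
            if totals[name] > 0:
                proportions[(name, task_id)] = count / totals[name]
    return proportions
```

Summing over the source task ID recovers the per-component counts, so in this sketch option 1 (a separate metric) duplicates information that option 2 would carry in a single, larger set of names. The number of names grows with the number of upstream instances, which is the overhead concern raised in the reply.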
