Re: Adding source task id to default metrics

Fu Maosong Wed, 18 Apr 2018 20:32:53 -0700

One concern is that it will significantly increase the number of metrics,
potentially leading to performance problems.
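
As a rough illustration of the concern, the sketch below counts the metric
keys a single bolt instance would report under each name format from the
proposal quoted below. It is not Heron code: the helper functions, the
component name, the task IDs and the stream names are all made up for the
example.

```python
from itertools import product


def execute_count_keys(sources):
    """Keys in the existing format: <source-component>/<incoming-stream>."""
    return [
        f"__execute-count/{comp}/{stream}"
        for comp, _tasks, streams in sources
        for stream in streams
    ]


def receive_count_keys(sources):
    """Keys in the proposed format:
    <source-component>/<source-task-ID>/<incoming-stream>."""
    return [
        f"__receive-count/{comp}/{task}/{stream}"
        for comp, tasks, streams in sources
        for task, stream in product(tasks, streams)
    ]


# Hypothetical upstream layout: one component with 4 instances (task IDs
# 0..3) emitting on two streams. All names here are assumptions.
sources = [("spout", range(4), ["default", "errors"])]

print(len(execute_count_keys(sources)))
print(len(receive_count_keys(sources)))
```

With this layout the existing format gives 2 keys and the proposed format
gives 8, so the key count per bolt instance grows with the parallelism of
every upstream component.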


2018-04-18 18:58 GMT-07:00 Thomas Cooper <[email protected]>:

> Hi All,
>
> This started out as a quick Slack post, then became a reasonably sized
> email, and now it has headings!
>
> *Introduction*
>
> I am working on a performance modeling system for Heron. Hopefully this
> system will be useful for checking that proposed plans will meet
> performance targets, and also for checking whether currently running
> physical plans will have back pressure issues at higher traffic rates.
>
> To do this I need to know what proportion of tuples are routed from each
> upstream instance to its downstream instances, which is a metric that Heron
> does not provide by default.
>
> *Proposal*
>
> I have implemented a custom metric to do what I need in my test topologies.
> It is a simple multi-count metric called "__receive-count" where the key
> now includes the "sourceTaskId" value (which you can get from the tuple
> instance) as well as the source component name and incoming stream name.
>
> This is basically the same as the default "__execute-count" metric but the
> metric name format is
> "__receive-count/<source-component>/<source-task-ID>/<incoming-stream>"
> instead of "__execute-count/<source-component>/<incoming-stream>"
>
> So I see two options:
>
>    1. Create a new "__receive-count" metric and leave the "__execute-count"
>    alone
>    2. Alter "__execute-count" to include the source task ID.
>
> *Questions*
>
> My first question is whether the metric name is parsed anywhere further
> down the line, such as when aggregating component metrics in the metrics
> manager. If so, would changing the name break things?
>
> My second question is: if we do change "__execute-count", should we also
> add the source task ID to other bolt metrics like "__execute-latency"? It
> would be nice to see how latency changes by source instance. This is a
> particular issue with two consecutive fields-grouped components, as
> instances will receive very different key distributions, which could lead
> to very different processing latencies.
>
> *Implementation*
>
> Adding this to the default metrics (or changing "__execute-count") seems
> like it would be reasonably straightforward (famous last words). We would
> need to modify the `FullBoltMetric` class to include the new metrics (if
> required) and edit the `FullBoltMetric.executeTuple` method to accept the
> "sourceTaskId" (which is already available in the
> "BoltInstance.readTuplesAndExecute" method) as a 4th argument.
>
> Obviously, we will need to do the same with the Python implementation. Will
> this also need to be changed in the Storm compatibility layer?
>
> *Conclusion*
>
> Having the information on where tuples are flowing is really important if
> we want to be able to do more intelligent routing and adaptive auto-scaling
> in the future and hopefully this one small change/extra metric won't add
> any significant processing overhead.
>
> I look forward to hearing what you think.
>
> Cheers,
>
> Tom Cooper
> W: www.tomcooper.org.uk  | Twitter: @tomncooper
> <https://twitter.com/tomncooper>
>



-- 
With my best Regards
------------------
Fu Maosong
Twitter Inc.
Mobile: +001-415-244-7520
