One concern is that it will significantly increase the number of metrics, potentially leading to performance problems.
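
To make the growth concrete, here is a rough sketch of the two key formats from the proposal quoted below. It is only an illustration: `ReceiveCountSketch` is a hypothetical map-backed counter, not Heron's actual metric classes, and the component name "splitter", its 4 upstream instances, and the stream name "default" are made-up values.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a multi-count metric; NOT Heron's real metric API.
class ReceiveCountSketch {
    private final Map<String, Long> counts = new TreeMap<>();

    // Key format of the existing "__execute-count" metric:
    // <source-component>/<incoming-stream>
    static String executeKey(String sourceComponent, String stream) {
        return sourceComponent + "/" + stream;
    }

    // Key format proposed for "__receive-count":
    // <source-component>/<source-task-ID>/<incoming-stream>
    static String receiveKey(String sourceComponent, int sourceTaskId, String stream) {
        return sourceComponent + "/" + sourceTaskId + "/" + stream;
    }

    // Add one to the counter stored under the given key.
    void increment(String key) {
        counts.merge(key, 1L, Long::sum);
    }

    long count(String key) {
        return counts.getOrDefault(key, 0L);
    }

    int distinctKeys() {
        return counts.size();
    }

    public static void main(String[] args) {
        ReceiveCountSketch executeCount = new ReceiveCountSketch();
        ReceiveCountSketch receiveCount = new ReceiveCountSketch();

        // Assumed scenario: one upstream component with 4 instances
        // (task IDs 0..3), each sending 10 tuples on a single stream.
        for (int taskId = 0; taskId < 4; taskId++) {
            for (int i = 0; i < 10; i++) {
                executeCount.increment(executeKey("splitter", "default"));
                receiveCount.increment(receiveKey("splitter", taskId, "default"));
            }
        }

        System.out.println("execute-count keys: " + executeCount.distinctKeys()); // 1
        System.out.println("receive-count keys: " + receiveCount.distinctKeys()); // 4
    }
}
```

In this sketch the number of keys per bolt instance goes from one per (component, stream) pair to one per (component, upstream task, stream) triple, so it scales with upstream parallelism.
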

2018-04-18 18:58 GMT-07:00 Thomas Cooper <[email protected]>:

> Hi All,
>
> This started out as a quick Slack post, then a reasonably sized email and now
> it has headings!
>
> *Introduction*
>
> I am working on a performance modeling system for Heron. Hopefully this
> system will be useful for checking that proposed plans will meet performance
> targets and also for checking if currently running physical plans will have
> back pressure issues with higher traffic rates.
>
> To do this I need to know what proportion of tuples are routed from each
> upstream instance to its downstream instances, which is a metric that Heron
> does not provide by default.
>
> *Proposal*
>
> I have implemented a custom metric to do what I need in my test topologies.
> It is a simple multi-count metric called "__receive-count" where the key
> now includes the "sourceTaskId" value (which you can get from the tuple
> instance) as well as the source component name and incoming stream name.
>
> This is basically the same as the default "__execute-count" metric but the
> metric name format is
> "__receive-count/<source-component>/<source-task-ID>/<incoming-stream>"
> instead of "__execute-count/<source-component>/<incoming-stream>"
>
> So I see two options:
>
> 1. Create a new "__receive-count" metric and leave the "__execute-count"
> alone
> 2. Alter "__execute-count" to include the source task ID.
>
> *Questions*
>
> My first question is whether the metric name is parsed anywhere further
> down the line, such as when aggregating component metrics in the metrics
> manager. If so, would changing the name break things?
>
> My second is: if we do change "__execute-count", should we also add the
> source task ID to other bolt metrics like "__execute-latency" (it would be
> nice to see how latency changes by source instance --- this is a particular
> issue in two consecutive fields-grouped components, as instances will
> receive very different key distributions which could lead to very different
> processing latency)?
>
> *Implementation*
>
> Adding this to the default metrics (or changing "__execute-count") seems like
> it would be reasonably straightforward (famous last words). We would need
> to modify the `FullBoltMetric` class to include the new metrics (if
> required) and edit the `FullBoltMetric.executeTuple` method to accept the
> "sourceTaskId" (which is already available in the
> "BoltInstance.readTuplesAndExecute" method) as a 4th argument.
>
> Obviously, we will need to do the same with the Python implementation. Will
> this also need to be changed in the Storm compatibility layer?
>
> *Conclusion*
>
> Having the information on where tuples are flowing is really important if
> we want to be able to do more intelligent routing and adaptive auto-scaling
> in the future, and hopefully this one small change/extra metric won't add
> any significant processing overhead.
>
> I look forward to hearing what you think.
>
> Cheers,
>
> Tom Cooper
> W: www.tomcooper.org.uk | Twitter: @tomncooper
> <https://twitter.com/tomncooper>
>

--
With my best Regards
------------------
Fu Maosong
Twitter Inc.
Mobile: +001-415-244-7520
