I like the idea of both metrics, and it might be great to include them. Prometheus can aggregate metrics downstream by component-id, source-task, etc. It is a nice tool.
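The following is a minimal sketch of the kind of downstream roll-up mentioned above. It is not Prometheus and not Heron's metrics manager. It assumes metric names follow the proposed `__receive-count/<source-component>/<source-task-ID>/<incoming-stream>` format and that component and stream names contain no `/`. The class and method names (`ReceiveCountRollup`, `rollUpByComponent`) are hypothetical. The sketch sums the per-source-task counts back into per-component totals keyed like the existing `__execute-count/<source-component>/<incoming-stream>`. This suggests the finer-grained metric would not lose the coarser view.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical roll-up of the proposed "__receive-count" metric names.
 * Assumes "__receive-count/<source-component>/<source-task-ID>/<incoming-stream>"
 * with no '/' inside component or stream names. Illustrative only.
 */
public class ReceiveCountRollup {

    static final String PREFIX = "__receive-count/";

    /** Sums counts across source tasks, keyed by "<source-component>/<incoming-stream>". */
    public static Map<String, Long> rollUpByComponent(Map<String, Long> counts) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            String name = e.getKey();
            if (!name.startsWith(PREFIX)) {
                continue; // ignore metrics other than __receive-count
            }
            String[] parts = name.substring(PREFIX.length()).split("/");
            if (parts.length != 3) {
                continue; // not the expected component/task/stream shape
            }
            // Drop the source task ID (parts[1]) to recover the per-component key.
            String key = parts[0] + "/" + parts[2];
            totals.merge(key, e.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new TreeMap<>();
        counts.put("__receive-count/spout/1/default", 70L);
        counts.put("__receive-count/spout/2/default", 30L);
        counts.put("__execute-count/spout/default", 100L);
        System.out.println(rollUpByComponent(counts)); // prints {spout/default=100}
    }
}
```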
cheers
/karthik

On Wed, Apr 18, 2018 at 8:32 PM, Fu Maosong <[email protected]> wrote:

> One concern is that it will significantly increase the number of metrics,
> potentially leading to performance concerns.
>
> 2018-04-18 18:58 GMT-07:00 Thomas Cooper <[email protected]>:
>
> > Hi All,
> >
> > This started out as a quick Slack post, then a reasonably sized email,
> > and now it has headings!
> >
> > *Introduction*
> >
> > I am working on a performance modeling system for Heron. Hopefully this
> > system will be useful for checking whether proposed plans will meet
> > performance targets, and also for checking whether currently running
> > physical plans will have back pressure issues at higher traffic rates.
> >
> > To do this I need to know what proportion of tuples are routed from each
> > upstream instance to its downstream instances, which is a metric that
> > Heron does not provide by default.
> >
> > *Proposal*
> >
> > I have implemented a custom metric to do what I need in my test
> > topologies. It is a simple multi-count metric called "__receive-count",
> > where the key now includes the "sourceTaskId" value (which you can get
> > from the tuple instance) as well as the source component name and the
> > incoming stream name.
> >
> > This is basically the same as the default "__execute-count" metric, but
> > the metric name format is
> > "__receive-count/<source-component>/<source-task-ID>/<incoming-stream>"
> > instead of "__execute-count/<source-component>/<incoming-stream>".
> >
> > So I see two options:
> >
> >    1. Create a new "__receive-count" metric and leave "__execute-count"
> >    alone.
> >    2. Alter "__execute-count" to include the source task ID.
> >
> > *Questions*
> >
> > My first question is whether the metric name is parsed anywhere further
> > down the line, such as when aggregating component metrics in the metrics
> > manager. Would changing the name break things?
> >
> > My second question is: if we do change "__execute-count", should we also
> > add the source task ID to other bolt metrics like "__execute-latency"? It
> > would be nice to see how latency changes by source instance. This is a
> > particular issue with two consecutive fields-grouped components, as
> > instances will receive very different key distributions, which could
> > lead to very different processing latencies.
> >
> > *Implementation*
> >
> > Adding this to the default metrics (or changing "__execute-count") seems
> > like it would be reasonably straightforward (famous last words). We would
> > need to modify the `FullBoltMetric` class to include the new metrics (if
> > required) and edit the `FullBoltMetric.executeTuple` method to accept the
> > "sourceTaskId" (which is already available in the
> > "BoltInstance.readTuplesAndExecute" method) as a 4th argument.
> >
> > Obviously, we will need to do the same in the Python implementation.
> > Will this also need to be changed in the Storm compatibility layer?
> >
> > *Conclusion*
> >
> > Having information on where tuples are flowing is really important if
> > we want to do more intelligent routing and adaptive auto-scaling in the
> > future, and hopefully this one small change (or extra metric) won't add
> > any significant processing overhead.
> >
> > I look forward to hearing what you think.
> >
> > Cheers,
> >
> > Tom Cooper
> > W: www.tomcooper.org.uk | Twitter: @tomncooper
> > <https://twitter.com/tomncooper>
>
> --
> With my best Regards
> ------------------
> Fu Maosong
> Twitter Inc.
> Mobile: +001-415-244-7520
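For illustration, here is a minimal Java sketch of the proposal. It shows how per-instance "__receive-count" counts could yield the routing proportions the performance model needs. It is not Heron code. `ReceiveCounter`, `recordReceive` and `routingProportions` are hypothetical names: `recordReceive` stands in for the proposed extra `sourceTaskId` argument to `FullBoltMetric.executeTuple`, and a plain map stands in for a Heron multi-count metric. The sketch makes several assumptions:

- source task IDs are unique across the topology;
- component and stream names contain no `/`;
- all snapshots come from instances of a single downstream component.

The proportion routed from upstream task u to downstream task d is taken as count(u to d) divided by the total count from u across all downstream tasks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: per-source-task receive counts and routing proportions. */
public class RoutingProportions {

    /** Per-instance counter standing in for a multi-count metric (not a Heron API). */
    static class ReceiveCounter {
        private final Map<String, Long> counts = new HashMap<>();

        /** Would be called once per executed tuple, with the tuple's source task ID. */
        void recordReceive(String sourceComponent, int sourceTaskId, String stream) {
            String key = "__receive-count/" + sourceComponent + "/" + sourceTaskId + "/" + stream;
            counts.merge(key, 1L, Long::sum);
        }

        Map<String, Long> snapshot() {
            return new HashMap<>(counts);
        }
    }

    /**
     * Takes one snapshot per downstream task (keyed by downstream task ID) and
     * returns proportions keyed by "<sourceTaskId>-><downstreamTaskId>".
     */
    static Map<String, Double> routingProportions(Map<Integer, Map<String, Long>> byDownstream) {
        Map<String, Long> sent = new TreeMap<>();   // "<src>-><dst>" -> tuples received by dst from src
        Map<String, Long> totals = new HashMap<>(); // "<src>" -> tuples from src, all downstream tasks
        for (Map.Entry<Integer, Map<String, Long>> d : byDownstream.entrySet()) {
            for (Map.Entry<String, Long> e : d.getValue().entrySet()) {
                // Key parts: [metric name, source component, source task ID, stream]
                String src = e.getKey().split("/")[2];
                sent.merge(src + "->" + d.getKey(), e.getValue(), Long::sum);
                totals.merge(src, e.getValue(), Long::sum);
            }
        }
        Map<String, Double> proportions = new TreeMap<>();
        for (Map.Entry<String, Long> e : sent.entrySet()) {
            String src = e.getKey().substring(0, e.getKey().indexOf("->"));
            proportions.put(e.getKey(), e.getValue() / totals.get(src).doubleValue());
        }
        return proportions;
    }

    public static void main(String[] args) {
        // Two downstream instances (tasks 3 and 4) of one bolt, fed by spout tasks 1 and 2.
        ReceiveCounter t3 = new ReceiveCounter();
        ReceiveCounter t4 = new ReceiveCounter();
        for (int i = 0; i < 3; i++) {
            t3.recordReceive("spout", 1, "default");
        }
        t4.recordReceive("spout", 1, "default");
        t4.recordReceive("spout", 2, "default");
        t4.recordReceive("spout", 2, "default");

        Map<Integer, Map<String, Long>> byDownstream = new TreeMap<>();
        byDownstream.put(3, t3.snapshot());
        byDownstream.put(4, t4.snapshot());
        System.out.println(routingProportions(byDownstream)); // prints {1->3=0.75, 1->4=0.25, 2->4=1.0}
    }
}
```

Keeping the raw counts per source task, rather than only proportions, leaves the coarser per-component totals recoverable by summation. This bears on the choice between a new "__receive-count" metric and altering "__execute-count".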
