Github user d2r commented on the pull request:
https://github.com/apache/storm/pull/753#issuecomment-150319028
> Hi @d2r,
> Could you please review this PR? Your response is highly appreciated.
Thanks.
Firstly, nice job handling the stats.clj upmerge. That is not the nicest
code to work with.
My thoughts on this:
The Storm UI does not function like a history server; it only shows us what
is currently running. If a worker crashes, then those stats are lost to us.
With the way the current Storm UI works, if a worker is relaunched, then
the stats for its executors are presented as if the executors all have been up
for the same duration.
[The
change](https://github.com/wangli1426/storm/blob/1b0598485adb298d1e008b82ecda706d5996721f/storm-core/src/clj/backtype/storm/stats.clj#L286-L289)
weights the throughput rates based upon the time the executor has been
running. This could be good, but it has its own problems. If the previous
worker's throughput stats had declined sharply before the worker had died, then
weighting the current worker's throughput stats still would be inaccurate, but
in a different way. Also, the throughput stats would look really confusing
compared to the other stats, which are not weighted this way.
Solving the lost stats problem should be part of other work that perhaps
creates a History Server for Storm to keep historical information, including
some of these lost stats.
It seems to me the current UI is handling it one way (all executors have
been up for the same duration) and the patch handles it another (stats weighted
by executor's actual uptime). For now, until we get a true history server, I
think there is value in keeping the UI self-consistent. I do not think we want
to present aggregated throughput numbers that do not correspond to Emitted
numbers for a window.
The simple way to do this would be to divide Emitted by the minimum of
Topology Uptime & Window. If we did it this way, it would keep the UI
consistent, and it could be as simple as changing the JavaScript to do a simple
divide in the browser. This way we would not need to change Storm system code
or update the thrift serialization.
And as @harshac mentioned, we could add another JavaScript change to show a
simple topology-wide throughput in the same way.
What do you think?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---