Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@revans2
First of all, I had overlooked the points about Storm on containers and
security. Thanks for the pointer; it helped a lot on my side as well.
Looks like there are some things to sort out.
1.
Please take a look at https://issues.apache.org/jira/browse/STORM-2693 and
its comments. I already suggested that @danny0405 try out Pacemaker, and he
said it didn't help in some cases. It looks like your operational experience
with Pacemaker and @danny0405's are somewhat different:
https://issues.apache.org/jira/browse/STORM-2693?focusedCommentId=16146530&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16146530
Maybe @danny0405 is using an old version of Storm, or there might be some
internal patches that have not been contributed to the Apache side. That is one
thing to sort out, but if the comment from @danny0405 is valid, it means
Pacemaker is not working as expected, which would be a critical issue for
Pacemaker. I don't have a large-scale cluster, so I haven't managed a cluster
with Pacemaker myself.
2.
We know we use ZK in a bad way (the write load gets heavier whenever
workers/topologies are added), and we introduced Pacemaker as a nice
mitigation.
If we think Pacemaker is now really stable and has all the essential
functionality, we may need to consider making it the default, or publicizing
it more by documenting when to consider using Pacemaker.
Without making it the default or providing any sizing guidance that mentions
Pacemaker, users will normally struggle with ZK issues first (and already have
a bad experience), and only then try out Pacemaker as an alternative.
3.
Between metrics and assignment, the bigger concern for me is metrics. I
don't know how much assignment via ZK affects performance (if it hurts a lot,
it should also be considered), but it is really clear that worker metrics in ZK
have been problematic for us.
I believe we will (and should) eventually drop the current heartbeat
structure, which includes metrics, and the sooner the better.
What has not been clear to me is how and when. On that point I had been
expecting Metrics V2 to take up the issue. Unfortunately, based on the current
Metrics V2 patch, we would probably still use Metrics V1 for built-in metrics
in Storm 2.0.0 unless we have a separate patch for Metrics V2.
We should have a plan to migrate built-in metrics from Metrics V1 to
Metrics V2, because there are some more TODOs to get it done, and they can't
be done partially (especially 2 and 3):
1. Implement a worker metrics reporter that reports to Nimbus.
2. Change Nimbus to get metrics from the metrics store instead of the
heartbeat, so that the UI uses those metrics.
3. Migrate built-in metrics from Metrics V1 to Metrics V2. After that patch,
built-in metrics will no longer be presented to metric consumers (Metrics V1).
The above work would be backward incompatible, but we have no plan for
Storm 2.0.0 yet, and I'd rather not think about 3.0.0 when we don't even have
Storm 2.0.0. Ideally it is done before Storm 2.0.0; if that's not possible, I
wouldn't mind introducing it in 2.x and disregarding backward compatibility.
I think this is related to the change that has been partially opened and is
now being tested internally at Oath. Could you share the plan for it, so that
the community can decide whether to wait for news or whether there are tasks
the Storm community should work on?
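To make the first two TODOs above concrete, here is a minimal sketch of the
data flow I have in mind. It is only an illustration under my own assumptions:
`MetricStore`, `InMemoryMetricStore`, `WorkerMetricsReporter` and
`emittedForUi` are hypothetical names, not existing Storm APIs, and the direct
method call stands in for whatever RPC a worker would use to reach Nimbus.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these types are Storm APIs.
class MetricsFlowSketch {

    /** Stand-in for a metrics store owned by Nimbus. */
    interface MetricStore {
        void insert(String topologyId, String name, long value);
        long sum(String topologyId, String name);
    }

    /** Trivial in-memory store that keeps a running total per topology/metric. */
    static final class InMemoryMetricStore implements MetricStore {
        private final Map<String, Long> totals = new ConcurrentHashMap<>();

        @Override
        public void insert(String topologyId, String name, long value) {
            totals.merge(topologyId + "/" + name, value, Long::sum);
        }

        @Override
        public long sum(String topologyId, String name) {
            return totals.getOrDefault(topologyId + "/" + name, 0L);
        }
    }

    /** TODO 1: the worker reports metrics to Nimbus, not into the ZK heartbeat. */
    static final class WorkerMetricsReporter {
        private final MetricStore nimbusStore; // assumption: really an RPC to Nimbus
        private final String topologyId;

        WorkerMetricsReporter(MetricStore nimbusStore, String topologyId) {
            this.nimbusStore = nimbusStore;
            this.topologyId = topologyId;
        }

        void report(String name, long value) {
            nimbusStore.insert(topologyId, name, value);
        }
    }

    /** TODO 2: Nimbus (and so the UI) reads from the store, not the heartbeat. */
    static long emittedForUi(MetricStore store, String topologyId) {
        return store.sum(topologyId, "emitted");
    }

    public static void main(String[] args) {
        MetricStore store = new InMemoryMetricStore();
        WorkerMetricsReporter worker1 = new WorkerMetricsReporter(store, "topo-1");
        WorkerMetricsReporter worker2 = new WorkerMetricsReporter(store, "topo-1");
        worker1.report("emitted", 100);
        worker2.report("emitted", 50);
        System.out.println(emittedForUi(store, "topo-1")); // prints 150
    }
}
```

In this sketch the heartbeat carries no metrics at all, which is why TODOs 2
and 3 cannot land partially: once the UI reads only from the store, any
built-in metric still emitted only through Metrics V1 would disappear from it.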
4.
We may want to sort out which things should be secured and which should not.
Do metrics need to be secured? If so, the same security issue will arise when
implementing the worker metrics reporter.
---