Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2433

@revans2
First of all, I had missed the considerations around Storm on containers and around security. Thanks for the pointer; it helped a lot on my side as well.

Looks like there are some things to sort out.

1. Please take a look at https://issues.apache.org/jira/browse/STORM-2693 and its comments. I already suggested that @danny0405 try out Pacemaker, and he said it didn't help in some cases. It looks like your operational experience with Pacemaker and that of @danny0405 are somewhat different.

https://issues.apache.org/jira/browse/STORM-2693?focusedCommentId=16146530&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16146530

Maybe @danny0405 is using an old version of Storm, or there might be some internal patches that have not been contributed to the Apache side. This is one thing to sort out, but if the comment from @danny0405 is valid, it means Pacemaker is not working as expected, and that would be a critical issue for Pacemaker. I don't have a huge cluster, so I haven't managed a cluster with Pacemaker.

2. We know we use ZK in a bad way (the write load gets heavier whenever workers or topologies are added), and we introduced Pacemaker as a nice mitigation. If we think Pacemaker is now really stable and has all the essential functionality, we may need to consider making it the default, or publicizing it more by documenting when to consider using Pacemaker. Without making it the default or providing any guidance on considering Pacemaker while sizing a cluster, users will normally struggle with ZK issues first (and already have had a bad experience) and only then try Pacemaker as an alternative.

3. Between metrics and assignment, the bigger concern for me is metrics. I don't know how much assignment via ZK affects performance (if it hurts much, it should also be considered), but it is really clear that worker metrics in ZK have been problematic for us.
I believe we will (and should) eventually drop the current heartbeat structure, which includes metrics, and the sooner the better. What has not been clear to me is how and when. On that point I had been expecting Metrics V2 to take up the issue. Unfortunately, based on the current Metrics V2 patch, we would probably still use Metrics V1 for built-in metrics in Storm 2.0.0 unless we have a separate patch for Metrics V2. We should have a plan to migrate built-in metrics from Metrics V1 to Metrics V2, because several TODOs remain to complete it, and it can't be done partially (especially 2 and 3).

   1. Implement a worker metrics reporter which reports to Nimbus.
   2. Change Nimbus to get metrics from the metrics store instead of the heartbeat, which lets the UI leverage those metrics.
   3. Migrate built-in metrics from Metrics V1 to Metrics V2: after that patch, built-in metrics will no longer be presented to the metrics consumer (Metrics V1).

The above work would be backward incompatible, but we have no plan for Storm 2.0.0, and I'd rather not think about 3.0.0 when we don't even have Storm 2.0.0. Ideally it would be done before Storm 2.0.0; if that's not possible, I wouldn't mind introducing it in 2.x and disregarding backward compatibility.

I think it is related to the change that has been partially opened and is now being tested internally at Oath. Could you share a plan for this, so that the community can determine whether it should wait for news or whether there are tasks the Storm community should work on?

4. We may want to sort out what should be secured and what should not. Do metrics need to be secured? If so, the same security issue will arise when implementing the worker metrics reporter.
---