Hi devs,

We have added a feature to support old Storm workers in Storm 2.0.0 via STORM-2448 [1]. That was OK to me before we started addressing the metrics issue, but now I think it is worth discussing.
STORM-2448 assumes that interaction between the daemons (Nimbus/Supervisor/etc.) and workers stays backward compatible in Storm 2.0.0. This applies not only to interaction via Thrift, but to interaction by any means, including ZooKeeper.

STORM-2693 [2] came in as a nice improvement that changes the heartbeat mechanism (it replaces ZK with Thrift RPC for interprocess heartbeat transfer), and it is not compatible with old Storm workers. (We could still make it backward compatible by letting Nimbus also support the old-style heartbeat, that is, reading ZK periodically, but that clearly reduces the performance gain.)

Now I can see a patch for STORM-2156 [3], which stores metrics in RocksDB, but worker metrics are not addressed yet. I guess that will depend on Metrics V2 (STORM-2153) [4]. Regardless of that dependency, if STORM-2156 wants to change how workers publish metrics (via Thrift RPC), it will also be backward incompatible, for the same reason as STORM-2693.

We will have to break backward compatibility eventually to enjoy the full benefits of these changes (and of others, if we have similar improvements), and I'm not sure why that can't happen in Storm 2.0.0 (a major release, nearly 2 years after 1.0.0). Some users might be upset by the backward incompatibility, but I don't think they would be any less upset if we postpone the breaking changes and finally bring them in Storm 3.0.0.

I would like to hear everyone's opinions on how to handle this situation. We might have workarounds that let us bring in both features, but with reduced benefits.

Thanks,
Jungtaek Lim (HeartSaVioR)

1. https://issues.apache.org/jira/browse/STORM-2448
2. https://issues.apache.org/jira/browse/STORM-2693
3. https://issues.apache.org/jira/browse/STORM-2156
4. https://issues.apache.org/jira/browse/STORM-2153

P.S. I wonder what our consensus would be for this kind of situation: when we could bring a large improvement, but only in a backward-incompatible way.
One possible change would be dropping the Acker mechanism and adopting distributed snapshots. I have been thinking this is worth doing, and JStorm has already made this change to gain performance and also to get an advantage in windowing.
