Hi devs,

We have added a feature to support old Storm workers in Storm 2.0.0 via
STORM-2448 [1]. I was fine with it before we started addressing the metrics
issue, but now I think it is worth discussing.

STORM-2448 assumes that interaction between the daemons (Nimbus, Supervisor,
etc.) and workers in Storm 2.0.0 stays backward compatible. This applies not
only to interaction via Thrift but to every channel, including ZooKeeper.

STORM-2693 [2] came in as a nice improvement: it changes the heartbeat
mechanism (replacing ZK with Thrift RPC for interprocess heartbeat transfer),
and it is not compatible with old Storm workers. (We could still make it
backward compatible by letting Nimbus also support old-style heartbeats, i.e.
reading ZK periodically, but that clearly reduces the performance gain.)

Now I see a patch for STORM-2156 [3], which stores metrics in RocksDB, but
worker metrics are not addressed yet. I guess it will depend on Metrics V2
(STORM-2153) [4]. Regardless of that dependency, if STORM-2156 changes how
workers publish metrics (moving to Thrift RPC), it will also be backward
incompatible, for the same reason as STORM-2693.

We will have to break backward compatibility eventually to get the full
benefit of this (and of any similar improvements), and I don't see why it
can't happen in Storm 2.0.0 (a major release, nearly two years after 1.0.0).
Some users might be upset about the incompatibility, but I don't think they
would be any less upset if we postponed the breaking changes and finally
shipped them in Storm 3.0.0.

I would like to hear everyone's opinion on how to handle this situation.
There may be workarounds that let us ship both features, but with reduced
benefits.

Thanks,
Jungtaek Lim (HeartSaVioR)

1. https://issues.apache.org/jira/browse/STORM-2448
2. https://issues.apache.org/jira/browse/STORM-2693
3. https://issues.apache.org/jira/browse/STORM-2156
4. https://issues.apache.org/jira/browse/STORM-2153

P.S. I also wonder where our consensus would land in a related situation:
when we could bring significant improvements, but only in a
backward-incompatible way. One possible change would be dropping the acker
mechanism and adopting distributed snapshots. I have been thinking this is
worth doing; JStorm has already made such a change, which brought a
performance gain and also helped with windowing.
