GitHub user itaifrenkel commented on the pull request:
https://github.com/apache/incubator-storm/pull/177#issuecomment-50210273
Unit tests: We can follow the way storm.rb is handled in the repository (three
copies of the same file) and translate the Ruby unit tests into Node.js. I can
see why you don't want a third window to be broken. However, I would like to
move the multilang cleanup into a separate JIRA issue, which leads me to...
Packaging: Deploying and upgrading Ruby/Python/Node.js code is very different
from deploying Java code. Java code in Storm is packaged as a JAR with
dependencies. Packing Node.js code with its entire npm dependency tree and
automating that with Maven is possible, but you lose the no-compile-step nature
of the language. So even if we do call gem/pip/npm install from within the
Maven build, pack everything into a JAR and ship it with Storm, I am not sure I
would use it that way. Instead, the JAR would contain only pure Java code
pointing to the folder where the Ruby/Python/Node.js code lives. That is also
the folder where storm.rb/py/js would be copied. So there is no reusable
component you can ship with Storm, other than the official storm.js file that
was tested to work with this version of Storm.
Maintenance: The maintenance effort of the multilang project is roughly
numberOfFeatures times numberOfLanguages times numberOfSerializationProtocols.
IMO numberOfLanguages is a strong selling point for Storm (choose the language
that best fits your business). Ruby, Python and Node.js each have a large
community of open-source developers this project can rally.
IMO numberOfFeatures, as far as features that are not data-related go, is not
a strong selling point. For example, I would not use Storm logging or
monitoring through the multilang protocol. If there is a problem with
multilang/Storm/whatever, I would want a Node.js library to log it to a file or
to contact the monitoring server directly, especially since Storm fails fast,
killing the worker along with all its child processes, each time a bolt/spout
throws an uncaught exception. I doubt the multilang logging could deliver an
unexpected error before the stdout pipe is closed.
As for serialization protocols: serialization was recently abstracted on the
Java side, which allows third-party plugins to squeeze out more CPU cycles with
custom serialization protocols. That's nice, but since JSON is the only
protocol supported by the Ruby/Python/Node.js implementations, adding another
serialization protocol requires a feature matrix. The proper thing is to focus
on Storm's core competency, and that is data. JSON would eventually become the
bottleneck on the worker, which juggles multiple subprocesses by serializing
and deserializing multilang messages. It takes approx. 1% of a CPU core and
1 ms of latency per multilang process, assuming 1 message per second (hardware
dependent, of course).
Instead of JSON, I would pick one serialization protocol that has a heavily
optimized Java implementation as well as widely used Python/Node.js/Ruby/etc.
implementations that hopefully do not require native code to install, and
stick to it. Use serialization/deserialization optimizations (caching) on the
Java side for better performance of the nextTuple and sync messages.
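One possible reading of the caching idea, as a minimal Java sketch: the "next" command sent to a spout subprocess never changes, so its framed bytes (a JSON object, a newline, then a line containing "end") can be encoded once and reused. A "sync" reply can be matched against a few known exact encodings before falling back to a full JSON parse. The class MultilangMessageCache and its methods are hypothetical and not part of Storm's API. The set of sync encodings (compact, and with a space after the colon) is an assumption about what the Ruby/Python/Node.js libraries emit.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;

/**
 * Hypothetical sketch: cache the encodings of constant multilang control
 * messages so the worker does not re-serialize or fully parse them per tuple.
 * Assumed framing: a JSON object, a newline, then a line containing "end".
 */
final class MultilangMessageCache {

    // Encoded once; "next" carries no tuple data, so its bytes never change.
    static final byte[] NEXT_FRAME =
            "{\"command\":\"next\"}\nend\n".getBytes(StandardCharsets.UTF_8);

    // Assumed encodings of a "sync" reply (compact, and with a space after the colon).
    private static final Set<String> KNOWN_SYNC_BODIES = Set.of(
            "{\"command\":\"sync\"}",
            "{\"command\": \"sync\"}");

    private MultilangMessageCache() {}

    /** Writes the pre-encoded "next" frame; no JSON serialization happens here. */
    static void writeNext(OutputStream out) throws IOException {
        out.write(NEXT_FRAME);
        out.flush();
    }

    /**
     * Fast path for the message body (the text before the "end" line).
     * Returns true only for a known encoding of "sync". A false result means
     * "hand it to the general JSON parser", not "this is not a sync message".
     */
    static boolean isKnownSync(String body) {
        return KNOWN_SYNC_BODIES.contains(body.trim());
    }
}
```

The design trade-off is that the fast path only pays off if the subprocess libraries keep emitting the same bytes for sync; any other encoding still works, it just takes the slower general parsing route.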