[GitHub] incubator-storm pull request: nodejs multilang protocol implementa...

itaifrenkel Fri, 25 Jul 2014 15:08:43 -0700

Github user itaifrenkel commented on the pull request:

    https://github.com/apache/incubator-storm/pull/177#issuecomment-50210273
  
    Unit Tests: We can follow the way storm.rb is handled in the repository (3 
copies of the same file) and translate the Ruby unit tests into nodejs. I can 
see why you don't want a third broken window. But I would like to move the 
multilang cleanup into a separate JIRA, which leads me to ...
    
    Packaging: Deploying and upgrading ruby/python/nodejs code is very 
different from deploying Java code. Java code in Storm is JARed-with-dependencies. 
Packing nodejs code with its entire npm dependency tree and automating that 
with maven is possible, but you lose the no-need-to-do-the-compile-thingy 
nature of the language. So even if we do call gem/pip/npm install from within 
the maven build process, pack everything in a JAR and ship it with Storm, I am 
not sure I would use it that way. Instead the JAR would contain only pure Java 
code pointing to the folder where the ruby/python/nodejs code is, and that is 
also the folder storm.rb/py/js would be copied to. So there is no reusable 
component you can ship with Storm, other than the official storm.js file that 
was tested to work with this version of Storm.
    
    Maintenance: The maintenance effort for the multilang project is roughly 
numberOfFeatures times numberOfLanguages times numberOfSerializationProtocols. 
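To make the multiplication concrete, here is a minimal sketch. The feature, language and protocol lists are hypothetical placeholders picked only for illustration, not an inventory of what multilang actually supports.

```python
def maintenance_cells(features, languages, protocols):
    """Return every (feature, language, protocol) combination to maintain."""
    return [(f, l, p) for f in features for l in languages for p in protocols]

# Hypothetical counts, chosen only to illustrate the arithmetic.
features = ["emit", "ack", "fail", "log"]
languages = ["ruby", "python", "nodejs"]
protocols = ["json"]

cells = maintenance_cells(features, languages, protocols)
print(len(cells))  # 4 * 3 * 1 = 12

# Adding a single second protocol doubles the matrix.
more_cells = maintenance_cells(features, languages, protocols + ["other"])
print(len(more_cells))  # 4 * 3 * 2 = 24
```

Each extra entry on any axis multiplies the work on the other two, which is why keeping at least one axis fixed matters.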
    IMO the numberOfLanguages is a strong selling point for Storm (choose the 
language that best fits your business). Ruby, Python and Node each have a big 
crowd of open-source developers this project can rally.
    IMO numberOfFeatures that are not data related is not a strong selling 
point. For example, I would not use Storm's logging or monitoring through the 
multilang protocol. If there is a problem with multilang/storm/whatever, I 
would like a nodejs library to log it to a file or contact the monitoring 
server directly, especially as Storm does the 
fail-fast-the-worker-with-all-child-processes each time a bolt/spout has an 
uncaught exception. I doubt the multilang logging could deliver an unexpected 
error in time, before the stdout pipe is closed.
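As a rough illustration of logging outside the multilang pipe, the sketch below writes to a local file and records uncaught exceptions before the process dies. It is shown in Python rather than nodejs, and the function name, logger name and log path are assumptions made for the example, not part of any Storm library.

```python
import logging
import sys

def install_file_logging(path):
    """Log to a local file and record uncaught exceptions before exit."""
    logger = logging.getLogger("multilang-component")  # hypothetical name
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)

    def log_uncaught(exc_type, exc_value, exc_traceback):
        # Write the traceback to the file first, since stdout may already
        # be closed by the time the parent worker is torn down.
        logger.critical("uncaught exception",
                        exc_info=(exc_type, exc_value, exc_traceback))
        handler.flush()
        sys.__excepthook__(exc_type, exc_value, exc_traceback)

    sys.excepthook = log_uncaught
    return logger
```

A component would call `install_file_logging("/some/path/component.log")` once at startup (the path is a placeholder) and use the returned logger instead of sending log commands over stdout.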
    As for serialization protocols... this was recently abstracted on the Java 
side, which allows third-party plugins to squeeze out more CPU cycles with 
custom serialization protocols. That's nice, but as JSON is the only protocol 
supported by the ruby/python/nodejs implementations, adding another 
serialization protocol requires a feature matrix. The proper thing is to focus 
on Storm's core competency, and that is data. JSON would eventually become the 
bottleneck on the Worker, which juggles multiple subprocesses via serialization 
and deserialization of multilang messages. It takes approx 1% of a CPU core and 
1ms latency per multilang process assuming 1 message per second (hardware 
dependent, of course). Instead of JSON I would pick one serialization protocol 
that has a heavily optimized Java implementation and also widely used 
python/node/ruby/etc implementations that hopefully do not require native code 
to install, and stick to it. Use serialization/deserialization optimization on 
the Java side (caching) for better performance of the nextTuple and sync 
messages.
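The caching idea can be sketched as follows. The comment proposes doing this on the Java side; the Python below only illustrates the principle, using the current JSON framing (a JSON document followed by a line containing `end`). The helper names `frame` and `parse` and the constants are hypothetical.

```python
import json

END_MARKER = "end"

def frame(message):
    """Serialize one multilang message: JSON followed by an "end" line."""
    return (json.dumps(message) + "\n" + END_MARKER + "\n").encode("utf-8")

# Messages without a per-tuple payload are byte-for-byte identical every
# time, so they can be encoded once and the bytes reused on each call.
CACHED_NEXT = frame({"command": "next"})
CACHED_SYNC = frame({"command": "sync"})

def parse(framed):
    """Inverse of frame(), used here only to check the round trip."""
    lines = framed.decode("utf-8").splitlines()
    if not lines or lines[-1] != END_MARKER:
        raise ValueError("missing end marker")
    return json.loads("\n".join(lines[:-1]))

print(CACHED_NEXT)
print(parse(CACHED_SYNC))
```

Writing `CACHED_NEXT` to the subprocess pipe then skips serialization entirely for the most frequent control message; only messages carrying tuple data still pay the encoding cost.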



