James Xu created STORM-138:
------------------------------

             Summary: Pluggable serialization for multilang
                 Key: STORM-138
                 URL: https://issues.apache.org/jira/browse/STORM-138
             Project: Apache Storm (Incubating)
          Issue Type: New Feature
            Reporter: James Xu
            Priority: Minor


https://github.com/nathanmarz/storm/issues/373

Currently, JSON is used to serialize tuples for multilang. It would be great if
the serialization mechanism were pluggable, so that richer types could be used
with multilang.
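To make the idea concrete, here is a minimal sketch, assuming a hypothetical
`MultilangSerializer` interface that turns a list of tuple values into bytes
and back. `TaggedBinarySerializer` is also hypothetical and is not part of
Storm: it writes a one-byte type tag before each value, so it can carry raw
`byte[]` values, which JSON has no native type for.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pluggable interface: converts tuple values to and from bytes.
interface MultilangSerializer {
    byte[] serialize(List<Object> values);
    List<Object> deserialize(byte[] data);
}

// Hypothetical binary implementation using a type tag per value.
// Note: writeUTF limits each String to 65535 encoded bytes.
class TaggedBinarySerializer implements MultilangSerializer {
    private static final byte STRING = 1, LONG = 2, DOUBLE = 3, BYTES = 4;

    @Override
    public byte[] serialize(List<Object> values) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(values.size()); // number of values in the tuple
            for (Object v : values) {
                if (v instanceof String) {
                    out.writeByte(STRING);
                    out.writeUTF((String) v);
                } else if (v instanceof Long) {
                    out.writeByte(LONG);
                    out.writeLong((Long) v);
                } else if (v instanceof Double) {
                    out.writeByte(DOUBLE);
                    out.writeDouble((Double) v);
                } else if (v instanceof byte[]) {
                    byte[] b = (byte[]) v;
                    out.writeByte(BYTES);
                    out.writeInt(b.length); // length prefix, then raw bytes
                    out.write(b);
                } else {
                    throw new IllegalArgumentException("unsupported value: " + v);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    @Override
    public List<Object> deserialize(byte[] data) {
        List<Object> values = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                byte tag = in.readByte();
                switch (tag) {
                    case STRING: values.add(in.readUTF()); break;
                    case LONG: values.add(in.readLong()); break;
                    case DOUBLE: values.add(in.readDouble()); break;
                    case BYTES:
                        byte[] b = new byte[in.readInt()];
                        in.readFully(b);
                        values.add(b);
                        break;
                    default:
                        throw new IllegalStateException("unknown tag " + tag);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }
}
```

Swapping in another format (for example the protocol buffer serialisation
mentioned later in this thread) would then only require another class that
implements the same interface.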

---------
francis-liberty: Hello, I am a newbie here, and I wanted to pick up this issue.
I also noticed a recent PR, #697, by jsgilmore. Does it address this issue,
too?

I have looked through the source code, and I would like to share my thoughts on
this issue here.

For now, ShellProcess only supports JSON for communicating with the multilang
process, for both reading and writing. ShellSpout and ShellBolt also talk to
ShellProcess through JSON, because ShellProcess's interface uses JSONObject
only. Conceptually, ShellProcess should encapsulate the multilang details and
talk to the bolt and spout using Tuple. (jsgilmore introduced two new classes,
Immission and Emission, but I think all the information the bolt and spout need
is already in Tuple, so no new data structures are needed.) So I think it would
be much cleaner to do serialization in ShellProcess only, so that neither
ShellSpout nor ShellBolt knows anything about how ShellProcess converts between
Tuple and strings.

So I suppose I can do the following work:
1. Change the interface of ShellProcess to accept and return the Tuple data
structure instead of JSONObject.
2. Make ShellSpout and ShellBolt work on Tuple; all information such as
task_id, stream_id and the tuple values should be retrieved from, and
encapsulated in, this data structure.
3. Which other serialization formats would you like to add? I think that in the
end we need to add an example other than JSON to storm-starter's storm.py/rb,
which I would also like to work on.
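The proposal above can be sketched roughly as follows. This is a minimal
illustration, not Storm's actual ShellProcess: `ShellMessage`, `MessageCodec`,
`BinaryMessageCodec` and `ShellProcessSketch` are hypothetical names, a plain
message class stands in for Storm's Tuple so the sketch compiles without Storm
on the classpath, and tuple values are restricted to strings to keep it short.
The point is that only `ShellProcessSketch` touches the codec, so a bolt or
spout would deal only in messages.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical message holding what a bolt or spout needs
// (command, task id, stream id, tuple values), with no wire-format details.
final class ShellMessage {
    final String command;
    final long taskId;
    final String streamId;
    final List<String> values;

    ShellMessage(String command, long taskId, String streamId, List<String> values) {
        this.command = command;
        this.taskId = taskId;
        this.streamId = streamId;
        this.values = Collections.unmodifiableList(new ArrayList<>(values));
    }
}

// Hypothetical pluggable codec for whole messages.
interface MessageCodec {
    void write(ShellMessage msg, DataOutputStream out) throws IOException;
    ShellMessage read(DataInputStream in) throws IOException;
}

// Hypothetical binary codec; a JSON codec could implement the same interface.
final class BinaryMessageCodec implements MessageCodec {
    @Override
    public void write(ShellMessage msg, DataOutputStream out) throws IOException {
        out.writeUTF(msg.command);
        out.writeLong(msg.taskId);
        out.writeUTF(msg.streamId);
        out.writeInt(msg.values.size());
        for (String v : msg.values) {
            out.writeUTF(v);
        }
    }

    @Override
    public ShellMessage read(DataInputStream in) throws IOException {
        String command = in.readUTF();
        long taskId = in.readLong();
        String streamId = in.readUTF();
        int n = in.readInt();
        List<String> values = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            values.add(in.readUTF());
        }
        return new ShellMessage(command, taskId, streamId, values);
    }
}

// Hypothetical stand-in for ShellProcess: the only class that sees the codec.
final class ShellProcessSketch {
    private final DataOutputStream toSubprocess;
    private final DataInputStream fromSubprocess;
    private final MessageCodec codec;

    ShellProcessSketch(OutputStream toSubprocess, InputStream fromSubprocess, MessageCodec codec) {
        this.toSubprocess = new DataOutputStream(toSubprocess);
        this.fromSubprocess = new DataInputStream(fromSubprocess);
        this.codec = codec;
    }

    void writeMessage(ShellMessage msg) {
        try {
            codec.write(msg, toSubprocess);
            toSubprocess.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    ShellMessage readMessage() {
        try {
            return codec.read(fromSubprocess);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```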

----------
jsgilmore: Hi, all serialisation is done in the JSONSerialiser; no serialisation
is done in ShellBolt, ShellProcess or ShellSpout. They just pass around
instances of the Emission and Immission classes. The point of the ISerializer
interface is to keep serialisation separate from those classes.
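One common way to make such an interface pluggable is to let configuration
name the implementation class and load it reflectively. The sketch below
assumes that approach; `PayloadSerializer`, `Utf8TextSerializer`,
`SerializerLoader` and the key `multilang.serializer.class` are hypothetical
and are not part of Storm or of PR #697.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical minimal serializer interface, standing in for ISerializer.
interface PayloadSerializer {
    byte[] serialize(String payload);
    String deserialize(byte[] data);
}

// Hypothetical default: plain UTF-8 text, as a JSON message would be sent.
class Utf8TextSerializer implements PayloadSerializer {
    public byte[] serialize(String payload) {
        return payload.getBytes(StandardCharsets.UTF_8);
    }

    public String deserialize(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }
}

final class SerializerLoader {
    // Hypothetical configuration key; not a real Storm config option.
    static final String CONF_KEY = "multilang.serializer.class";

    // Instantiates the configured class, or falls back to the text default.
    static PayloadSerializer load(Map<String, Object> conf) {
        Object name = conf.get(CONF_KEY);
        if (name == null) {
            return new Utf8TextSerializer();
        }
        try {
            return Class.forName((String) name)
                    .asSubclass(PayloadSerializer.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load serializer " + name, e);
        }
    }
}
```

With nothing configured, the loader falls back to the text serializer, so
existing multilang scripts would keep working unchanged.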

I come from the multilang side of Storm, so I'm not that familiar with the 
internal Storm structures. If there is a class that the ISerializer interface 
can use, instead of the Emission and Immission classes, I'm open to it.

I would recommend, though, that further discussion of PR #697 happen in the PR
thread itself.

I created issue #654 to add protocol buffer serialisation for multilang to
Storm, but I hadn't seen this issue. The whole purpose of PR #697 is to solve
this issue.




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
