[
https://issues.apache.org/jira/browse/STORM-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980750#comment-13980750
]
ASF GitHub Bot commented on STORM-138:
--------------------------------------
Github user jsgilmore commented on the pull request:
https://github.com/apache/incubator-storm/pull/84#issuecomment-41364889
We have not seen a need for different encoding schemes at the bolt
level. I can't see why you would want to use JSON at all if a scheme is
available that provides better throughput at lower CPU cost. We moved
from JSON to protocol buffers, and all our topologies now use that scheme.
It would be helpful to get more thoughts on this subject. If per-bolt
schemes are required, I would be happy to make the change, but I would prefer
to do it in a future pull request. I don't mind changing the name of the
configuration option.
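
To make the idea of a single, topology-wide scheme chosen by a configuration option concrete, here is a minimal sketch in Python. It is not the code from the pull request: the Serializer class, the "multilang.serializer" key and the registry are hypothetical names used only for illustration, and pickle merely stands in for a binary scheme such as protocol buffers.

```python
import json
import pickle
from abc import ABC, abstractmethod


class Serializer(ABC):
    """Hypothetical pluggable scheme: message dict <-> bytes."""

    @abstractmethod
    def dumps(self, message: dict) -> bytes:
        ...

    @abstractmethod
    def loads(self, data: bytes) -> dict:
        ...


class JsonSerializer(Serializer):
    """Text scheme, analogous to the current JSON behaviour."""

    def dumps(self, message: dict) -> bytes:
        return json.dumps(message).encode("utf-8")

    def loads(self, data: bytes) -> dict:
        return json.loads(data.decode("utf-8"))


class PickleSerializer(Serializer):
    """Stand-in for a binary scheme (assumption: not protocol buffers itself)."""

    def dumps(self, message: dict) -> bytes:
        return pickle.dumps(message)

    def loads(self, data: bytes) -> dict:
        return pickle.loads(data)


# Hypothetical registry and configuration key, for illustration only.
SERIALIZERS = {"json": JsonSerializer, "pickle": PickleSerializer}
CONFIG_KEY = "multilang.serializer"


def serializer_from_config(conf: dict) -> Serializer:
    """Pick one scheme for the whole topology; fall back to JSON."""
    name = conf.get(CONFIG_KEY, "json")
    if name not in SERIALIZERS:
        raise ValueError(f"unknown serializer: {name!r}")
    return SERIALIZERS[name]()
```

Supporting a scheme per bolt would mean resolving the name per component rather than once per topology, which is the extra work deferred above.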
> Pluggable serialization for multilang
> -------------------------------------
>
> Key: STORM-138
> URL: https://issues.apache.org/jira/browse/STORM-138
> Project: Apache Storm (Incubating)
> Issue Type: New Feature
> Reporter: James Xu
> Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/373
> Currently JSON is used to serialize tuples for multilang. It would be great
> if the serialization mechanism were pluggable so that using richer types with
> multilang would be possible.
> ---------
> francis-liberty: Hello, I am a newbie here, and I would like to pick up this
> issue. I also noticed the recent PR #697 by jsgilmore; does it cover
> this issue, too?
> I looked around the source code, and I would like to talk about my opinions
> on this issue here.
> For now, ShellProcess supports only JSON for communicating with the multilang
> process (read and write). ShellSpout and ShellBolt talk to ShellProcess
> through JSON, too. This is all because ShellProcess's interface uses
> JSONObject only. Conceptually, ShellProcess should encapsulate the multilang
> details and talk to Bolt and Spout using Tuple. (jsgilmore introduced two
> new classes, Immission and Emission, but I think all the information Bolt and
> Spout need is already in Tuple, so no new data structures are needed.) So I
> think it would be much cleaner to do serialization in ShellProcess only, so
> that ShellSpout and ShellBolt know nothing about how ShellProcess converts
> between Tuple and strings.
> So, I suppose I can do the work of
> 1. change the interface of ShellProcess to return and accept the Tuple data
> structure instead of JSONObject.
> 2. make ShellSpout and ShellBolt work on Tuple; all information such as
> task_id, stream_id and tuples should be retrieved from and encapsulated in
> this data structure.
> 3. what other serialization formats would you like to add? I think in the end
> we need to add an example other than JSON to storm-starter storm.py/rb,
> which I would also like to work on.
> ----------
> jsgilmore: Hi, all serialisation is done in the JSONSerialiser, so no
> serialisation is done in ShellBolt, ShellProcess or ShellSpout. They just
> send around the Emission and Immission classes. The point of the ISerializer
> interface is to achieve the separation of serialisation.
> I come from the multilang side of Storm, so I'm not that familiar with the
> internal Storm structures. If there is a class that the ISerializer interface
> can use, instead of the Emission and Immission classes, I'm open to it.
> I would recommend that further discussion of PR #697 happen in the PR
> thread itself, though.
> I created issue #654 to add protocol buffer serialisation for multilang to
> Storm, but I had not seen this issue at the time. The whole purpose of PR
> #697 is to solve this issue.
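
The separation discussed in the quoted thread (spouts and bolts pass structured messages around, and only one component knows the wire format) can be sketched roughly as follows in Python. All names here (Message, JsonCodec, ShellChannel) are hypothetical and are not Storm classes; the 4-byte length prefix is also an assumption made for this sketch, not a description of Storm's multilang protocol.

```python
import io
import json
import struct
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical structured message passed between components."""
    task_id: int
    stream_id: str
    values: list = field(default_factory=list)


class JsonCodec:
    """The only place that knows the wire format (here: JSON)."""

    def encode(self, msg: Message) -> bytes:
        raw = {"task": msg.task_id, "stream": msg.stream_id, "tuple": msg.values}
        return json.dumps(raw).encode("utf-8")

    def decode(self, data: bytes) -> Message:
        raw = json.loads(data.decode("utf-8"))
        return Message(raw["task"], raw["stream"], raw["tuple"])


class ShellChannel:
    """Moves Message objects over byte streams; callers never see bytes."""

    def __init__(self, reader, writer, codec):
        self._reader = reader
        self._writer = writer
        self._codec = codec

    def write(self, msg: Message) -> None:
        payload = self._codec.encode(msg)
        # Assumed framing: 4-byte big-endian length, then the payload.
        self._writer.write(struct.pack(">I", len(payload)) + payload)
        self._writer.flush()

    def read(self) -> Message:
        header = self._reader.read(4)
        if len(header) < 4:
            raise EOFError("stream closed before a full header arrived")
        (length,) = struct.unpack(">I", header)
        payload = self._reader.read(length)
        if len(payload) < length:
            raise EOFError("stream closed before a full payload arrived")
        return self._codec.decode(payload)
```

Swapping JsonCodec for another codec with the same encode/decode methods changes the wire format without touching the code that calls read and write. A real pipe may return short reads, so a production version would loop until the requested number of bytes has arrived.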