[jira] [Commented] (STORM-151) Support multilang in Trident

Alex Merritt (JIRA) Mon, 22 Dec 2014 16:57:53 -0800

    [ 
https://issues.apache.org/jira/browse/STORM-151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256384#comment-14256384
 ]


Alex Merritt commented on STORM-151:
------------------------------------

I am also interested in having multilang support for DRPC functionality in 
Storm. My bolts use (existing and already written) CUDA programs, so having 
some multilang protocol for DRPC would be very useful.

Is there any update on this?

> Support multilang in Trident
> ----------------------------
>
>                 Key: STORM-151
>                 URL: https://issues.apache.org/jira/browse/STORM-151
>             Project: Apache Storm
>          Issue Type: Improvement
>            Reporter: James Xu
>
> https://github.com/nathanmarz/storm/issues/353
> Since it's impractical to have a separate subprocess for every operation 
> packed into a bolt, I think the best tradeoff is to allow multilang 
> operations that will just be their own bolt. This is a low level exposure of 
> the API and the multilang program will have to make sure to emit the batch id 
> as the first field. Multilang operations would be treated similar to 
> aggregators in that the tuples they produce are brand new with brand new 
> fields.
> ----------
> colinsurprenant: I am trying to wrap my head around this issue. Would it make 
> sense to expose classes like ShellFunction, ShellAggregator, ShellFilter, etc?
> A topology definition could be something like:
> topology.newDRPCStream("words")
>       .each(new Fields("args"), new ShellFunction("ruby", "split.rb"), new 
> Fields("word"))
>       .groupBy(new Fields("word"))
>       .stateQuery(wordCounts, new Fields("word"), new MapGet(), new 
> Fields("count"))
>       .each(new Fields("count"), new ShellFilter("ruby", "somefilter.rb"))
>       .aggregate(new Fields("count"), new ShellCombinerAggregator("ruby", 
> "someaggregator.rb"), new Fields("sum"));
> Then the builder would assign a new shell bolt for each of these shell 
> operations.
> ----------
> nathanmarz: That would be one way to go about it – though that will have a 
> lot of overhead with serializing back and forth between Java-land and 
> multilang-land. This is especially true if you have many operations running 
> in the same bolt (like 5 eaches in a row). Another way to go about it would 
> be to expose a lower level facility, where the shell process is a regular 
> bolt and has full control over the input and output tuples. Whatever fields 
> it emits will be the fields available in the output tuples. Additionally, one 
> of Trident's constraints is that the first field of output tuples contain the 
> "batch id".
> ----------
> colinsurprenant:Right. My proposition is basically what you described as 
> "impractical to have a separate subprocess for every operation" in the issue 
> description.
> So what you are suggesting is to just expose a ShellBolt, which would 
> basically respect the same semantic as the general Aggregator operation in 
> that it can emit any number of tuples with any number of fields in regard to 
> to input batch.
> ----------
> joshbronson: I'm probably missing something. But would exposing an interface 
> for an aggregator and using it carefully to avoid spawning too many processes 
> accomplish something similar, without having to worry about batch ids?
> We've run up against this problem recently and come up with a couple of ways 
> to get around the many-processes issue. We've considered spawning a server 
> that would listen on a standard TCP port range or Unix sockets. Is there a 
> place to hook in, other than the prepare method, to launch such a beast? 
> Other functions would presumably use their prepare method to discover and 
> connect to the server, and it would be nice to avoid races or locks.
> Also, we're working on an adapter for "BaseFunction" at the moment, though 
> it's not quite ready for public use, and I wonder if anything will go wrong 
> if a function continues to emit tuples to its collector after its execute 
> method has returned. We're operating under the assumption that something 
> will, so we've required the shell command the tell me when it's done 
> responding to the input we just gave it. Currently we do that by requiring it 
> to emit a JSON array. If we go with the server approach, we may also require 
> the script to handle a tab-delimited ID telling the server to delegate to the 
> appropriate handler for a function. Presumably this would require a thin 
> library in languages used to script storm. There seems to be an appetite here 
> for open-sourcing the components required to do this in Ruby; we definitely 
> want to make this easy for folks.
> ----------
> quintona: Hi Guys, I am needing the multilang support. From what I gather, 
> the idea would be to hook the shell bolt into the trident topology, and have 
> it ignore and handle the batch id field?
> If that understanding is correct, then it raises the interesting question 
> around support for any bolt within a Trident topology.
> ----------
> nathanmarz: Well, the right way to go is to essentially expose the Aggregator 
> interface via multilang. This is general enough to encompass functions, 
> filters, and aggregation. We could also allow users to mark multilang 
> aggregators as committers so that they can achieve exactly-once semantics 
> when interacting with external state.
> ----------
> quintona: I wasn't intending to allow the updating of external state via a 
> multilang function. Surely that is a multilang state discussion, which is 
> fundamentally different? Or am I missing something?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (STORM-151) Support multilang in Trident

Reply via email to