James Xu created STORM-137:
------------------------------

             Summary: add new feature: "topology package"
                 Key: STORM-137
                 URL: https://issues.apache.org/jira/browse/STORM-137
             Project: Apache Storm (Incubating)
          Issue Type: New Feature
            Reporter: James Xu


https://github.com/nathanmarz/storm/issues/557

Submitting a topology to Storm requires executing code that constructs the 
bolts, spouts, and topology and then calls StormSubmitter.submitTopology, which 
uploads the jar, config, and serialized objects to Nimbus.

If you want to have a topology binary store, so that you can retain all 
versions of production topologies, there's this really precarious property of 
StormSubmitter: the serialized object state needs to be recomputed on every 
submission, and the assumption is that the code used to create the serialized 
object state is a pure function and has no external dependencies. If either of 
those properties does not hold (for example, your API key is queried at this 
point in time), then merely storing the jar is not sufficient to redeploy an 
older version of a topology. (I'm making this example up, but one user was 
trying to access ZK here, I don't know why. It's just very precarious to 
recompute state each time.)

So my proposal is to add to StormSubmitter and the storm command line tool the 
ability to create a "topology package" that contains the jar, the serialized 
object state, and the topology config (preferably in YAML), plus another 
StormSubmitter API for accepting topology packages.
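
As a rough illustration of what such a package file could look like, here is a 
minimal Java sketch that bundles the three pieces into one zip archive. The 
class name TopologyPackageWriter and the entry names (topology.jar, 
topology.ser, conf.yaml) are assumptions made up for this sketch; nothing like 
this exists in Storm today.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Storm: writes a "topology package" as a zip.
class TopologyPackageWriter {
    // Entry names are assumptions chosen for this sketch.
    static final String JAR_ENTRY = "topology.jar";
    static final String STATE_ENTRY = "topology.ser";
    static final String CONF_ENTRY = "conf.yaml";

    // Bundles the jar bytes, the serialized topology bytes, and the YAML
    // config text into a single zip archive written to `out`.
    static void write(OutputStream out, byte[] jar, byte[] serializedTopology,
                      String confYaml) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            putEntry(zip, JAR_ENTRY, jar);
            putEntry(zip, STATE_ENTRY, serializedTopology);
            putEntry(zip, CONF_ENTRY, confYaml.getBytes(StandardCharsets.UTF_8));
        }
    }

    private static void putEntry(ZipOutputStream zip, String name, byte[] data)
            throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(data);
        zip.closeEntry();
    }
}
```

Because the serialized state is stored next to the jar, resubmitting such a 
package would not need to rerun the user code that built the topology.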

---------
jasonjckn: @nathanmarz could you comment on this asap?

---------
nathanmarz: It would be interesting to have syntax like:

storm package {name of output file} {jar} {class} {args}

which changes the behavior of StormSubmitter#submitTopology to serialize the 
topology and package it with the jar into a "package" file. It would also be 
cool if storm deploy would automatically detect these package files and do the 
right thing with them.

---------
jasonjckn: So will user code still call StormSubmitter.submitTopology? or 
should they call .createPackage?

Right now people can call StormSubmitter.submitTopology as many times as they 
want in the main function and submit multiple topologies.

storm package {name of output file} {jar} {class} {args}

What happens if they call submitTopology twice? Is the same output filename 
used twice? That's why I'm recommending we do this:

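package mytopologpkg;

import backtype.storm.generated.StormTopology;
import backtype.storm.topology.TopologyBuilder;

class MyTopology {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout(...);
        StormTopology topology = builder.createTopology();
        // TopologyCommandLine is the helper class being proposed here.
        TopologyCommandLine.processAction(topology, args);
    }
}

Then the user would execute commands like this: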


java -cp topology.jar mytopologpkg.MyTopology make_package <filename>
java -cp topology.jar mytopologpkg.MyTopology submit -c nimbus.host=xyz <name>
java -cp topology.jar mytopologpkg.MyTopology kill <name>
java -cp topology.jar mytopologpkg.MyTopology submit_package -c nimbus.host=xyz <filename>

This is also valid:

java -cp topology.jar backtype.storm.CommandLine kill <name>
java -cp topology.jar backtype.storm.CommandLine submit_package -c nimbus.host=xyz <filename>

Today users have the complexity of needing to ensure that the storm zip release 
they downloaded matches the library version they compiled the code with. With 
the commands above that goes away, because everything you need is actually in 
the storm library jar.

My last idea has sat in my head for a while. It has a couple of problems and is 
not a good idea.

Something like this would work:

package mytopologpkg;

class MyTopology implements storm.ITopologyPackage {
    @Override
    public Map getTopologyConf(String[] commandLineArgs) {
        ......
    }

    @Override
    public StormTopology getTopology(String[] commandLineArgs) {
        .....
    }
}

$ storm make-package-file mytopologpkg.MyTopology package-filename.zip [commandLineArg]
$ storm submit package-filename.zip -c nimbus.host=xyz -c topology.name=my-topology-name

OR

$ storm submit mytopologpkg.MyTopology -c nimbus.host=xyz -c topology.name=my-topology-name [commandLineArgs]

NOTE: "-c" means override the conf returned from getTopologyConf.
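
The "-c" override rule could be sketched roughly as follows. This is a minimal 
Java sketch under assumptions: ConfOverrides and its apply method are 
hypothetical names, and override values are kept as plain strings without any 
type conversion.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Storm: applies "-c key=value" overrides.
class ConfOverrides {
    // Returns a copy of baseConf with every "-c key=value" pair found in
    // args applied on top. Arguments that are not part of a "-c" pair are
    // ignored here. Values are kept as strings (an assumption of this sketch).
    static Map<String, Object> apply(Map<String, Object> baseConf, String[] args) {
        Map<String, Object> merged = new HashMap<>(baseConf);
        for (int i = 0; i < args.length; i++) {
            if (!"-c".equals(args[i])) {
                continue;
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("-c needs a key=value argument");
            }
            String pair = args[++i];
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected key=value, got: " + pair);
            }
            merged.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return merged;
    }
}
```

The map returned by getTopologyConf would be passed as baseConf, so the 
command line always wins over what the topology class returns.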

NOTE: there's no main function anymore. However, if someone does write a main 
function, they can run it with "storm jar". Semantically I think of this as 
"exec-jar-main with the ability to pass topology conf overrides"; it says 
nothing about how many topologies are submitted or whether 
StormSubmitter.makePackage(filename, topology, stormConf) is called.

public static void main(String[] args) {
    StormTopology t1 = buildTopologyT1();
    .... StormSubmitter.submitTopology(t1)
    StormTopology t2 = buildTopologyT2();
    .... StormSubmitter.submitTopology(t2)
}


I passed this issue to another engineer who was working on package versioning 
and they didn't want to implement this, so it never got done.

I just realized there is a tradeoff going on here. If you persist the jar AND 
the serialized object state in a versioned binary store, then if a class's 
serialVersionUID ever changes, that state is useless. However, if you merely 
store the jar in the versioned binary store and rerun the main method with 
"$ storm jar path.to.mainClass", it doesn't matter even if the class's 
serialVersionUID changed. So you get more affordances for making backwards 
compatible changes if you don't store serialized object state in a topology 
package as proposed above.
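
For context on why a changed UID makes stored state useless: Java 
deserialization throws InvalidClassException when the serialVersionUID in the 
stream differs from that of the local class. The minimal Java sketch below 
shows the difference between a derived and a pinned UID. The class names are 
made up for illustration and are not Storm classes.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Illustration only; the nested "bolt" classes are hypothetical.
class SerialVersionDemo {
    // No explicit serialVersionUID: the JVM derives one from the class
    // shape, so adding a field or method can change it.
    static class DerivedUidBolt implements Serializable {
        int batchSize = 10;
    }

    // Explicit serialVersionUID: stays the same until someone edits it,
    // so compatible class changes keep old serialized state readable.
    static class PinnedUidBolt implements Serializable {
        private static final long serialVersionUID = 1L;
        int batchSize = 10;
    }

    // Returns the UID that Java serialization would write for this class.
    static long uidOf(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }
}
```

If packages did store serialized state, pinning the UID on every serialized 
class would reduce the risk described above, though it would not protect 
against incompatible field changes.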


An API for downloading a topology package would be useful for writing automated 
tools that move topologies between multiple Storm clusters.




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
