[ 
https://issues.apache.org/jira/browse/STORM-137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rick Kellogg updated STORM-137:
-------------------------------
    Component/s: storm-core

> add new feature: "topology package"
> -----------------------------------
>
>                 Key: STORM-137
>                 URL: https://issues.apache.org/jira/browse/STORM-137
>             Project: Apache Storm
>          Issue Type: New Feature
>          Components: storm-core
>            Reporter: James Xu
>
> https://github.com/nathanmarz/storm/issues/557
> Submitting a topology to Storm requires executing code that constructs the 
> bolts, spouts, and topology, and then calls StormSubmitter.submitTopology, 
> which uploads the jar, config, and serialized objects to Nimbus.
> If you want to have a topology binary store, so that you can retain all 
> versions of production topologies, StormSubmitter has a really precarious 
> property: the serialized object state needs to be recomputed on every 
> submission, and the assumption is that the code used to create that state is 
> a pure function with no external dependencies. If either of those properties 
> does not hold (for example, maybe your API key is queried at this point in 
> time), then merely storing the jar is not sufficient to redeploy an older 
> version of a topology. (I'm making this example up, but one user was trying 
> to access ZK here; I don't know why. It's just very precarious to recompute 
> state each time.)
> So my proposal is to add to StormSubmitter and the storm command-line tool 
> the ability to create a "topology package" that contains the jar, the 
> serialized object state, and the topology config (preferably in YAML), plus 
> another StormSubmitter API for accepting topology packages.
> ---------
> jasonjckn: @nathanmarz could you comment on this asap?
> ---------
> nathanmarz: It would be interesting to have syntax like:
> storm package {name of output file} {jar} {class} {args}
> which changes the behavior of StormSubmitter#submitTopology to serialize the 
> topology and package it with the jar into a "package" file. It would also be 
> cool if storm deploy would automatically detect these package files and do 
> the right thing with them.
> ---------
> jasonjckn: So will user code still call StormSubmitter.submitTopology, or 
> should it call .createPackage?
> Right now people can call StormSubmitter.submitTopology as many times as they 
> want in the main function and submit multiple topologies.
> storm package {name of output file} {jar} {class} {args}
> What happens if they call submitTopology twice? Is the same output filename 
> used twice? That's why I'm recommending we do this:
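> package mytopologpkg;
> import backtype.storm.generated.StormTopology;
> import backtype.storm.topology.TopologyBuilder;
> public class MyTopology {
>     public static void main(String[] args) {
>         TopologyBuilder builder = new TopologyBuilder();
>         builder.setSpout(...);
>         StormTopology topology = builder.createTopology();
>         // TopologyCommandLine is the helper proposed here; it does not exist yet.
>         TopologyCommandLine.processAction(topology, args);
>     }
> }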
> Then the user would execute commands like this:
> java -cp topology.jar mytopologpkg.MyTopology make_package <filename>
> java -cp topology.jar mytopologpkg.MyTopology submit -c nimbus.host=xyz  
> <name>
> java -cp topology.jar mytopologpkg.MyTopology kill <name>
> java -cp topology.jar mytopologpkg.MyTopology submit_package -c 
> nimbus.host=xyz <filename>
> This is also valid:
> java -cp topology.jar backtype.storm.CommandLine kill <name>
> java -cp topology.jar backtype.storm.CommandLine submit_package -c 
> nimbus.host=xyz <filename>
> Users currently have the complexity of needing to ensure that the storm zip 
> release they downloaded matches the library version they compiled the code 
> with. Everything you need is actually in the storm library jar.
> My last idea has sat in my head for a while. It has a couple of problems and 
> is not a good idea.
> Something like this would work:
> package mytopologpkg;
> class MyTopology implements storm.ITopologyPackage {
>      @Override
>      public Map getTopologyConf(String[] commandLineArgs) {
>            ......
>      }
>      @Override
>      public StormTopology getTopology(String[] commandLineArgs) {
>            .....
>      }
> }
> $ storm make-package-file mytopology.MyTopology package-filename.zip 
> [commandLineArg]
> $ storm submit package-filename.zip -c nimbus.host=xyz -c 
> topology.name=my-topology-name
> OR
> $ storm submit mytopology.MyTopology -c nimbus.host=xyz -c 
> topology.name=my-topology-name  [commandLineArgs]
> NOTE: "-c" means override the conf returned from getTopologyConf.
> NOTE: there's no main function anymore. However, if someone does write a 
> main function, they can run it with "storm jar". Semantically, I think of 
> this as "exec-jar-main with the ability to pass topology conf overrides"; it 
> says nothing about how many topologies are submitted or whether 
> StormSubmitter.makePackage(filename, topology, stormConf) is called.
> static void main(String[] args) {
>       t1 = buildTopologyT1();
>       .... StormSubmitter.submitTopology(t1)
>       t2 = buildTopologyT2();
>       .... StormSubmitter.submitTopology(t2)
> } 
> I passed this issue to another engineer who was working on package versioning 
> and they didn't want to implement this, so it never got done.
> I just realized there is a tradeoff going on here. If you persist the jar AND 
> the serialized object state in a versioned binary store, then if the class 
> serialVersionUID ever changes, that state is useless. However, if you merely 
> store the jar in the versioned binary store and rerun the main method with 
> $ storm jar path.to.mainClass, it doesn't matter even if the class 
> serialVersionUID changed. So you get more affordances for making 
> backwards-compatible changes if you don't store serialized object state in a 
> topology package as proposed above.
> An API for downloading a topology package would be useful for writing 
> automated tools that move topologies between multiple storm clusters.
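
The "topology package" proposed in the issue (jar, serialized object state, and YAML config in one file) could look roughly like the sketch below. It is only an illustration under stated assumptions: the class TopologyPackageSketch, its methods, and the entry names are hypothetical and not part of Storm, a zip archive is assumed as the container, and plain Java serialization stands in for however Storm would actually serialize a topology.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: none of these names exist in Storm.
final class TopologyPackageSketch {
    // Entry names are assumptions made for this illustration.
    static final String JAR_ENTRY = "topology.jar";
    static final String STATE_ENTRY = "topology.ser";
    static final String CONF_ENTRY = "topology-conf.yaml";

    /** Bundles the jar bytes, the serialized topology object, and the YAML config into one zip. */
    static byte[] makePackage(byte[] jarBytes, Serializable topology, String confYaml) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            writeEntry(zip, JAR_ENTRY, jarBytes);
            writeEntry(zip, STATE_ENTRY, serialize(topology));
            writeEntry(zip, CONF_ENTRY, confYaml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Lists the entry names of a package in the order they were written. */
    static List<String> listEntries(byte[] pkg) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(pkg))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Captures the object state once, so it need not be recomputed at deploy time.
    private static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(obj);
        }
        return buf.toByteArray();
    }

    private static void writeEntry(ZipOutputStream zip, String name, byte[] data) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(data);
        zip.closeEntry();
    }

    public static void main(String[] args) {
        // A String stands in for the real topology object in this demo.
        byte[] pkg = makePackage(new byte[] {1, 2, 3}, "stand-in topology", "topology.workers: 2\n");
        System.out.println(listEntries(pkg));
    }
}
```

Because the state is captured once at packaging time, a later submit would not need to rerun the user's topology-building code, which is the property the issue is after.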

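The serialization tradeoff mentioned near the end of the issue comes from how Java serialization identifies classes: each serialized object records its class's serialVersionUID, and deserialization fails with InvalidClassException when the loaded class reports a different value. A minimal sketch, assuming a hypothetical PinnedBolt class (not a Storm class), shows how declaring the field explicitly pins that value:

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical sketch: SerialVersionSketch and PinnedBolt are illustration names only.
final class SerialVersionSketch {
    // Declaring serialVersionUID explicitly keeps the stream identity stable
    // even when fields or methods are added to the class later.
    static final class PinnedBolt implements Serializable {
        private static final long serialVersionUID = 1L;
        String apiKey = "resolved-at-package-time";
    }

    /** Returns the UID that Java serialization records for the given Serializable class. */
    static long uidOf(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(PinnedBolt.class));
    }
}
```

Classes that omit the field get a UID computed from their structure, so an otherwise harmless edit can invalidate previously stored state.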


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
