James Xu created STORM-137:
------------------------------
Summary: add new feature: "topology package"
Key: STORM-137
URL: https://issues.apache.org/jira/browse/STORM-137
Project: Apache Storm (Incubating)
Issue Type: New Feature
Reporter: James Xu
https://github.com/nathanmarz/storm/issues/557
Submitting a topology to Storm requires executing code that constructs the
bolts, spouts, and topology, and then calls StormSubmitter.submitTopology, which
uploads the jar, config, and serialized objects to Nimbus.
If you want a topology binary store, so that you can retain all versions of
production topologies, StormSubmitter has a really precarious property: the
serialized object state must be recomputed on every submission, and this
assumes that the code creating that state is a pure function with no external
dependencies. If either assumption fails (for example, your API key is queried
at that point in time), then merely storing the jar is not sufficient to
redeploy an older version of a topology. (I'm making this example up, but one
user was trying to access ZK here, I don't know why. It's just very precarious
to recompute state each time.)
So my proposal is to add to StormSubmitter and the storm command line tool the
ability to create a "topology package" that contains the jar, the serialized
object state, and the topology config (preferably in YAML), plus another
StormSubmitter API for accepting topology packages.
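To make the proposal concrete, here is a minimal sketch of what building such a
package could look like, using only the JDK. Everything in it is an assumption
for illustration: the class name TopologyPackageSketch, the entry names
(topology.jar, topology.ser, conf.yaml), and the flat "key: value" config
format are hypothetical and not part of Storm. The point is only that the
object state is serialized once, at packaging time, and stored next to the jar.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not part of Storm.
class TopologyPackageSketch {

    // Entry names are assumptions, not a defined package format.
    static final String JAR_ENTRY = "topology.jar";
    static final String STATE_ENTRY = "topology.ser";
    static final String CONF_ENTRY = "conf.yaml";

    /** Bundles the jar bytes, the serialized topology object, and a flat config into one zip. */
    static byte[] makePackage(byte[] jarBytes, Serializable topology, Map<String, String> conf) {
        try {
            // Serialize once, at packaging time, so nothing is recomputed at deploy time.
            ByteArrayOutputStream state = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(state)) {
                oos.writeObject(topology);
            }
            // Flat "key: value" lines; adequate only for simple scalar values.
            StringBuilder yaml = new StringBuilder();
            for (Map.Entry<String, String> e : new TreeMap<>(conf).entrySet()) {
                yaml.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (ZipOutputStream zip = new ZipOutputStream(out)) {
                addEntry(zip, JAR_ENTRY, jarBytes);
                addEntry(zip, STATE_ENTRY, state.toByteArray());
                addEntry(zip, CONF_ENTRY, yaml.toString().getBytes(StandardCharsets.UTF_8));
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void addEntry(ZipOutputStream zip, String name, byte[] data) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(data);
        zip.closeEntry();
    }

    /** Reads every entry of a package back into memory, keyed by entry name. */
    static Map<String, byte[]> readPackage(byte[] pkg) {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(pkg))) {
            ZipEntry entry;
            byte[] buf = new byte[4096];
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream data = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(buf)) != -1) {
                    data.write(buf, 0, n);
                }
                entries.put(entry.getName(), data.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("topology.name", "my-topology-name");
        // A String stands in for the serialized topology object here.
        byte[] pkg = makePackage(new byte[] {1, 2, 3}, "stand-in topology state", conf);
        System.out.println(readPackage(pkg).keySet());
    }
}
```

A real implementation would presumably serialize the StormTopology itself and
hand the resulting file to a new StormSubmitter entry point; neither is shown
here because no such API exists yet.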
---------
jasonjckn: @nathanmarz could you comment on this asap?
---------
nathanmarz: It would be interesting to have syntax like:
storm package {name of output file} {jar} {class} {args}
which changes the behavior of StormSubmitter#submitTopology to serialize the
topology and package it with the jar into a "package" file. It would also be
cool if storm deploy would automatically detect these package files and do the
right thing with them.
---------
jasonjckn: So will user code still call StormSubmitter.submitTopology, or
should it call .createPackage?
Right now people can call StormSubmitter.submitTopology as many times as they
want in the main function and submit multiple topologies.
storm package {name of output file} {jar} {class} {args}
What happens if they call submitTopology twice? Is the same output filename
used twice? That's why I'm recommending we do this:
package mytopologpkg;
class MyTopology {
    public static void main(String[] args) {
        Topology topology = TopologyBuilder.setSpout(...).buildTopology();
        TopologyCommandLine.processAction(topology, args);
    }
}
Then the user would execute commands like this:
java -cp topology.jar mytopologpkg.MyTopology make_package <filename>
java -cp topology.jar mytopologpkg.MyTopology submit -c nimbus.host=xyz <name>
java -cp topology.jar mytopologpkg.MyTopology kill <name>
java -cp topology.jar mytopologpkg.MyTopology submit_package -c nimbus.host=xyz <filename>
This is also valid:
java -cp topology.jar backtype.storm.CommandLine kill <name>
java -cp topology.jar backtype.storm.CommandLine submit_package -c nimbus.host=xyz <filename>
Currently, users have the added complexity of ensuring that the storm zip
release they downloaded matches the library version they compiled their code
against. With the commands above, everything you need is actually in the storm
library jar.
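As a rough illustration of the dispatch that TopologyCommandLine.processAction
would need to do, here is a sketch that parses the command lines shown above
into an action, "-c key=value" overrides, and positional arguments. The class
TopologyCommandLineSketch and its methods are hypothetical names made up for
this example; the sketch only describes each action instead of calling Storm.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the proposed TopologyCommandLine; not a Storm API.
class TopologyCommandLineSketch {

    /** Parsed form of: <action> [-c key=value]... [positional args] */
    static final class Parsed {
        final String action;
        final Map<String, String> overrides = new LinkedHashMap<>();
        final List<String> positional = new ArrayList<>();

        Parsed(String action) {
            this.action = action;
        }
    }

    static Parsed parse(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("missing action");
        }
        Parsed parsed = new Parsed(args[0]);
        for (int i = 1; i < args.length; i++) {
            if (args[i].equals("-c")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-c requires key=value");
                }
                String kv = args[++i];
                int eq = kv.indexOf('=');
                if (eq <= 0) {
                    throw new IllegalArgumentException("bad override: " + kv);
                }
                parsed.overrides.put(kv.substring(0, eq), kv.substring(eq + 1));
            } else {
                parsed.positional.add(args[i]);
            }
        }
        return parsed;
    }

    /** Says what each action would do; a real implementation would talk to Nimbus here. */
    static String describe(Parsed p) {
        switch (p.action) {
            case "make_package":
                return "write package file " + p.positional;
            case "submit":
                return "submit topology " + p.positional + " with overrides " + p.overrides;
            case "submit_package":
                return "submit package file " + p.positional + " with overrides " + p.overrides;
            case "kill":
                return "kill topology " + p.positional;
            default:
                throw new IllegalArgumentException("unknown action: " + p.action);
        }
    }

    public static void main(String[] args) {
        String[] example = {"submit", "-c", "nimbus.host=xyz", "my-topology-name"};
        System.out.println(describe(parse(example)));
    }
}
```

Because the action is an explicit argument, calling this once per main method
sidesteps the question of what a second submitTopology call should do with the
output filename.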
My last idea has sat in my head for a while. It has a couple of problems and is
not a good idea.
Something like this would work:
package mytopologpkg;
class MyTopology implements storm.ITopologyPackage {
    @Override
    public Map getTopologyConf(String[] commandLineArgs) {
        ......
    }
    @Override
    public StormTopology getTopology(String[] commandLineArgs) {
        .....
    }
}
$ storm make-package-file mytopologpkg.MyTopology package-filename.zip [commandLineArgs]
$ storm submit package-filename.zip -c nimbus.host=xyz -c topology.name=my-topology-name
OR
$ storm submit mytopologpkg.MyTopology -c nimbus.host=xyz -c topology.name=my-topology-name [commandLineArgs]
NOTE: "-c" means override the conf returned from getTopologyConf.
NOTE: there's no main function anymore. However, if someone does write a main
function, they can run it with "storm jar". Semantically I think of this as
"exec-jar-main with the ability to pass topology conf overrides"; it says
nothing about how many topologies are submitted or whether
StormSubmitter.makePackage(filename, topology, stormConf) is called.
static void main(String[] args) {
    t1 = buildTopologyT1();
    .... StormSubmitter.submitTopology(t1)
    t2 = buildTopologyT2();
    .... StormSubmitter.submitTopology(t2)
}
I passed this issue to another engineer who was working on package versioning
and they didn't want to implement this, so it never got done.
I just realized there is a tradeoff going on here. If you persist the jar AND
the serialized object state in a versioned binary store, then if a class's
serialVersionUID ever changes, that state is useless. However, if you merely
store the jar in the versioned binary store and rerun the main method with
$ storm jar path.to.mainClass, it doesn't matter whether the serialVersionUID
changed. So you have more latitude to make backwards-compatible changes if you
don't store serialized object state in a topology package as proposed above.
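The failure mode in this tradeoff can be reproduced with plain Java
serialization. The sketch below is an assumption-laden illustration, not Storm
code: MyBolt is a hypothetical stand-in for a serialized component, and instead
of compiling two versions of the class it simulates "state stored by a
different build" by flipping one bit of the serialVersionUID recorded in the
stream. Deserializing the altered stream is then rejected with an
InvalidClassException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical illustration; not part of Storm.
class SerialVersionSketch {

    // Stand-in for a user-defined bolt or spout class.
    static class MyBolt implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        MyBolt(String name) {
            this.name = name;
        }
    }

    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /**
     * Returns a copy of the stream whose recorded serialVersionUID differs
     * from the one the local class declares.
     */
    static byte[] withDifferentUid(byte[] stream, Class<?> cls) {
        byte[] copy = stream.clone();
        // Stream layout for a single object:
        // magic(2) version(2) TC_OBJECT(1) TC_CLASSDESC(1) nameLength(2) name uid(8)
        int uidOffset = 8 + cls.getName().length(); // class names here are ASCII
        copy[uidOffset + 7] ^= 1; // flip the lowest bit of the stored UID
        return copy;
    }

    /** True if the stream deserializes, false if the class version check rejects it. */
    static boolean canDeserialize(byte[] stream) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stream))) {
            in.readObject();
            return true;
        } catch (InvalidClassException e) {
            return false;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] stored = serialize(new MyBolt("word-count"));
        System.out.println("same UID: " + canDeserialize(stored));
        System.out.println("changed UID: "
                + canDeserialize(withDifferentUid(stored, MyBolt.class)));
    }
}
```

Rerunning the main method from the stored jar avoids this check entirely,
because the objects are rebuilt and serialized by whatever classes are on the
classpath at submit time.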
An API for downloading a topology package would be useful for writing automated
tools that move topologies between multiple Storm clusters.