[
https://issues.apache.org/jira/browse/STORM-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rick Kellogg updated STORM-107:
-------------------------------
Component/s: storm-core
> Add better ways to construct topologies
> ---------------------------------------
>
> Key: STORM-107
> URL: https://issues.apache.org/jira/browse/STORM-107
> Project: Apache Storm
> Issue Type: New Feature
> Components: storm-core
> Reporter: James Xu
> Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/649
> AFAIK the only way to construct a topology is to manually wire its components together,
> e.g.
> {code}
> (topology
>  {"firehose" (spout-spec firehose-spout)}
>  {"our-bolt-1" (bolt-spec {"firehose" :shuffle}
>                           some-bolt
>                           :p 5)
>   "our-bolt-2" (bolt-spec {"our-bolt-1" ["word"]}
>                           some-other-bolt
>                           :p 6)})
> {code}
> This sort of manual specification of edges seems a bit too 1990s for me. I
> would like a modular way to express topologies, so that you can compose
> sub-topologies together. Another benefit of an alternative to this graph
> setup is that ensuring that the topology is correct does not mean tracing
> every edge in the graph to make sure the graph is right.
> I am thinking maybe some sort of LINQ-style query that simply desugars to the
> arguments we pass into topology.
> For example, the following could desugar into the two map arguments we're
> passing to topology:
> {code}
> (def firehose (mk-spout "firehose" firehose-spout))
> (def bolt1 (mk-bolt "our-bolt-1" some-bolt :p 5))
> (def bolt2 (mk-bolt "our-bolt-2" some-other-bolt :p 6))
> (from-in thing (compose firehose
>                         bolt1
>                         bolt2)
>   (select thing))
> {code}
> Here from-in pulls thing out of the result of composing the firehose
> and the bolts, forming the topology we saw before. mk-spout would register a
> named spout spec, and the from-in macro would return the two dictionaries passed
> into topology.
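The desugaring described above can be sketched roughly as follows. This is only
an illustration under stated assumptions: mk-spout, mk-bolt and compose are the
hypothetical names from the proposal, plain maps stand in for Storm's
spout-spec/bolt-spec, the chain is strictly linear, and the :grouping option is
an invention of this sketch (the proposal does not say how groupings would be
specified).

```clojure
;; Hypothetical helpers: plain maps stand in for real spout/bolt specs.
(defn mk-spout [id spout]
  {:kind :spout :id id :component spout})

(defn mk-bolt [id bolt & {:keys [p grouping] :or {p 1 grouping :shuffle}}]
  {:kind :bolt :id id :component bolt :p p :grouping grouping})

(defn compose
  "Wires stages into a linear chain and returns [spouts bolts]. Each bolt gets
  an :inputs map naming the stage directly before it and the grouping to use."
  [& stages]
  (let [upstream-ids (cons nil (map :id stages)) ; id of the stage before each stage
        wired (map (fn [stage upstream]
                     (if (= :bolt (:kind stage))
                       (assoc stage :inputs {upstream (:grouping stage)})
                       stage))
                   stages upstream-ids)
        by-kind (fn [kind]
                  (into {} (for [s wired :when (= kind (:kind s))]
                             [(:id s) (dissoc s :kind :id :grouping)])))]
    [(by-kind :spout) (by-kind :bolt)]))

;; Usage: quoted symbols stand in for the actual spout and bolt objects.
(def firehose (mk-spout "firehose" 'firehose-spout))
(def bolt1 (mk-bolt "our-bolt-1" 'some-bolt :p 5))
(def bolt2 (mk-bolt "our-bolt-2" 'some-other-bolt :p 6 :grouping ["word"]))
(def result (compose firehose bolt1 bolt2))
```

Here result is a vector of two maps with the same shape as the two arguments in
the hand-wired example, so no edge has to be spelled out by hand. Fan-in and
fan-out would need a richer combinator than this linear compose.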
> The specification needs a lot of work, but I'm willing to write the patch
> myself once it's nailed down. The question is, do you want me to write it and
> send it off to you, or am I going to have to build a storm-tools repo to
> distribute it?
> ----------
> mrflip: We have an internal tool for describing topologies at a high level,
> and though it hasn't reached production we have found:
> 1. it definitely makes sense to have one set of objects that describe
> topologies, and a different set of objects that express them.
> 2. it probably makes sense to have those classes generate a static manifest:
> a lifeless JSON representation of a topology.
> To the first point, initially we did it like Storm: the FooEacher class would
> know how to wire itself into a topology(), and also know how to Foo each
> record that it received. We later refactored to separate topology
> construction from data handling: there is an EacherStage that represents
> anything that obeys the Each contract, so you'd say flow do
> source(:kafka_trident_spout) > eacher(:foo_eacher) > so_on() >
> and_so_forth(). The code became simpler and more powerful.
> (Note: in Storm, stages are actually wired into the topology, but the issue is
> that they're around at run-time in both cases, requiring serialization and so
> forth.)
> More importantly, it's worth considering a static manifest.
> The virtue of a manifest is that it is universal and static. If it's a JSON
> file, anything can generate it and anything can consume it; that would meet
> the needs of external programs which want to orchestrate Storm/Trident, as
> well as the repeated requests to visualize a topology in the UI. Also since
> it's static, the worker logic can simplify as it will know the whole graph in
> advance. From my experience, apart from the transactional code, the topology
> instantiation logic is the most complicated in the joint. That feels
> justifiable for the transaction logic but not for the topology instantiation.
> The danger of a manifest is also that it is static -- you could find yourself
> on the primrose path to Maven-style XML hell, where you wake up one day and
> find you've attached layers of ponderous machinery to make a static config
> file Turing-complete. I think the problem comes when you try to make the file
> human-editable. The manifest should expressly be the porcelain result of a
> DSL, with all decisions baked in -- it must not be a DSL.
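To make the manifest idea concrete, here is a rough sketch of baking an
already-resolved topology description into a static JSON string. Everything in
it is an assumption for illustration: the manifest layout ("spouts"/"bolts"
keys), the manifest function, and the hand-rolled to-json emitter, which only
covers the plain data used here. A real implementation would more likely use a
JSON library such as clojure.data.json.

```clojure
(require '[clojure.string :as str])

(defn to-json
  "Minimal JSON emitter for maps, sequences, strings, keywords, integers and
  nil. Not a general-purpose encoder. Map keys are sorted so that the output
  is deterministic."
  [x]
  (cond
    (map? x)        (str "{"
                         (str/join "," (for [[k v] (sort-by (comp str key) x)]
                                         (str (to-json (name k)) ":" (to-json v))))
                         "}")
    (sequential? x) (str "[" (str/join "," (map to-json x)) "]")
    (keyword? x)    (to-json (name x))
    (string? x)     (pr-str x)          ; quotes the string and escapes \" and \\
    (number? x)     (str x)
    (nil? x)        "null"
    :else           (to-json (str x)))) ; e.g. symbols become strings

(defn manifest
  "Hypothetical manifest layout: the two topology maps under fixed keys."
  [spouts bolts]
  (to-json {:spouts spouts :bolts bolts}))

;; Usage: every decision (ids, parallelism, groupings) is already baked in.
(def example-manifest
  (manifest {"firehose" {:component "firehose-spout"}}
            {"our-bolt-1" {:component "some-bolt"
                           :p 5
                           :inputs {"firehose" :shuffle}}}))
```

The output is pure data with no logic in it, which is the point made above: a
DSL produces the file, and tools such as a UI visualizer or an external
orchestrator only ever read it.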
> In general, we find that absolute separation of orchestration (what things
> should be wired together) and action (actually doing things) seems painful at
> design time but ends up making things simpler and more powerful.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)