[ 
https://issues.apache.org/jira/browse/STORM-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rick Kellogg updated STORM-107:
-------------------------------
    Component/s: storm-core

> Add better ways to construct topologies
> ---------------------------------------
>
>                 Key: STORM-107
>                 URL: https://issues.apache.org/jira/browse/STORM-107
>             Project: Apache Storm
>          Issue Type: New Feature
>          Components: storm-core
>            Reporter: James Xu
>            Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/649
> AFAIK the only way to construct a topology is to manually wire its components 
> together, e.g.
> {code}
>   (topology
>    {"firehose" (spout-spec firehose-spout)}
>    {"our-bolt-1" (bolt-spec {"firehose" :shuffle}
>                             some-bolt
>                             :p 5)
>     "our-bolt-2" (bolt-spec {"our-bolt-1" ["word"]}
>                              some-other-bolt
>                              :p 6)})
> {code}
> This sort of manual specification of edges seems a bit too 1990s for me. I 
> would like a modular way to express topologies, so that you can compose 
> sub-topologies together. Another benefit of moving away from this graph 
> setup is that checking a topology for correctness would no longer mean 
> tracing every edge in the graph.
> I am thinking maybe some sort of LINQ-style query that simply desugars to the 
> arguments we pass into topology.
> For example, the following could desugar into the two map arguments we're 
> passing to topology:
> {code}
> (def firehose (mk-spout "firehose" firehose-spout))
> (def bolt1 (mk-bolt "our-bolt-1" some-bolt :p 5))
> (def bolt2 (mk-bolt "our-bolt-2" some-other-bolt :p 6))
> (from-in thing (compose firehose
>                         bolt1
>                         bolt2)
>   (select thing))
> {code}
> Here from-in is pulling thing out of the result of compose'ing the firehose 
> and the bolts, forming the topology we saw before. mk-spout would register a 
> named spout spec, and the from-in macro would return the two maps passed 
> into topology.
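
The desugaring described above could be sketched in plain Clojure roughly as
follows. This is only an illustration under assumptions: mk-spout, mk-bolt,
and compose are hypothetical helpers taken from the proposal, not part of
Storm; plain maps stand in for Storm's spout-spec and bolt-spec; the
:grouping option is invented here; and compose is an ordinary function that
chains components linearly, not the proposed from-in macro.

```clojure
;; Hypothetical helpers: plain maps stand in for Storm's spout-spec/bolt-spec.
(defn mk-spout [id impl]
  {:kind :spout :id id :impl impl})

(defn mk-bolt
  "Describes a bolt. :p is the parallelism hint; :grouping (an assumption of
  this sketch) is how the bolt subscribes to the component upstream of it."
  [id impl & {:keys [p grouping] :or {p 1 grouping :shuffle}}]
  {:kind :bolt :id id :impl impl :p p :grouping grouping})

(defn compose
  "Chains components linearly: each bolt reads from the component before it.
  Returns [spouts bolts], shaped like the two map arguments to topology."
  [& components]
  (let [spouts (into {} (for [c components
                              :when (= :spout (:kind c))]
                          [(:id c) (select-keys c [:impl])]))
        ;; (partition 2 1 ...) yields sliding [upstream downstream] pairs.
        bolts  (into {} (for [[upstream c] (partition 2 1 components)
                              :when (= :bolt (:kind c))]
                          [(:id c) {:inputs {(:id upstream) (:grouping c)}
                                    :impl   (:impl c)
                                    :p      (:p c)}]))]
    [spouts bolts]))

;; Quoted symbols stand in for real spout/bolt implementations.
(def firehose (mk-spout "firehose" 'firehose-spout))
(def bolt1 (mk-bolt "our-bolt-1" 'some-bolt :p 5))
(def bolt2 (mk-bolt "our-bolt-2" 'some-other-bolt :p 6 :grouping ["word"]))

(def wired (compose firehose bolt1 bolt2))
```

The edges are derived from the order of the arguments, so no edge is written
by hand. A real version would still need to turn these maps into actual
spout-spec and bolt-spec values, and to handle topologies that are not a
simple chain.
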
> The specification needs a lot of work, but I'm willing to write the patch 
> myself once it's nailed down. The question is, do you want me to write it and 
> send it off to you, or am I going to have to build a storm-tools repo to 
> distribute it?
> ----------
> mrflip: We have an internal tool for describing topologies at a high level, 
> and though it hasn't reached production, we have found:
> 1. it definitely makes sense to have one set of objects that describe 
> topologies, and a different set of objects that express them. 
> 2. it probably makes sense to have those classes generate a static manifest: 
> a lifeless JSON representation of a topology.
> To the first point, initially we did it like Storm: the FooEacher class would 
> know how to wire itself into a topology() and also how to Foo each record 
> that it received. We later refactored to separate topology construction 
> from data handling: there is an EacherStage that represents anything that 
> obeys the Each contract, so you'd say flow do 
> source(:kafka_trident_spout) > eacher(:foo_eacher) > so_on() > 
> and_so_forth(). The code became simpler and more powerful.
> (Actually, in Storm, stages are wired into the topology, but the issue is 
> that they're around at run time in both cases, requiring serialization and 
> so forth.)
> More importantly, it's worth considering a static manifest.
> The virtue of a manifest is that it is universal and static. If it's a JSON 
> file, anything can generate it and anything can consume it; that would meet 
> the needs of external programs which want to orchestrate Storm/Trident, as 
> well as the repeated requests to visualize a topology in the UI. Also since 
> it's static, the worker logic can simplify as it will know the whole graph in 
> advance. From my experience, apart from the transactional code, the topology 
> instantiation logic is the most complicated in the joint. That feels 
> justifiable for the transaction logic but not for the topology instantiation.
> The danger of a manifest is also that it is static -- you could find yourself 
> on the primrose path to maven-style XML hell, where you wake up one day and 
> find you've attached layers of ponderous machinery to make a static config 
> file Turing-complete. I think the problem comes when you try to make the file 
> human-editable. The manifest should expressly be the porcelain result of a 
> DSL, with all decisions baked in -- it must not be a DSL.
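
To make the manifest idea concrete, here is a minimal sketch of what a
generated, static description might look like and how any consumer could
check it without instantiating a topology. Everything in it is an assumption
for illustration: the manifest shape, the field names, and the class names
are hypothetical, and to-json is a deliberately tiny writer rather than a
real JSON library.

```clojure
(require '[clojure.string :as str])

;; Hypothetical manifest: components plus explicit edges. Pure data, no code.
(def manifest
  {:spouts [{:id "firehose" :class "FirehoseSpout"}]
   :bolts  [{:id "our-bolt-1" :class "SomeBolt" :parallelism 5}
            {:id "our-bolt-2" :class "SomeOtherBolt" :parallelism 6}]
   :edges  [{:from "firehose" :to "our-bolt-1" :grouping "shuffle"}
            {:from "our-bolt-1" :to "our-bolt-2"
             :grouping "fields" :fields ["word"]}]})

(defn to-json
  "Minimal JSON writer for nil, maps, vectors, strings, keywords, numbers,
  and booleans. Simplification: string escaping relies on pr-str, which
  matches JSON only for simple text."
  [x]
  (cond
    (nil? x)        "null"
    (map? x)        (str "{"
                         (str/join "," (for [[k v] x]
                                         (str (to-json (name k)) ":" (to-json v))))
                         "}")
    (sequential? x) (str "[" (str/join "," (map to-json x)) "]")
    (keyword? x)    (pr-str (name x))
    (string? x)     (pr-str x)
    :else           (str x)))

(defn dangling-edges
  "Returns the edges whose endpoints are not declared components. Because
  the manifest is static, this check needs no running topology."
  [{:keys [spouts bolts edges]}]
  (let [ids (set (map :id (concat spouts bolts)))]
    (remove #(and (ids (:from %)) (ids (:to %))) edges)))

(def manifest-json (to-json manifest))
```

The point of the sketch is the direction of flow: a DSL produces the data,
and tools such as a UI visualizer or an orchestrator only ever read it.
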
> In general, we find that absolute separation of orchestration (what things 
> should be wired together) and action (actually doing things) seems painful at 
> design time but ends up making things simpler and more powerful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
