[jira] [Updated] (STORM-107) Add better ways to construct topologies

Erik Weathers (JIRA) Mon, 21 Sep 2015 18:17:05 -0700

     [ 
https://issues.apache.org/jira/browse/STORM-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Erik Weathers updated STORM-107:
--------------------------------
    Description: 
https://github.com/nathanmarz/storm/issues/649

AFAIK the only way to construct a topology is to wire its components together 
manually, e.g.

{code}
  (topology
   {"firehose" (spout-spec firehose-spout)}
   {"our-bolt-1" (bolt-spec {"firehose" :shuffle}
                            some-bolt
                            :p 5)
    "our-bolt-2" (bolt-spec {"our-bolt-1" ["word"]}
                             some-other-bolt
                             :p 6)})
{code}

This sort of manual specification of edges seems a bit too 1990s for me. I 
would like a modular way to express topologies, so that you can compose 
sub-topologies together. A further benefit is that checking a topology for 
correctness would no longer mean tracing every edge in the graph by hand.

I am thinking maybe some sort of LINQ-style query that simply desugars to the 
arguments we pass into topology.

For example, the following could desugar into the two map arguments we're 
passing to topology:

{code}
(def firehose (mk-spout "firehose" firehose-spout))
(def bolt1 (mk-bolt "our-bolt-1" some-bolt :p 5))
(def bolt2 (mk-bolt "our-bolt-2" some-other-bolt :p 6))

(from-in thing (compose firehose
                        bolt1
                        bolt2)
  (select thing))
{code}

Here from-in pulls thing out of the result of composing the firehose and the 
bolts, forming the topology we saw before. mk-spout would register a named 
spout spec, and the from-in macro would return the two maps passed into 
topology.
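
To make the desugaring concrete, here is a minimal sketch under loud 
assumptions: mk-spout, mk-bolt and compose are hypothetical helpers that do not 
exist in Storm, compose only handles a linear chain, the :grouping option and 
its :shuffle default are invented for illustration, and the result is plain 
data standing in for the spout-spec and bolt-spec calls. The second bolt is 
named "our-bolt-2" to match the first listing.

```clojure
;; Hypothetical sketch: none of these helpers exist in Storm.
;; Each stage is plain data; compose wires the stages into a linear chain.

(defn mk-spout
  "Describe a spout as data (hypothetical helper)."
  [id spout]
  {:kind :spout :id id :impl spout})

(defn mk-bolt
  "Describe a bolt as data (hypothetical helper). Remaining options such as
  :p 5 are passed through. :grouping defaults to :shuffle, which is an
  assumption of this sketch."
  [id bolt & {:keys [grouping] :or {grouping :shuffle} :as opts}]
  {:kind :bolt :id id :impl bolt
   :grouping grouping
   :opts (dissoc opts :grouping)})

(defn compose
  "Chain stages left to right; each bolt subscribes to the stage before it.
  Returns [spout-map bolt-map], mirroring the two map arguments to topology."
  [& stages]
  (let [edges (map vector stages (rest stages))]
    [(into {} (for [s stages :when (= :spout (:kind s))]
                [(:id s) {:impl (:impl s)}]))
     (into {} (for [[upstream b] edges :when (= :bolt (:kind b))]
                [(:id b) {:inputs {(:id upstream) (:grouping b)}
                          :impl   (:impl b)
                          :opts   (:opts b)}]))]))

;; Quoted symbols stand in for the real spout and bolt objects.
(def firehose (mk-spout "firehose" 'firehose-spout))
(def bolt1 (mk-bolt "our-bolt-1" 'some-bolt :p 5))
(def bolt2 (mk-bolt "our-bolt-2" 'some-other-bolt :grouping ["word"] :p 6))

(def wired (compose firehose bolt1 bolt2))
```

A real version would presumably wrap each entry in spout-spec or bolt-spec 
before handing the two maps to topology; that step is left out so the sketch 
runs without Storm on the classpath.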

The specification needs a lot of work, but I'm willing to write the patch 
myself once it's nailed down. The question is: do you want me to write it and 
send it to you, or will I have to build a storm-tools repo to distribute it?


----------
mrflip: We have an internal tool for describing topologies at a high level, and 
though it hasn't reached production we have found:
1. it definitely makes sense to have one set of objects that describe 
topologies, and a different set of objects that express them. 
2. it probably makes sense to have those classes generate a static manifest: a 
lifeless JSON representation of a topology.

To the first point, initially we did it like storm: the FooEacher class would 
know how to wire itself into a topology(), and also know how to Foo each record 
that it received. We later refactored to separate topology construction from 
data handling: there is an EacherStage that represents anything that obeys the 
Each contract, so you'd say flow do source(:kafka_trident_spout) > 
eacher(:foo_eacher) > so_on() > and_so_forth(). The code became simpler and 
more powerful.
() Actually in storm stages are wired into the topology, but the issue is that 
they're around at run-time in both cases, requiring serialization and so forth.

More importantly, it's worth considering a static manifest.

The virtue of a manifest is that it is universal and static. If it's a JSON 
file, anything can generate it and anything can consume it; that would meet the 
needs of external programs which want to orchestrate Storm/Trident, as well as 
the repeated requests to visualize a topology in the UI. Also, since it's 
static, the worker logic can be simplified, because it will know the whole 
graph in advance. In my experience, apart from the transactional code, the 
topology instantiation logic is the most complicated in the joint. That feels 
justifiable for the transaction logic but not for the topology instantiation.

The danger of a manifest is also that it is static -- you could find yourself 
on the primrose path to maven-style XML hell, where you wake up one day and 
find you've attached layers of ponderous machinery to make a static config file 
Turing-complete. I think the problem comes when you try to make the file 
human-editable. The manifest should expressly be the porcelain result of a DSL, 
with all decisions baked in -- it must not be a DSL.
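
As a rough illustration of a manifest as the baked output of a DSL, the sketch 
below serializes a plain-data topology description to JSON. The field names 
(:spouts, :bolts, :class, :inputs, :parallelism) are hypothetical and not a 
Storm format, and to-json is a deliberately tiny emitter written for this 
example; it covers only maps, vectors, strings, keywords, plain numbers and nil.

```clojure
(require '[clojure.string :as str])

(defn to-json
  "Tiny JSON emitter for this sketch only. Map keys must be strings or
  keywords and are sorted so the output is deterministic. Strings are
  escaped with pr-str, which is adequate for simple component names but is
  not a complete JSON string encoder; ratios and booleans are not handled."
  [x]
  (cond
    (map? x)        (str "{"
                         (str/join ","
                                   (for [[k v] (sort-by (comp name key) x)]
                                     (str (pr-str (name k)) ":" (to-json v))))
                         "}")
    (sequential? x) (str "[" (str/join "," (map to-json x)) "]")
    (keyword? x)    (pr-str (name x))
    (string? x)     (pr-str x)
    (number? x)     (str x)
    (nil? x)        "null"
    :else           (throw (ex-info "unsupported value in manifest"
                                    {:value x}))))

;; Hypothetical manifest shape: every decision (names, groupings,
;; parallelism) is already baked in, so a consumer needs no logic to read it.
(def manifest
  {:spouts {"firehose"   {:class "firehose-spout"}}
   :bolts  {"our-bolt-1" {:class "some-bolt"
                          :inputs {"firehose" :shuffle}
                          :parallelism 5}
            "our-bolt-2" {:class "some-other-bolt"
                          :inputs {"our-bolt-1" ["word"]}
                          :parallelism 6}}})

(def manifest-json (to-json manifest))
```

Because the manifest is only ever written by code, the emitter can stay this 
small; nothing here tries to make the JSON pleasant to edit by hand.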

In general, we find that absolute separation of orchestration (what things 
should be wired together) and action (actually doing things) seems painful at 
design time but ends up making things simpler and more powerful.

  was:
https://github.com/nathanmarz/storm/issues/649

AFAIK the only way to construct a topology is to manually wire them together, 
e.g.

  (topology
   {"firehose" (spout-spec firehose-spout)}
   {"our-bolt-1" (bolt-spec {"firehose" :shuffle}
                            some-bolt
                            :p 5)
    "our-bolt-2" (bolt-spec {"our-bolt-1" ["word"]}
                             some-other-bolt
                             :p 6)})

This sort of manual specification of edges seems a bit too 1990's for me. I 
would like a modular way to express topologies, so that you can compose 
sub-topologies together. Another benefit of an alternative to this graph setup 
is that ensuring that the topology is correct does not mean tracing every edge 
in the graph to make sure the graph is right.

I am thinking maybe some sort of LINQ-style query that simply desugars to the 
arguments we pass into topology.

For example, the following could desugar into the two map arguments we're 
passing to topology:

(def firehose (mk-spout "firehose" firehose-spout))
(def bolt1 (mk-bolt "our-bolt-1" some-bolt :p 5))
(def bolt2 (mk-bolt "our-bolt-1" some-other-bolt :p 6))

(from-in thing (compose firehose
                        bolt1
                        bolt2)
  (select thing))

Here from-in is pulling thing out of the result of compose'ing the firehose and 
the bolts, forming the topology we saw before. mk-spout would register a named 
spout spec, and the from macro would return the two dictionaries passed into 
topology.

The specification needs a lot of work, but I'm willing to write the patch 
myself once it's nailed down. The question is, do you want me to write it and 
send it off to you, or am I going to have to build a storm-tools repot to 
distribute it?


----------
mrflip:We have an internal tool for describing topologies at a high level, and 
though it hasn't reached production we have found:
1. it definitely makes sense to have one set of objects that describe 
topologies, and a different set of objects that express them. 
2. it probably makes sense to have those classes generate a static manifest: a 
lifeless JSON representation of a topology.

To the first point, initially we did it like storm: the FooEacher class would 
know how to wire itself into a topology(), and also know how to Foo each record 
that it received. We later refactored to separate topology construction from 
data handling: there is an EacherStage that represents anything that obeys the 
Each contract, so you'd say flow do source(:kafka_trident_spout) > 
eacher(:foo_eacher) > so_on() > and_so_forth(). The code became simpler and 
more powerful.
() Actually in storm stages are wired into the topology, but the issue is that 
they're around at run-time in both cases, requiring serialization and so forth.

More importantly, it's worth considering a static manifest.

The virtue of a manifest is that it is universal and static. If it's a JSON 
file, anything can generate it and anything can consume it; that would meet the 
needs of external programs which want to orchestrate Storm/Trident, as well as 
the repeated requests to visualize a topology in the UI. Also since it's 
static, the worker logic can simplify as it will know the whole graph in 
advance. From my experience, apart from the transactional code, the topology 
instantiation logic is the most complicated in the joint. That feels 
justifiable for the transaction logic but not for the topology instantiation.

The danger of a manifest is also that it is static -- you could find yourself 
on the primrose path to maven-style XML hell, where you wake up one day and 
find you've attached layers of ponderous machinery to make a static config file 
Turing-complete. I think the problem comes when you try to make the file 
human-editable. The manifest should expressly be the porcelain result of a DSL, 
with all decisions baked in -- it must not be a DSL.

In general, we find that absolute separation of orchestration (what things 
should be wired together) and action (actually doing things) seems painful at 
design time but ends up making things simpler and more powerful.


> Add better ways to construct topologies
> ---------------------------------------
>
>                 Key: STORM-107
>                 URL: https://issues.apache.org/jira/browse/STORM-107
>             Project: Apache Storm
>          Issue Type: New Feature
>            Reporter: James Xu
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
