Neng,
https://github.com/twitter/heron/pull/2334
provides this abstraction.
The issue, however, is as follows. In the Spout/Bolt world, every component is
explicitly named by the topology writer, and thus all resources can be
specified on a per-component basis. However, in the DSL world, a) the
operators themselves don't have names and b) optimizations can squish several
operators into a single physical operator. One possibility would be to
optionally add a name to the operator (like map(mapfn, name)), but that seems
too cumbersome/kludgy.
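
To make the naming idea concrete, here is a rough sketch in plain Java. It is not Heron code: `Stream`, `Resources` and `resourcesFor` are hypothetical names made up for illustration, and the sketch assumes unnamed operators get a positional default name. It only shows how an optional name would let resources be looked up from a separate per-name config; it does nothing about point b), operators being fused by optimizations.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class NamedOperatorSketch {
    /** Hypothetical resource request; not a Heron class. */
    static final class Resources {
        final double cpu;
        final long ramMb;
        Resources(double cpu, long ramMb) { this.cpu = cpu; this.ramMb = ramMb; }
    }

    /** Hypothetical DSL stream that only records the names of its operators. */
    static final class Stream<T> {
        private final List<String> operatorNames;
        Stream(List<String> operatorNames) { this.operatorNames = operatorNames; }

        // Unnamed variant: falls back to a generated positional name (an assumption).
        <R> Stream<R> map(Function<T, R> fn) {
            return map(fn, "map-" + operatorNames.size());
        }

        // Named variant, i.e. the map(mapfn, name) form mentioned above.
        <R> Stream<R> map(Function<T, R> fn, String name) {
            List<String> names = new ArrayList<>(operatorNames);
            names.add(name);
            return new Stream<>(names);
        }

        List<String> operatorNames() { return operatorNames; }
    }

    // Resources live in a separate map keyed by operator name, apart from the job definition.
    static Resources resourcesFor(String name, Map<String, Resources> overrides, Resources fallback) {
        return overrides.getOrDefault(name, fallback);
    }

    public static void main(String[] args) {
        Stream<String> source = new Stream<>(new ArrayList<>());
        Stream<Integer> lengths = source.map(String::trim).map(String::length, "length");

        Map<String, Resources> overrides = new HashMap<>();
        overrides.put("length", new Resources(2.0, 2048));
        Resources fallback = new Resources(1.0, 512);

        for (String name : lengths.operatorNames()) {
            Resources r = resourcesFor(name, overrides, fallback);
            System.out.println(name + ": cpu=" + r.cpu + ", ramMb=" + r.ramMb);
        }
    }
}
```

The same job definition could then be paired with different override maps, which is the kind of separation between job creation and resource requests that Neng describes.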

On Thu, Sep 21, 2017 at 3:57 PM, Neng Lu <[email protected]> wrote:

> Just adding some thoughts here: for ordinary Heron topologies, the
> definition of a Heron job and the resource request for each component are
> separated: `TopologyBuilder` for job definition, `Config` for resource
> requirements.
>
> In the DSL case, if we could also do something similar that separates the
> DSL job creation from the resource requests, it would be really good. With
> this separation, people have the flexibility of providing different configs
> for the same job.
>
>
> On Wed, Sep 20, 2017 at 1:48 PM, Sanjeev Kulkarni <[email protected]>
> wrote:
>
> > Hi folks,
> > One of the great features of the lower-level spout/bolt interface in
> > Heron is the ability to specify resources needed on a per-component
> > basis. This feature is very helpful for tuning large topologies and is
> > heavily used inside Twitter.
> > Currently the DSL does not have this flexibility. I wanted to get
> > opinions about how we can add this.
> > There are probably several ways to do it. I'm listing a few approaches
> > that have come to my mind. Please feel free to add more.
> > 1) Currently some of our operators are simple (like the flatMap, map and
> > filter operators), while others are a little more complicated (like
> > transform, where users can perform setup/cleanup). We can take the
> > approach of adding the ability to specify resources only for complex
> > operators. Thus transform could have two variants: the current one, which
> > just takes a transform function, and another that takes a resource
> > parameter as well. The rest of the operators (map/flatMap/filter, etc.)
> > will remain the same. The advantage of this is that the interface
> > explosion is minimal and controlled. The con is that if you need to
> > control the resources of a particular operator, you are forced to use
> > transform.
> > 2) Another approach would be to add a variant that takes a Resource
> > parameter to all operators. The pro is that this gives fine-grained
> > control over all operators. The con is the interface blow-up.
> >
> > Thoughts?
> >
>
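
For reference, approach 1 from the original mail could look roughly like the sketch below. Again, this is plain Java under stated assumptions, not the Heron DSL: `SketchStream`, `Resources` and `DEFAULT` are hypothetical names, and the stream only records which resources each operator would get. Only `transform` has a resource-taking overload; `map` keeps its single signature. Approach 2 would mean adding the same extra overload to every operator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ResourceVariantsSketch {
    /** Hypothetical resource request; not a Heron class. */
    static final class Resources {
        final double cpu;
        final long ramMb;
        Resources(double cpu, long ramMb) { this.cpu = cpu; this.ramMb = ramMb; }
    }

    // Assumed default applied to every operator that has no explicit request.
    static final Resources DEFAULT = new Resources(1.0, 512);

    /** Hypothetical stream that records one Resources entry per operator. */
    static final class SketchStream<T> {
        private final List<Resources> plan;
        SketchStream(List<Resources> plan) { this.plan = plan; }

        private <R> SketchStream<R> append(Resources r) {
            List<Resources> next = new ArrayList<>(plan);
            next.add(r);
            return new SketchStream<>(next);
        }

        // Simple operator: no resource variant under approach 1.
        <R> SketchStream<R> map(Function<T, R> fn) { return append(DEFAULT); }

        // Complex operator, current form.
        <R> SketchStream<R> transform(Function<T, R> fn) { return append(DEFAULT); }

        // Complex operator, additional form that carries a resource request.
        <R> SketchStream<R> transform(Function<T, R> fn, Resources r) { return append(r); }

        List<Resources> plan() { return plan; }
    }

    public static void main(String[] args) {
        SketchStream<Integer> out = new SketchStream<String>(new ArrayList<>())
            .map(String::trim)
            .transform(String::length, new Resources(2.0, 2048));
        for (Resources r : out.plan()) {
            System.out.println("cpu=" + r.cpu + ", ramMb=" + r.ramMb);
        }
    }
}
```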
