Re: Declarative Workflow update & shorthand/DSL

Geoff Macartney Wed, 21 Sep 2022 12:35:53 -0700

Hi Alex, Mykola,

By the way I should mention that I'm very busy in the evenings this week so
might not get to look at the latest PR for a while. By all means go ahead
and merge it if Mykola and/or others are happy with it, no need to wait for
me.


Cheers
Geoff


On Tue, 20 Sept 2022, 22:20 Geoff Macartney, <[email protected]> wrote:

> Hi Alex,
>
> +1 This updated proposal looks good - I do think the list based
> approach will be simpler and less error prone, and the fact that you
> will support an optional `id` anyway, if that is desired, means it
> retains much of the flexibility of the map based approach. The custom
> workflow step looks a little like the "functions" that we discussed
> previously. Putting this all together will be pretty powerful.
>
> Will try to get a look at the latest PR if I can.
>
> Cheers
> Geoff
>
>
> On Mon, 19 Sept 2022 at 17:31, Alex Heneveld <[email protected]> wrote:
> >
> > Geoff-  Thanks.  Comments addressed in #1361 along with a major addition to
> > support variables -- inputs/outputs/etc.
> >
> > All-  One of the points Geoff makes concerns how steps are defined.  I
> > think along with other comments that tips the balance in favour of
> > revisiting how steps are defined.
> >
> > I propose we switch from the OLD proposed approach -- the map of ordered
> > IDs -- to a NEW LIST-BASED approach.  There's a lot of detail below but
> > in-short it's shifting from:
> >
> > steps:
> >   1-say-hi:  log hi
> >   2-step-two:  log step 2
> >
> > To:
> >
> > steps:
> >   - log hi
> >   - log step 2
> >
> >
> > Specifically, based on feedback and more hands-on experience, I propose:
> >
> > * steps now be supplied as a list (previously a map)
> > * users are no longer required to supply an ID for each step (in the old
> > approach, the ID was required as the key for every step)
> > * users can if they wish supply an ID for any step (now as an explicit
> > `id: <ID>` rule)
> > * the default order, if no `next: <ID>` instruction is supplied, is the
> > order of the list (in the old approach the order was based on the ID)
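
The ordering rule in the bullets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Brooklyn's implementation: the function name `next_step_index` is hypothetical, steps are modelled as plain strings or dicts, and only the `id` and `next` keys from the proposal are interpreted.

```python
from typing import Optional


def next_step_index(steps: list, current: int) -> Optional[int]:
    """Return the index of the step to run after `current`, or None at the end.

    Hypothetical helper: follows an explicit `next: <ID>` if present,
    otherwise falls back to the order of the list.
    """
    step = steps[current]
    target = step.get("next") if isinstance(step, dict) else None
    if target is None:
        # Default order: simply the next entry in the list.
        return current + 1 if current + 1 < len(steps) else None
    # Explicit jump: find the step whose optional `id` matches.
    for i, candidate in enumerate(steps):
        if isinstance(candidate, dict) and candidate.get("id") == target:
            return i
    raise ValueError(f"no step with id {target!r}")


# Example workflow: the second step jumps over the third.
steps = [
    "log hi",
    {"s": "log step 2", "next": "done"},
    "log skipped",
    {"id": "done", "s": "log finished"},
]
```

With this model, steps without an `id` can never be the target of a `next`, which matches the idea that an ID is only needed where it is actually referenced.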
> >
> > Also, the shorthand idea has evolved a little bit; instead of a "<type>:
> > <type-specific-shorthand-template>" single-key map, we've suggested:
> >
> > * it be a string "<type> <type-specific-shorthand-template>"
> > * shorthand can also be supplied in a map using the key "s" or the key
> > "shorthand" (to allow shorthand along with other step key values)
> > * custom steps can define custom shorthand templates (e.g. "${key} "="
> > ${value}")
> > * (there is also some evolution in how custom steps are defined)
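
As a rough sketch of the shorthand rules listed above (a string step, or a map carrying an "s" or "shorthand" key), the following Python normalises a step into a map with an explicit `type`. The function name `expand_step` and the `shorthand_args` key are assumptions made for this illustration; the type-specific interpretation of the remainder is deliberately left out.

```python
def expand_step(step):
    """Normalise a step given as a string or a map into a map with a `type`.

    Hypothetical helper: the first word of the shorthand is taken as the
    step type and the rest is kept verbatim under `shorthand_args`
    (an invented key) for the type to interpret later.
    """
    if isinstance(step, str):
        step = {"s": step}
    step = dict(step)  # work on a copy so the caller's map is untouched
    shorthand = step.pop("s", None) or step.pop("shorthand", None)
    if shorthand is not None:
        step_type, _, rest = shorthand.strip().partition(" ")
        step.setdefault("type", step_type)
        if rest:
            step["shorthand_args"] = rest.strip()
    return step
```

For example, `expand_step("log hi")` yields a map with `type` set to `log`, and a map such as `{"s": "set-sensor spark-output = ${1.stdout}", "id": "set-spark-output"}` keeps its `id` alongside the extracted type.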
> >
> >
> > To illustrate:
> >
> > The OLD EXAMPLE:
> >
> > steps:
> >    1:
> >       type: container
> >       image: my/google-cloud
> >       command: gcloud dataproc jobs submit spark --BUCKET=gs://${BUCKET}
> >       env:
> >         BUCKET: $brooklyn:config("bucket")
> >       on-error: retry
> >    2:
> >       set-sensor: spark-output=${1.stdout}
> >
> > Would become in the NEW proposal:
> >
> > steps:
> >     - type: container
> >       image: my/google-cloud
> >       command: gcloud dataproc jobs submit spark --BUCKET=gs://${BUCKET}
> >       env:
> >         BUCKET: $brooklyn:config("bucket")
> >       on-error: retry
> >     - set-sensor spark-output = ${1.stdout}
> >
> > If we wanted to attach an `id` to the second step (e.g. for use with
> > "next") we could write it either as:
> >
> >     # full long-hand map
> >     - type: set-sensor
> >       input:
> >         sensor: spark-output
> >         value: ${1.stdout}
> >       id: set-spark-output
> >
> >     # mixed "s" shorthand key and other fields
> >     - s: set-sensor spark-output = ${1.stdout}
> >       id: set-spark-output
> >
> > To explain the reasoning:
> >
> > The advantages of the NEW list-based steps:
> >
> > * Slightly less verbose when no ID is needed on a step
> > * Easier to read and understand flow
> > * Avoids hassle of renumbering when introducing a step
> > * Avoids risk of error where the same key is defined multiple times
> >
> > The advantages of the OLD map-based scheme (implied disadvantages of the new
> > list-based scheme):
> >
> > * Easier user-facing correlation of steps (e.g. in the UI), since every step
> > always has an explicit ID
> > * Easier to extend a workflow by inserting or overriding explicit steps
> >
> > After some initial usage of the workflow, it seems these advantages of the
> > old approach are outweighed by the advantages of the list approach.  In
> > particular the "correlation" can be done in other ways, and extending a
> > workflow is probably not so useful, whereas supplying and maintaining an ID
> > is a hassle, error-prone, and harder to understand.
> >
> > Finally, to explain the custom steps idea: it works out nicely in the code,
> > and we think for users too, to add a "compound-step" to the catalog, e.g. as
> > follows for the workflow shown above:
> >
> >   id: retryable-gcloud-dataproc-with-bucket-and-sensor
> >   item:
> >     type: custom-workflow-step
> >     parameters:
> >       bucket:
> >         type: string
> >       sensor_name:
> >         type: string
> >         default: spark-output
> >     shorthand_definition: [ " bucket " ${bucket} ] [ " sensor " ${sensor_name} ]
> >     steps:
> >     - type: container
> >       image: my/google-cloud
> >       command: gcloud dataproc jobs submit spark --BUCKET=gs://${BUCKET}
> >       env:
> >         BUCKET: ${bucket}
> >       on-error: retry
> >     - set-sensor ${sensor_name} = ${1.stdout}
> >
> > A user could then write a step:
> >
> > - retryable-gcloud-dataproc-with-bucket-and-sensor
> >
> > And optionally use the shorthand per the shorthand_definition, matching the
> > quoted string literals and inferring the indicated parameters, e.g.:
> >
> > - retryable-gcloud-dataproc-with-bucket-and-sensor bucket my-bucket sensor my-spark-output
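
One way to picture the matching described above is the Python sketch below. It is a simplified guess at the semantics, not the actual Brooklyn parser: it assumes every group in the definition has the form `[ "literal" ${param} ]`, that each group is optional, that groups appear in definition order, and that each value is a single whitespace-free word. The names `GROUP` and `parse_shorthand` are invented for this example, and the text passed in is whatever follows the step type.

```python
import re

# One optional group: [ " literal " ${param} ]
GROUP = re.compile(r'\[\s*"\s*([^"]*?)\s*"\s*\$\{(\w+)\}\s*\]')


def parse_shorthand(definition: str, text: str) -> dict:
    """Match `text` against a shorthand definition and return the parameters."""
    groups = GROUP.findall(definition)  # [(literal, param), ...]
    words = text.split()
    params = {}
    pos = 0
    for literal, param in groups:
        # Each group is optional: consume it only if its literal comes next.
        if pos + 1 < len(words) and words[pos] == literal:
            params[param] = words[pos + 1]
            pos += 2
    if pos != len(words):
        raise ValueError(f"unexpected shorthand text: {' '.join(words[pos:])!r}")
    return params


DEFINITION = '[ " bucket " ${bucket} ] [ " sensor " ${sensor_name} ]'
```

Parameters that are not matched are simply absent from the result, so a default such as `spark-output` for `sensor_name` would be applied elsewhere.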
> >
> > They could of course also use the longhand:
> >
> > - type: retryable-gcloud-dataproc-with-bucket-and-sensor
> >   input:
> >     bucket: my-bucket
> >     sensor_name: my-spark-output
> >
> >
> > Best
> > Alex
> >
> >
> >
> > On Sat, 17 Sept 2022 at 21:13, Geoff Macartney <[email protected]>
> wrote:
> >
> > > Hi Alex,
> > >
> > > Belatedly reviewed the PR. It's looking good! And surprisingly simple
> > > in the end. Made a couple of minor comments on it.
> > >
> > > Cheers
> > > Geoff
> > >
> > > On Thu, 8 Sept 2022 at 09:35, Alex Heneveld <[email protected]> wrote:
> > > >
> > > > Hi team,
> > > >
> > > > An initial PR with a few types and the ability to define an effector is
> > > > available [1].
> > > >
> > > > This is enough for the next steps to be parallelized, e.g. new steps
> > > > added.  The proposal has been updated with a work plan / list of tasks
> > > > [2].  Any volunteers to help with some of the upcoming tasks, let me know.
> > > >
> > > > Finally I've been thinking about the "shorthand syntax" and how to bring
> > > > us closer to Peter's proposal of a DSL.  The original proposal allowed,
> > > > instead of a map e.g.
> > > >
> > > > step_sleep:
> > > >   type: sleep
> > > >   duration: 5s
> > > >
> > > > or
> > > >
> > > > step_update_service_up:
> > > >   type: set-sensor
> > > >   sensor:
> > > >     name: service.isUp
> > > >     type: boolean
> > > >   value: true
> > > >
> > > > being able to use a shorthand _map_ with a single key being the type, and
> > > > value interpreted by that type, so in the OLD SHORTHAND PROPOSAL the above
> > > > could be written:
> > > >
> > > > step_sleep:
> > > >   sleep: 5s
> > > >
> > > > step_update_service_up:
> > > >   set-sensor: service.isUp = true
> > > >
> > > > Having played with syntaxes a bit I wonder if we should instead say the
> > > > shorthand DSL kicks in when the step _body_ is a string (instead of a
> > > > single-key map), with the first word of the string being the type and the
> > > > remainder interpreted by the type, and we allow it to be a bit more
> > > > ambitious.
> > > >
> > > > Concretely this NEW SHORTHAND PROPOSAL would look something like:
> > > >
> > > > step_sleep: sleep 5s
> > > > step_update_service_up: set-sensor service.isUp = true
> > > > # also supporting a type, ie `set-sensor [TYPE] NAME = VALUE`, eg
> > > > step_update_service_up: set-sensor boolean service.isUp = true
> > > >
> > > > You would still need the full map syntax whenever defining flow logic --
> > > > eg condition, next, retry, or timeout -- or any property not supported by
> > > > the shorthand syntax.  But for the (majority?) simple cases the expression
> > > > would be very concise.  In most cases I think it would feel like a DSL but
> > > > has the virtue of a very clear translation to the actual workflow model
> > > > and the underlying (YAML) model needed for resumption and UI.
> > > >
> > > > As a final example, the example used at the start of the proposal
> > > > (simplified a little -- removing on-error retry and env map as those
> > > > wouldn't be supported by shorthand):
> > > >
> > > > brooklyn.initializers:
> > > > - type: workflow-effector
> > > >   name: run-spark-on-gcp
> > > >   steps:
> > > >     1:
> > > >       type: container
> > > >       image: my/google-cloud
> > > >       command: gcloud dataproc jobs submit spark --BUCKET=gs://$brooklyn:config("bucket")
> > > >     2:
> > > >       type: set-sensor
> > > >       sensor: spark-output
> > > >       value: ${1.stdout}
> > > >
> > > > Could be written in this shorthand as follows:
> > > >
> > > >  steps:
> > > >    1: container my/google-cloud command "gcloud dataproc jobs submit spark --BUCKET=gs://${entity.config.bucket}"
> > > >    2: set-sensor spark-output ${1.stdout}
> > > >
> > > > Thoughts?
> > > >
> > > > Best
> > > > Alex
> > > >
> > > >
> > > > [1] https://github.com/apache/brooklyn-server/pull/1358
> > > > [2] https://docs.google.com/document/d/1u02Bi6sS8Fkf1s7UzRRMnvLhA477bqcyxGa0nJesqkI/edit#heading=h.gbadaqa2yql6
> > > >
> > > >
> > > > On Wed, 7 Sept 2022 at 09:58, Alex Heneveld <[email protected]>
> wrote:
> > > >
> > > > > Hi Peter,
> > > > >
> > > > > Yes - thanks for the extra details.  I did take your suggestion to be a
> > > > > procedural DSL not YAML, per the illustration at [1] (second code block).
> > > > > Where I was probably confusing was in saying that unlike DSLs which just
> > > > > run (and where the execution can be delegated to eg java/groovy/ruby),
> > > > > here we need to understand and display, store and resume the workflow
> > > > > progress.  So I think it needs to be compiled to some representation that
> > > > > is well described and that new Apache Brooklyn code can reason about, both
> > > > > in the UI (JS) and backend (Java).  Parsing a DSL is much harder than using
> > > > > YAML for this "reasonable" representation (as in we can reason _about_ it
> > > > > :) ), because we already have good backend processing, persistence,
> > > > > serialization; and frontend processing and visualization support for
> > > > > YAML-based models.  So I think we almost definitely want a well-described
> > > > > declarative YAML model of the workflow.
> > > > >
> > > > > We might *also* want a Workflow DSL because I agree with you a DSL would
> > > > > be nicer for a user to write (if writing by hand; although if composing
> > > > > visually a drag-and-drop to YAML is probably easier).  However it should
> > > > > probably get "compiled" into a Workflow YAML.  So I'm suggesting we do the
> > > > > workflow YAML at this stage, and a DSL that compiles into that YAML can be
> > > > > designed later.  (Designing a good DSL and parser and reason-about-able
> > > > > representation is a big task, so being able to separate it feels good too!)
> > > > >
> > > > > Best
> > > > > Alex
> > > > >
> > > > > [1] https://docs.google.com/document/d/1u02Bi6sS8Fkf1s7UzRRMnvLhA477bqcyxGa0nJesqkI/edit#heading=h.75wm48pjvx0h
> > > > >
> > > > >
> > > > > On Fri, 2 Sept 2022 at 20:17, Geoff Macartney <
> > > [email protected]>
> > > > > wrote:
> > > > >
> > > > >> Hi Peter,
> > > > >>
> > > > > >> Thanks for such a detailed writeup of how you see this working. I fear
> > > > > >> I've too little experience with this sort of thing to be able to say
> > > > > >> anything very useful about it. My thought on the matter would be,
> > > > > >> let's get started with the yaml based approach and see how it goes. I
> > > > > >> think that experience would then give us a much better feel for what a
> > > > > >> really nice and usable DSL for workflows would look like (probably to
> > > > > >> address all the pain points of the yaml approach! :-)  The outline
> > > > > >> above will then be a good starting point, I'm sure.
> > > > >>
> > > > >> Cheers
> > > > >> Geoff
> > > > >>
> > > > >> On Thu, 1 Sept 2022 at 21:26, Peter Abramowitsch
> > > > >> <[email protected]> wrote:
> > > > >> >
> > > > >> > Hi All
> > > > > >> > I just wanted to clarify something in my comment the other day about
> > > > > >> > DSLs since I see that the acronym was also used in Alex's original
> > > > > >> > document.  Unless I misunderstood, Alex was proposing to create a DSL
> > > > > >> > for Brooklyn using yaml as syntax and writing a code layer to translate
> > > > > >> > between that syntax and underlying APIs which are presumably all in Java.
> > > > >> >
> > > > > >> > What I was suggesting was a DSL written directly in Java (I guess)
> > > > > >> > whose syntax would be that language, but whose grammar would be keywords
> > > > > >> > that were also Java functions.  Some of these functions would be
> > > > > >> > pre-defined in the DSL, while others could be defined by the user and
> > > > > >> > could use other functions of the DSL.  The result would be turned into a
> > > > > >> > JAR file (or equivalent in another platform).  But during the compile
> > > > > >> > phase, it would be checked for errors, and it could be debugged line by
> > > > > >> > line either invoking live functionality or using a library of mock
> > > > > >> > versions of the Brooklyn API.
> > > > >> >
> > > > > >> > In this 'native' DSL one could provide different types of workflow
> > > > > >> > constructs as functions (in the base class), taking function names as
> > > > > >> > method pointers, or using lambdas.  It would be a lot easier in Ruby or
> > > > > >> > Python.
> > > > >> >
> > > > >> > // linear
> > > > >> > brooklynRun(NamedTaskMethod, NamedTaskMethod)
> > > > >> >
> > > > >> > // chained
> > > > > >> > TaskMethodA().TaskMethodB()
> > > > >> >
> > > > >> > // asynchronous
> > > > >> > brooklynJoin(NamedTaskMethod, NamedTaskMethod,...)
> > > > >> >
> > > > >> > // conditional
> > > > >> > brooklynRunIf(NamedTaskMethod, NamedConditionMethod,...)
> > > > >> >
> > > > >> > // iterative
> > > > >> > brooklynRunWhile(NamedTaskMethod, NamedConditionMethod,...)
> > > > >> > brooklynRunUntil(NamedTaskMethod, NamedConditionMethod,...)
> > > > >> >
> > > > > >> > // there could even be a utility to implement legacy syntax (this of
> > > > > >> > // course would require the extra code layer I was trying to avoid)
> > > > > >> > runYaml(Path)
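
Since the message notes that this style would be easier in Ruby or Python, here is a minimal Python sketch of three of the constructs listed above, taking tasks and conditions as plain callables. Everything here is hypothetical: the names `brooklyn_run`, `brooklyn_run_if` and `brooklyn_run_while` merely mirror the pseudocode, nothing corresponds to an existing Brooklyn API, and the argument order follows the pseudocode (task first, then condition).

```python
def brooklyn_run(*tasks):
    """Linear: run each task in order and collect the results."""
    return [task() for task in tasks]


def brooklyn_run_if(task, condition):
    """Conditional: run the task only if the condition holds."""
    return task() if condition() else None


def brooklyn_run_while(task, condition):
    """Iterative: keep running the task while the condition holds."""
    results = []
    while condition():
        results.append(task())
    return results


# Tiny stand-ins for NamedTaskMethod / NamedConditionMethod.
counter = {"n": 0}


def increment():
    counter["n"] += 1
    return counter["n"]


def below_three():
    return counter["n"] < 3
```

Calling `brooklyn_run_while(increment, below_three)` runs the task until the counter reaches three. Note that a purely functional form like this just runs; it does not by itself give the stored, resumable representation discussed elsewhere in this thread.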
> > > > >> >
> > > > >> > A basic class structure might be
> > > > >> >
> > > > > >> > // where BrooklynRecipeBase implements the utility functions including,
> > > > > >> > // among others, Join, Run, If, While, Until mentioned above
> > > > > >> > // and the BrooklynWorkflowInterface would dictate the functional
> > > > > >> > // requirements for the mandatory aspects of the Recipe.
> > > > > >> > class MyRecipe extends BrooklynRecipeBase implements BrooklynWorkflowInterface
> > > > > >> > {
> > > > > >> > Initialize()
> > > > > >> > createContext()   - spin up resources
> > > > > >> > workflow() - the main launch sequence using aspects of the DSL
> > > > > >> > monitoring() - an asynchronous workflow used to manage sensor output or
> > > > > >> >                for whatever needs to be done while the "orchestra" is playing
> > > > > >> > shutdownHook() - called whenever shutdown is happening
> > > > > >> > }
> > > > >> >
> > > > > >> > For those who don't like the smell of Java, the source file could just
> > > > > >> > be the contents, which would then be injected into the class framing
> > > > > >> > code before compilation.
> > > > >> >
> > > > > >> > These are just ideas.  I'm not familiar enough with Brooklyn in its
> > > > > >> > current implementation to be able to create realistic pseudocode.
> > > > >> >
> > > > >> > Peter
> > > > >> >
> > > > >> > On Thu, Sep 1, 2022 at 9:24 AM Geoff Macartney <
> > > > >> [email protected]>
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Hi Alex,
> > > > >> > >
> > > > > >> > > That's great, I'll be excited to hear all about it.  7th September
> > > > > >> > > suits me fine; I would probably prefer 4.00 p.m. over 11.00.
> > > > >> > >
> > > > >> > > Cheers
> > > > >> > > Geoff
> > > > >> > >
> > > > >> > > On Thu, 1 Sept 2022 at 12:41, Alex Heneveld <
> [email protected]>
> > > > >> wrote:
> > > > >> > > >
> > > > > >> > > > Thanks for the excellent feedback Geoff and yes there are some very
> > > > > >> > > > cool and exciting things added recently -- containers, conditions, and
> > > > > >> > > > terraform and kubernetes support, all of which make writing complex
> > > > > >> > > > blueprints much easier.
> > > > >> > > >
> > > > >> > > > I'd love to host a session to showcase these.
> > > > >> > > >
> > > > > >> > > > How does Wed 7 Sept sound?  I could do 11am UK or 4pm UK -- depending
> > > > > >> > > > what time suits for people who are interested.  Please RSVP and
> > > > > >> > > > indicate your time preference!
> > > > >> > > >
> > > > >> > > > Best
> > > > >> > > > Alex
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On Wed, 31 Aug 2022 at 22:17, Geoff Macartney <
> > > > >> [email protected]>
> > > > >> > > wrote:
> > > > >> > > >>
> > > > >> > > >> Hi Alex,
> > > > >> > > >>
> > > > > >> > > >> Another thought occurred to me when reading that workflow proposal.
> > > > > >> > > >> You wrote
> > > > > >> > > >>
> > > > > >> > > >> "and with the recent support for container-based tasks and declarative
> > > > > >> > > >> conditions, we have taken big steps towards enabling YAML authorship"
> > > > >> > > >>
> > > > > >> > > >> Unfortunately over the past while I haven't been able to keep up as
> > > > > >> > > >> closely as I would like with developments in Brooklyn. I'm just
> > > > > >> > > >> wondering if it might be possible to get together some time, on Google
> > > > > >> > > >> Meet or Zoom or whatnot, if you or a colleague could spare half an
> > > > > >> > > >> hour to demo some of these recent developments? But don't worry about
> > > > > >> > > >> it if you're too busy at present.
> > > > >> > > >>
> > > > > >> > > >> Adding dev@ to this in CC for the sake of Openness. Others might also
> > > > > >> > > >> be interested!
> > > > >> > > >>
> > > > >> > > >> Cheers
> > > > >> > > >> Geoff
> > > > >> > >
> > > > >>
> > > > >
> > >
>
