Re: Declarative Workflow update & shorthand/DSL

Geoff Macartney Tue, 20 Sep 2022 14:21:05 -0700

Hi Alex,

+1 This updated proposal looks good - I do think the list based
approach will be simpler and less error prone, and the fact that you
will support an optional `id` anyway, if that is desired, means it
retains much of the flexibility of the map based approach. The custom
workflow step looks a little like the "functions" that we discussed
previously. Putting this all together will be pretty powerful.


Will try to get a look at the latest PR if I can.

Cheers
Geoff


On Mon, 19 Sept 2022 at 17:31, Alex Heneveld <[email protected]> wrote:
>
> Geoff-  Thanks.  Comments addressed in #1361 along with a major addition to
> support variables -- inputs/outputs/etc.
>
> All-  One of the points Geoff makes concerns how steps are defined.  I
> think along with other comments that tips the balance in favour of
> revisiting how steps are defined.
>
> I propose we switch from the OLD proposed approach -- the map of ordered
> IDs -- to a NEW LIST-BASED approach.  There's a lot of detail below but
> in-short it's shifting from:
>
> steps:
>   1-say-hi:  log hi
>   2-step-two:  log step 2
>
> To:
>
> steps:
>   - log hi
>   - log step 2
>
>
> Specifically, based on feedback and more hands-on experience, I propose:
>
> * steps now be supplied as a list (now a map)
> * users are no longer required to supply an ID for each step (in the old
> approach, the ID was required as the key for every step)
> * users can if they wish supply an ID for any step (now as an explicit `id:
> <ID>` rule)
> * the default order, if no `next: <ID>` instruction is supplied, is the
> order of the list (in the old approach the order was based on the ID)
>
> Also, the shorthand idea has evolved a little bit; instead of a "<type>:
> <type-specific-shorthand-template>" single-key map, we've suggested:
>
> * it be a string "<type> <type-specific-shorthand-template>"
> * shorthand can also be supplied in a map using the key "s" or the key
> "shorthand" (to allow shorthand along with other step key values)
> * custom steps can define custom shorthand templates (e.g. "${key} "="
> ${value}")
> * (there is also some evolution in how custom steps are defined)
>
>
> To illustrate:
>
> The OLD EXAMPLE:
>
> steps:
>    1:
>       type: container
>       image: my/google-cloud
>       command: gcloud dataproc jobs submit spark --BUCKET=gs://${BUCKET}
>       env:
>         BUCKET: $brooklyn:config("bucket")
>       on-error: retry
>     2:
>       set-sensor: spark-output=${1.stdout}
>
> Would become in the NEW proposal:
>
> steps:
>     - type: container
>       image: my/google-cloud
>       command: gcloud dataproc jobs submit spark --BUCKET=gs://${BUCKET}
>       env:
>         BUCKET: $brooklyn:config("bucket")
>       on-error: retry
>     - set-sensor spark-output = ${1.stdout}
>
> If we wanted to attach an `id` to the second step (e.g. for use with
> "next") we could write it either as:
>
>     # full long-hand map
>     - type: set-sensor
>       input:
>         sensor: spark-output
>         value: ${1.stdout}
>       id: set-spark-output
>
>     # mixed "s" shorthand key and other fields
>     - s: set-sensor spark-output = ${1.stdout}
>       id: set-spark-output
>
> To explain the reasoning:
>
> The advantages of steps:
>
> * Slightly less verbose when no ID is needed on a step
> * Easier to read and understand flow
> * Avoids hassle of renumbering when introducing step
> * Avoids risk of error where same key defined multiple time
>
> The advantages of OLD map-based scheme (implied disadvantages of the new
> steps process):
>
> * Easier user-facing correlation on steps (e.g. in UI) by always having an
> explicit ID for easier correlation
> * Easier to extend a workflow by inserting or overriding explicit steps
>
> After some initial usage of the workflow, it seems these advantages of the
> old approach are outweighed by the advantages of the list approach.  In
> particular the "correlation" can be done in other ways, and extending a
> workflow is probably not so useful, whereas supplying and maintaining an ID
> is a hassle, error-prone, and harder to understand.
>
> Finally to explain the custom steps idea, it works out nicely in the code
> and we think for users to add a "compound-step" to the catalog e.g. as
> follows for the workflow shown above:
>
>   id: retryable-gcloud-dataproc-with-bucket-and-sensor
>   item:
>     type: custom-workflow-step
>     parameters:
>       bucket:
>         type: string
>       sensor_name:
>         type: string
>         default: spark-output
>     shorthand_definition: [ " bucket " ${bucket} ] [ " sensor "
> ${sensor_name} ]
>     steps:
>     - type: container
>       image: my/google-cloud
>       command: gcloud dataproc jobs submit spark --BUCKET=gs://${BUCKET}
>       env:
>         BUCKET: ${bucket}
>       on-error: retry
>     - set-sensor ${sensor_name} = ${1.stdout}
>
> A user could then write a step:
>
> - retryable-gcloud-dataproc-with-bucket-and-sensor
>
> And optionally use the shorthand per the shorthand_definition, matching the
> quoted string literals and inferring the indicated parameters, e.g.:
>
> - retryable-gcloud-dataproc-with-bucket-and-sensor bucket my-bucket sensor
> my-spark-output
>
> They could of course also use the longhand:
>
> - type: retryable-gcloud-dataproc-with-bucket-and-sensor
>   input:
>     bucket: my-bucket
>     sensor_name: my-spark-output
>
>
> Best
> Alex
>
>
>
> On Sat, 17 Sept 2022 at 21:13, Geoff Macartney <[email protected]> wrote:
>
> > Hi Alex,
> >
> > Belatedly reviewed the PR. It's looking good! And surprisingly simple
> > in the end. Made a couple of minor comments on it.
> >
> > Cheers
> > Geoff
> >
> > On Thu, 8 Sept 2022 at 09:35, Alex Heneveld <[email protected]> wrote:
> > >
> > > Hi team,
> > >
> > > An initial PR with a few types and the ability to define an effector is
> > > available [1].
> > >
> > > This is enough for the next steps to be parallelized, e.g. new steps
> > > added.  The proposal has been updated with a work plan / list of tasks
> > > [2].  Any volunteers to help with some of the upcoming tasks let me know.
> > >
> > > Finally I've been thinking about the "shorthand syntax" and how to bring
> > us
> > > closer to Peter's proposal of a DSL.  The original proposal allowed
> > instead
> > > of a map e.g.
> > >
> > > step_sleep:
> > >   type: sleep
> > >   duration: 5s
> > >
> > > or
> > >
> > > step_update_service_up:
> > >   type: set-sensor
> > >   sensor:
> > >     name: service.isUp
> > >     type: boolean
> > >   value: true
> > >
> > > being able to use a shorthand _map_ with a single key being the type, and
> > > value interpreted by that type, so in the OLD SHORTHAND PROPOSAL the
> > above
> > > could be written:
> > >
> > > step_sleep:
> > >   sleep: 5s
> > >
> > > step_update_service_up:
> > >   set-sensor: service.isUp = true
> > >
> > > Having played with syntaxes a bit I wonder if we should instead say the
> > > shorthand DSL kicks in when the step _body_ is a string (instead of a
> > > single-key map), and the first word of the string being the type, and the
> > > remainder interpreted by the type, and we allow it to be a bit more
> > > ambitious.
> > >
> > > Concretely this NEW SHORTHAND PROPOSAL would look something like:
> > >
> > > step_sleep: sleep 5s
> > > step_update_service_up: set-sensor service.isUp = true
> > > # also supporting a type, ie `set-sensor [TYPE] NAME = VALUE`, eg
> > > step_update_service_up: set-sensor boolean service.isUp = true
> > >
> > > You would still need the full map syntax whenever defining flow logic --
> > eg
> > > condition, next, retry, or timeout -- or any property not supported by
> > the
> > > shorthand syntax.  But for the (majority?) simple cases the expression
> > > would be very concise.  In most cases I think it would feel like a DSL
> > but
> > > has the virtue of a very clear translation to the actual workflow model
> > and
> > > the underlying (YAML) model needed for resumption and UI.
> > >
> > > As a final example, the example used at the start of the proposal
> > > (simplified a little -- removing on-error retry and env map as those
> > > wouldn't be supported by shorthand):
> > >
> > > brooklyn.initializers:
> > > - type: workflow-effector
> > >  name: run-spark-on-gcp
> > >  steps:
> > >    1:
> > >       type: container
> > >       image: my/google-cloud
> > >       command: gcloud dataproc jobs submit spark
> > > --BUCKET=gs://$brooklyn:config("bucket")
> > >     2:
> > >       type: set-sensor
> > >       sensor: spark-output
> > >       value: ${1.stdout}
> > >
> > > Could be written in this shorthand as follows:
> > >
> > >  steps:
> > >    1: container my/google-cloud command "gcloud dataproc jobs submit
> > spark
> > > --BUCKET=gs://${entity.config.bucket}"
> > >    2: set-sensor spark-output ${1.stdout}
> > >
> > > Thoughts?
> > >
> > > Best
> > > Alex
> > >
> > >
> > > [1] https://github.com/apache/brooklyn-server/pull/1358
> > > [2]
> > >
> > https://docs.google.com/document/d/1u02Bi6sS8Fkf1s7UzRRMnvLhA477bqcyxGa0nJesqkI/edit#heading=h.gbadaqa2yql6
> > >
> > >
> > > On Wed, 7 Sept 2022 at 09:58, Alex Heneveld <[email protected]> wrote:
> > >
> > > > Hi Peter,
> > > >
> > > > Yes - thanks for the extra details.  I did take your suggestion to be a
> > > > procedural DSL not YAML, per the illustration at [1] (second code
> > block).
> > > > Probably where I was confusing was in saying that unlike DSLs which
> > just
> > > > run (and where the execution can be delegated to eg java/groovy/ruby),
> > here
> > > > we need to understand and display, store and resume the workflow
> > progress.
> > > > So I think it needs to be compiled to some representation that is well
> > > > described and that new Apache Brooklyn code can reason about, both in
> > the
> > > > UI (JS) and backend (Java).  Parsing a DSL is much harder than using
> > YAML
> > > > for this "reasonable" representation (as in we can reason _about_ it
> > :) ),
> > > > because we already have good backend processing, persistence,
> > > > serialization; and frontend processing and visualization support for
> > > > YAML-based models.  So I think we almost definitely want a
> > well-described
> > > > declarative YAML model of the workflow.
> > > >
> > > > We might *also* want a Workflow DSL because I agree with you a DSL
> > would
> > > > be nicer for a user to write (if writing by hand; although if composing
> > > > visually a drag-and-drop to YAML is probably easier).  However it
> > should
> > > > probably get "compiled" into a Workflow YAML.  So I'm suggesting we do
> > the
> > > > workflow YAML at this stage, and a DSL that compiles into that YAML
> > can be
> > > > designed later.  (Designing a good DSL and parser and reason-about-able
> > > > representation is a big task, so being able to separate it feels good
> > too!)
> > > >
> > > > Best
> > > > Alex
> > > >
> > > > [1]
> > > >
> > https://docs.google.com/document/d/1u02Bi6sS8Fkf1s7UzRRMnvLhA477bqcyxGa0nJesqkI/edit#heading=h.75wm48pjvx0h
> > > >
> > > >
> > > > On Fri, 2 Sept 2022 at 20:17, Geoff Macartney <
> > [email protected]>
> > > > wrote:
> > > >
> > > >> Hi Peter,
> > > >>
> > > >> Thanks for such a detailed writeup of how you see this working. I fear
> > > >> I've too little experience with this sort of thing to be able to say
> > > >> anything very useful about it. My thought on the matter would be,
> > > >> let's get started with the yaml based approach and see how it goes. I
> > > >> think that experience would then give us a much better feel for what a
> > > >> really nice and usable DSL for workflows would look like (probably to
> > > >> address all the pain points of the yaml approach! :-)   The outline
> > > >> above will then be a good starting point, I'm sure.
> > > >>
> > > >> Cheers
> > > >> Geoff
> > > >>
> > > >> On Thu, 1 Sept 2022 at 21:26, Peter Abramowitsch
> > > >> <[email protected]> wrote:
> > > >> >
> > > >> > Hi All
> > > >> > I just wanted to clarify something in my comment the other day about
> > > >> DSLs
> > > >> > since I see that the acronym was also used in Alex's original
> > document.
> > > >> > Unless I misunderstood, Alex was proposing to create a DSL for
> > Brooklyn
> > > >> > using yaml as syntax and writing a code layer to translate between
> > that
> > > >> > syntax and underlying APIs which are presumably all in Java.
> > > >> >
> > > >> > What I was suggesting was a DSL written directly in  Java (I guess)
> > > >> whose
> > > >> > syntax would be that language, but whose grammar would be keywords
> > that
> > > >> > were also Java functions.  Some of these functions would be
> > pre-defined
> > > >> in
> > > >> > the DSL, while others could be  defined by the user and could use
> > other
> > > >> > functions of the DSL.    The result would be turned into a JAR file
> > (or
> > > >> > equivalent in another platform)   But during the compile phase, it
> > > >> would be
> > > >> > checked for errors, and it could be debugged line by line either
> > > >> invoking
> > > >> > live functionality or using a library of mock versions of the
> > Brooklyn
> > > >> API.
> > > >> >
> > > >> > In this 'native' DSL one could provide different types of workflow
> > > >> > constructs as functions (In the BaseClass), taking function names as
> > > >> method
> > > >> > pointers, or using Lambdas.  It would be a lot easier in Ruby or
> > Python
> > > >> >
> > > >> > // linear
> > > >> > brooklynRun(NamedTaskMethod, NamedTaskMethod)
> > > >> >
> > > >> > // chained
> > > >> > TaskMethodA()TaskMethodB().
> > > >> >
> > > >> > // asynchronous
> > > >> > brooklynJoin(NamedTaskMethod, NamedTaskMethod,...)
> > > >> >
> > > >> > // conditional
> > > >> > brooklynRunIf(NamedTaskMethod, NamedConditionMethod,...)
> > > >> >
> > > >> > // iterative
> > > >> > brooklynRunWhile(NamedTaskMethod, NamedConditionMethod,...)
> > > >> > brooklynRunUntil(NamedTaskMethod, NamedConditionMethod,...)
> > > >> >
> > > >> > // there could even be a utility to implement legacy syntax (this of
> > > >> course
> > > >> > would require the extra code layer I was trying to avoid)
> > > >> > runYaml(Path)
> > > >> >
> > > >> > A basic class structure might be
> > > >> >
> > > >> > // where BrooklynRecipeBase implements the utility functions
> > including,
> > > >> > among others  Join, Run, If, While, Until mentioned above
> > > >> > // and the BrooklynWorkflowInterface would dictate the functional
> > > >> > requirements for the mandatory aspects of the Recipe.
> > > >> > class MyRecipe extends BrooklynRecipeBase implements,
> > > >> > BrooklynWorkflowInterface
> > > >> > {
> > > >> > Initialize()
> > > >> > createContext()   - spin up resources
> > > >> > workflow() - the main launch sequence using aspects of the DSL
> > > >> > monitoring() - an asynchronous workflow used to manage sensor
> > output or
> > > >> for
> > > >> > whatever needs to be done while the "orchestra" is plating
> > > >> > shutdownHook() - called whenever shutdown is happening
> > > >> > }
> > > >> >
> > > >> > For those who don't like the smell of Java, the source file could
> > just
> > > >> be
> > > >> > the contents, which would then be injected into the class framing
> > code
> > > >> > before compilation.
> > > >> >
> > > >> > These are just ideas.  I'm not familiar enough with Brooklyn in its
> > > >> current
> > > >> > implementation to be able to create realistic pseudocode.
> > > >> >
> > > >> > Peter
> > > >> >
> > > >> > On Thu, Sep 1, 2022 at 9:24 AM Geoff Macartney <
> > > >> [email protected]>
> > > >> > wrote:
> > > >> >
> > > >> > > Hi Alex,
> > > >> > >
> > > >> > > That's great, I'll be excited to hear all about it.  7th September
> > > >> > > suits me fine; I would probably prefer 4.00 p.m. over 11.00.
> > > >> > >
> > > >> > > Cheers
> > > >> > > Geoff
> > > >> > >
> > > >> > > On Thu, 1 Sept 2022 at 12:41, Alex Heneveld <[email protected]>
> > > >> wrote:
> > > >> > > >
> > > >> > > > Thanks for the excellent feedback Geoff and yes there are some
> > very
> > > >> cool
> > > >> > > and exciting things added recently -- containers, conditions, and
> > > >> terraform
> > > >> > > and kubernetes support, all of which make writing complex
> > blueprints
> > > >> much
> > > >> > > easier.
> > > >> > > >
> > > >> > > > I'd love to host a session to showcase these.
> > > >> > > >
> > > >> > > > How does Wed 7 Sept sound?  I could do 11am UK or 4pm UK --
> > > >> depending
> > > >> > > what time suits for people who are interested.  Please RSVP and
> > > >> indicate
> > > >> > > your time preference!
> > > >> > > >
> > > >> > > > Best
> > > >> > > > Alex
> > > >> > > >
> > > >> > > >
> > > >> > > > On Wed, 31 Aug 2022 at 22:17, Geoff Macartney <
> > > >> [email protected]>
> > > >> > > wrote:
> > > >> > > >>
> > > >> > > >> Hi Alex,
> > > >> > > >>
> > > >> > > >> Another thought occurred to me when reading that workflow
> > > >> proposal. You
> > > >> > > wrote
> > > >> > > >>
> > > >> > > >> "and with the recent support for container-based tasks and
> > > >> declarative
> > > >> > > >> conditions, we have taken big steps towards enabling YAML
> > > >> authorship"
> > > >> > > >>
> > > >> > > >> Unfortunately over the past while I haven't been able to keep
> > up as
> > > >> > > >> closely as I would like with developments in Brooklyn. I'm just
> > > >> > > >> wondering if it might be possible to get together some time, on
> > > >> Google
> > > >> > > >> Meet or Zoom or whatnot, if you or a colleague could spare
> > half an
> > > >> > > >> hour to demo some of these recent developments? But don't worry
> > > >> about
> > > >> > > >> it if you're too busy at present.
> > > >> > > >>
> > > >> > > >> Adding dev@ to this in CC for the sake of Openness. Others
> > might
> > > >> also
> > > >> > > >> be interested!
> > > >> > > >>
> > > >> > > >> Cheers
> > > >> > > >> Geoff
> > > >> > >
> > > >>
> > > >
> >

Re: Declarative Workflow update & shorthand/DSL

Reply via email to