Re: Declarative Workflow update & shorthand/DSL

Alex Heneveld Fri, 23 Sep 2022 07:08:03 -0700

Thanks Geoff.  I've had informal feedback from others in favour of the list
approach, and in my trials it is working nice.  I will apply the changes in
the docs and PRs.


Two other things:

* Geoff's comment made me wonder about having a "function" step, where
within a workflow one could define a local function.  I'm thinking *not*
for just now, but FYI it would not be too hard to add.

* The semantics of referencing a sensor which isn't yet "ready" is
ambiguous.  Do we wait, or return null, or give an error.  By default it
blocks but this feels wrong.  I think we should (1) add a new `wait` step
which allows waiting for a value, and (2) give an error when used elsewhere
(which is also the behaviour if you reference a non-existent key in a map),
and (3) support as part of local variables (only) i.e. `let` some limited
evaluation, including the `??` nullish operator which forgives an error on
the LHS (and basic maths)

A bit more detail on that last, it would allow:

- let x = ${entity.sensor.DoesNotExist} ?? 0
- let x = x + 1

But it would give an error if you reference it in log or other commands:

- log the value is ${entity.sensor.DoesNotExist}

Best
Alex


On Wed, 21 Sept 2022 at 20:35, Geoff Macartney <[email protected]> wrote:

> Hi Alex, Mykola,
>
> By the way I should mention that I'm very busy in the evenings this week so
> might not get to look at the latest PR for a while. By all means go ahead
> and merge it if Mykola and/or others are happy with it, no need to wait for
> me.
>
> Cheers
> Geoff
>
>
> On Tue, 20 Sept 2022, 22:20 Geoff Macartney, <[email protected]> wrote:
>
> > Hi Alex,
> >
> > +1 This updated proposal looks good - I do think the list based
> > approach will be simpler and less error prone, and the fact that you
> > will support an optional `id` anyway, if that is desired, means it
> > retains much of the flexibility of the map based approach. The custom
> > workflow step looks a little like the "functions" that we discussed
> > previously. Putting this all together will be pretty powerful.
> >
> > Will try to get a look at the latest PR if I can.
> >
> > Cheers
> > Geoff
> >
> >
> > On Mon, 19 Sept 2022 at 17:31, Alex Heneveld <[email protected]> wrote:
> > >
> > > Geoff-  Thanks.  Comments addressed in #1361 along with a major
> addition
> > to
> > > support variables -- inputs/outputs/etc.
> > >
> > > All-  One of the points Geoff makes concerns how steps are defined.  I
> > > think along with other comments that tips the balance in favour of
> > > revisiting how steps are defined.
> > >
> > > I propose we switch from the OLD proposed approach -- the map of
> ordered
> > > IDs -- to a NEW LIST-BASED approach.  There's a lot of detail below but
> > > in-short it's shifting from:
> > >
> > > steps:
> > >   1-say-hi:  log hi
> > >   2-step-two:  log step 2
> > >
> > > To:
> > >
> > > steps:
> > >   - log hi
> > >   - log step 2
> > >
> > >
> > > Specifically, based on feedback and more hands-on experience, I
> propose:
> > >
> > > * steps now be supplied as a list (now a map)
> > > * users are no longer required to supply an ID for each step (in the
> old
> > > approach, the ID was required as the key for every step)
> > > * users can if they wish supply an ID for any step (now as an explicit
> > `id:
> > > <ID>` rule)
> > > * the default order, if no `next: <ID>` instruction is supplied, is the
> > > order of the list (in the old approach the order was based on the ID)
> > >
> > > Also, the shorthand idea has evolved a little bit; instead of a
> "<type>:
> > > <type-specific-shorthand-template>" single-key map, we've suggested:
> > >
> > > * it be a string "<type> <type-specific-shorthand-template>"
> > > * shorthand can also be supplied in a map using the key "s" or the key
> > > "shorthand" (to allow shorthand along with other step key values)
> > > * custom steps can define custom shorthand templates (e.g. "${key} "="
> > > ${value}")
> > > * (there is also some evolution in how custom steps are defined)
> > >
> > >
> > > To illustrate:
> > >
> > > The OLD EXAMPLE:
> > >
> > > steps:
> > >    1:
> > >       type: container
> > >       image: my/google-cloud
> > >       command: gcloud dataproc jobs submit spark
> --BUCKET=gs://${BUCKET}
> > >       env:
> > >         BUCKET: $brooklyn:config("bucket")
> > >       on-error: retry
> > >     2:
> > >       set-sensor: spark-output=${1.stdout}
> > >
> > > Would become in the NEW proposal:
> > >
> > > steps:
> > >     - type: container
> > >       image: my/google-cloud
> > >       command: gcloud dataproc jobs submit spark
> --BUCKET=gs://${BUCKET}
> > >       env:
> > >         BUCKET: $brooklyn:config("bucket")
> > >       on-error: retry
> > >     - set-sensor spark-output = ${1.stdout}
> > >
> > > If we wanted to attach an `id` to the second step (e.g. for use with
> > > "next") we could write it either as:
> > >
> > >     # full long-hand map
> > >     - type: set-sensor
> > >       input:
> > >         sensor: spark-output
> > >         value: ${1.stdout}
> > >       id: set-spark-output
> > >
> > >     # mixed "s" shorthand key and other fields
> > >     - s: set-sensor spark-output = ${1.stdout}
> > >       id: set-spark-output
> > >
> > > To explain the reasoning:
> > >
> > > The advantages of steps:
> > >
> > > * Slightly less verbose when no ID is needed on a step
> > > * Easier to read and understand flow
> > > * Avoids hassle of renumbering when introducing step
> > > * Avoids risk of error where same key defined multiple time
> > >
> > > The advantages of OLD map-based scheme (implied disadvantages of the
> new
> > > steps process):
> > >
> > > * Easier user-facing correlation on steps (e.g. in UI) by always having
> > an
> > > explicit ID for easier correlation
> > > * Easier to extend a workflow by inserting or overriding explicit steps
> > >
> > > After some initial usage of the workflow, it seems these advantages of
> > the
> > > old approach are outweighed by the advantages of the list approach.  In
> > > particular the "correlation" can be done in other ways, and extending a
> > > workflow is probably not so useful, whereas supplying and maintaining
> an
> > ID
> > > is a hassle, error-prone, and harder to understand.
> > >
> > > Finally to explain the custom steps idea, it works out nicely in the
> code
> > > and we think for users to add a "compound-step" to the catalog e.g. as
> > > follows for the workflow shown above:
> > >
> > >   id: retryable-gcloud-dataproc-with-bucket-and-sensor
> > >   item:
> > >     type: custom-workflow-step
> > >     parameters:
> > >       bucket:
> > >         type: string
> > >       sensor_name:
> > >         type: string
> > >         default: spark-output
> > >     shorthand_definition: [ " bucket " ${bucket} ] [ " sensor "
> > > ${sensor_name} ]
> > >     steps:
> > >     - type: container
> > >       image: my/google-cloud
> > >       command: gcloud dataproc jobs submit spark
> --BUCKET=gs://${BUCKET}
> > >       env:
> > >         BUCKET: ${bucket}
> > >       on-error: retry
> > >     - set-sensor ${sensor_name} = ${1.stdout}
> > >
> > > A user could then write a step:
> > >
> > > - retryable-gcloud-dataproc-with-bucket-and-sensor
> > >
> > > And optionally use the shorthand per the shorthand_definition, matching
> > the
> > > quoted string literals and inferring the indicated parameters, e.g.:
> > >
> > > - retryable-gcloud-dataproc-with-bucket-and-sensor bucket my-bucket
> > sensor
> > > my-spark-output
> > >
> > > They could of course also use the longhand:
> > >
> > > - type: retryable-gcloud-dataproc-with-bucket-and-sensor
> > >   input:
> > >     bucket: my-bucket
> > >     sensor_name: my-spark-output
> > >
> > >
> > > Best
> > > Alex
> > >
> > >
> > >
> > > On Sat, 17 Sept 2022 at 21:13, Geoff Macartney <[email protected]>
> > wrote:
> > >
> > > > Hi Alex,
> > > >
> > > > Belatedly reviewed the PR. It's looking good! And surprisingly simple
> > > > in the end. Made a couple of minor comments on it.
> > > >
> > > > Cheers
> > > > Geoff
> > > >
> > > > On Thu, 8 Sept 2022 at 09:35, Alex Heneveld <[email protected]>
> wrote:
> > > > >
> > > > > Hi team,
> > > > >
> > > > > An initial PR with a few types and the ability to define an
> effector
> > is
> > > > > available [1].
> > > > >
> > > > > This is enough for the next steps to be parallelized, e.g. new
> steps
> > > > > added.  The proposal has been updated with a work plan / list of
> > tasks
> > > > > [2].  Any volunteers to help with some of the upcoming tasks let me
> > know.
> > > > >
> > > > > Finally I've been thinking about the "shorthand syntax" and how to
> > bring
> > > > us
> > > > > closer to Peter's proposal of a DSL.  The original proposal allowed
> > > > instead
> > > > > of a map e.g.
> > > > >
> > > > > step_sleep:
> > > > >   type: sleep
> > > > >   duration: 5s
> > > > >
> > > > > or
> > > > >
> > > > > step_update_service_up:
> > > > >   type: set-sensor
> > > > >   sensor:
> > > > >     name: service.isUp
> > > > >     type: boolean
> > > > >   value: true
> > > > >
> > > > > being able to use a shorthand _map_ with a single key being the
> > type, and
> > > > > value interpreted by that type, so in the OLD SHORTHAND PROPOSAL
> the
> > > > above
> > > > > could be written:
> > > > >
> > > > > step_sleep:
> > > > >   sleep: 5s
> > > > >
> > > > > step_update_service_up:
> > > > >   set-sensor: service.isUp = true
> > > > >
> > > > > Having played with syntaxes a bit I wonder if we should instead say
> > the
> > > > > shorthand DSL kicks in when the step _body_ is a string (instead
> of a
> > > > > single-key map), and the first word of the string being the type,
> > and the
> > > > > remainder interpreted by the type, and we allow it to be a bit more
> > > > > ambitious.
> > > > >
> > > > > Concretely this NEW SHORTHAND PROPOSAL would look something like:
> > > > >
> > > > > step_sleep: sleep 5s
> > > > > step_update_service_up: set-sensor service.isUp = true
> > > > > # also supporting a type, ie `set-sensor [TYPE] NAME = VALUE`, eg
> > > > > step_update_service_up: set-sensor boolean service.isUp = true
> > > > >
> > > > > You would still need the full map syntax whenever defining flow
> > logic --
> > > > eg
> > > > > condition, next, retry, or timeout -- or any property not supported
> > by
> > > > the
> > > > > shorthand syntax.  But for the (majority?) simple cases the
> > expression
> > > > > would be very concise.  In most cases I think it would feel like a
> > DSL
> > > > but
> > > > > has the virtue of a very clear translation to the actual workflow
> > model
> > > > and
> > > > > the underlying (YAML) model needed for resumption and UI.
> > > > >
> > > > > As a final example, the example used at the start of the proposal
> > > > > (simplified a little -- removing on-error retry and env map as
> those
> > > > > wouldn't be supported by shorthand):
> > > > >
> > > > > brooklyn.initializers:
> > > > > - type: workflow-effector
> > > > >  name: run-spark-on-gcp
> > > > >  steps:
> > > > >    1:
> > > > >       type: container
> > > > >       image: my/google-cloud
> > > > >       command: gcloud dataproc jobs submit spark
> > > > > --BUCKET=gs://$brooklyn:config("bucket")
> > > > >     2:
> > > > >       type: set-sensor
> > > > >       sensor: spark-output
> > > > >       value: ${1.stdout}
> > > > >
> > > > > Could be written in this shorthand as follows:
> > > > >
> > > > >  steps:
> > > > >    1: container my/google-cloud command "gcloud dataproc jobs
> submit
> > > > spark
> > > > > --BUCKET=gs://${entity.config.bucket}"
> > > > >    2: set-sensor spark-output ${1.stdout}
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > Best
> > > > > Alex
> > > > >
> > > > >
> > > > > [1] https://github.com/apache/brooklyn-server/pull/1358
> > > > > [2]
> > > > >
> > > >
> >
> https://docs.google.com/document/d/1u02Bi6sS8Fkf1s7UzRRMnvLhA477bqcyxGa0nJesqkI/edit#heading=h.gbadaqa2yql6
> > > > >
> > > > >
> > > > > On Wed, 7 Sept 2022 at 09:58, Alex Heneveld <[email protected]>
> > wrote:
> > > > >
> > > > > > Hi Peter,
> > > > > >
> > > > > > Yes - thanks for the extra details.  I did take your suggestion
> to
> > be a
> > > > > > procedural DSL not YAML, per the illustration at [1] (second code
> > > > block).
> > > > > > Probably where I was confusing was in saying that unlike DSLs
> which
> > > > just
> > > > > > run (and where the execution can be delegated to eg
> > java/groovy/ruby),
> > > > here
> > > > > > we need to understand and display, store and resume the workflow
> > > > progress.
> > > > > > So I think it needs to be compiled to some representation that is
> > well
> > > > > > described and that new Apache Brooklyn code can reason about,
> both
> > in
> > > > the
> > > > > > UI (JS) and backend (Java).  Parsing a DSL is much harder than
> > using
> > > > YAML
> > > > > > for this "reasonable" representation (as in we can reason _about_
> > it
> > > > :) ),
> > > > > > because we already have good backend processing, persistence,
> > > > > > serialization; and frontend processing and visualization support
> > for
> > > > > > YAML-based models.  So I think we almost definitely want a
> > > > well-described
> > > > > > declarative YAML model of the workflow.
> > > > > >
> > > > > > We might *also* want a Workflow DSL because I agree with you a
> DSL
> > > > would
> > > > > > be nicer for a user to write (if writing by hand; although if
> > composing
> > > > > > visually a drag-and-drop to YAML is probably easier).  However it
> > > > should
> > > > > > probably get "compiled" into a Workflow YAML.  So I'm suggesting
> > we do
> > > > the
> > > > > > workflow YAML at this stage, and a DSL that compiles into that
> YAML
> > > > can be
> > > > > > designed later.  (Designing a good DSL and parser and
> > reason-about-able
> > > > > > representation is a big task, so being able to separate it feels
> > good
> > > > too!)
> > > > > >
> > > > > > Best
> > > > > > Alex
> > > > > >
> > > > > > [1]
> > > > > >
> > > >
> >
> https://docs.google.com/document/d/1u02Bi6sS8Fkf1s7UzRRMnvLhA477bqcyxGa0nJesqkI/edit#heading=h.75wm48pjvx0h
> > > > > >
> > > > > >
> > > > > > On Fri, 2 Sept 2022 at 20:17, Geoff Macartney <
> > > > [email protected]>
> > > > > > wrote:
> > > > > >
> > > > > >> Hi Peter,
> > > > > >>
> > > > > >> Thanks for such a detailed writeup of how you see this working.
> I
> > fear
> > > > > >> I've too little experience with this sort of thing to be able to
> > say
> > > > > >> anything very useful about it. My thought on the matter would
> be,
> > > > > >> let's get started with the yaml based approach and see how it
> > goes. I
> > > > > >> think that experience would then give us a much better feel for
> > what a
> > > > > >> really nice and usable DSL for workflows would look like
> > (probably to
> > > > > >> address all the pain points of the yaml approach! :-)   The
> > outline
> > > > > >> above will then be a good starting point, I'm sure.
> > > > > >>
> > > > > >> Cheers
> > > > > >> Geoff
> > > > > >>
> > > > > >> On Thu, 1 Sept 2022 at 21:26, Peter Abramowitsch
> > > > > >> <[email protected]> wrote:
> > > > > >> >
> > > > > >> > Hi All
> > > > > >> > I just wanted to clarify something in my comment the other day
> > about
> > > > > >> DSLs
> > > > > >> > since I see that the acronym was also used in Alex's original
> > > > document.
> > > > > >> > Unless I misunderstood, Alex was proposing to create a DSL for
> > > > Brooklyn
> > > > > >> > using yaml as syntax and writing a code layer to translate
> > between
> > > > that
> > > > > >> > syntax and underlying APIs which are presumably all in Java.
> > > > > >> >
> > > > > >> > What I was suggesting was a DSL written directly in  Java (I
> > guess)
> > > > > >> whose
> > > > > >> > syntax would be that language, but whose grammar would be
> > keywords
> > > > that
> > > > > >> > were also Java functions.  Some of these functions would be
> > > > pre-defined
> > > > > >> in
> > > > > >> > the DSL, while others could be  defined by the user and could
> > use
> > > > other
> > > > > >> > functions of the DSL.    The result would be turned into a JAR
> > file
> > > > (or
> > > > > >> > equivalent in another platform)   But during the compile
> phase,
> > it
> > > > > >> would be
> > > > > >> > checked for errors, and it could be debugged line by line
> either
> > > > > >> invoking
> > > > > >> > live functionality or using a library of mock versions of the
> > > > Brooklyn
> > > > > >> API.
> > > > > >> >
> > > > > >> > In this 'native' DSL one could provide different types of
> > workflow
> > > > > >> > constructs as functions (In the BaseClass), taking function
> > names as
> > > > > >> method
> > > > > >> > pointers, or using Lambdas.  It would be a lot easier in Ruby
> or
> > > > Python
> > > > > >> >
> > > > > >> > // linear
> > > > > >> > brooklynRun(NamedTaskMethod, NamedTaskMethod)
> > > > > >> >
> > > > > >> > // chained
> > > > > >> > TaskMethodA()TaskMethodB().
> > > > > >> >
> > > > > >> > // asynchronous
> > > > > >> > brooklynJoin(NamedTaskMethod, NamedTaskMethod,...)
> > > > > >> >
> > > > > >> > // conditional
> > > > > >> > brooklynRunIf(NamedTaskMethod, NamedConditionMethod,...)
> > > > > >> >
> > > > > >> > // iterative
> > > > > >> > brooklynRunWhile(NamedTaskMethod, NamedConditionMethod,...)
> > > > > >> > brooklynRunUntil(NamedTaskMethod, NamedConditionMethod,...)
> > > > > >> >
> > > > > >> > // there could even be a utility to implement legacy syntax
> > (this of
> > > > > >> course
> > > > > >> > would require the extra code layer I was trying to avoid)
> > > > > >> > runYaml(Path)
> > > > > >> >
> > > > > >> > A basic class structure might be
> > > > > >> >
> > > > > >> > // where BrooklynRecipeBase implements the utility functions
> > > > including,
> > > > > >> > among others  Join, Run, If, While, Until mentioned above
> > > > > >> > // and the BrooklynWorkflowInterface would dictate the
> > functional
> > > > > >> > requirements for the mandatory aspects of the Recipe.
> > > > > >> > class MyRecipe extends BrooklynRecipeBase implements,
> > > > > >> > BrooklynWorkflowInterface
> > > > > >> > {
> > > > > >> > Initialize()
> > > > > >> > createContext()   - spin up resources
> > > > > >> > workflow() - the main launch sequence using aspects of the DSL
> > > > > >> > monitoring() - an asynchronous workflow used to manage sensor
> > > > output or
> > > > > >> for
> > > > > >> > whatever needs to be done while the "orchestra" is plating
> > > > > >> > shutdownHook() - called whenever shutdown is happening
> > > > > >> > }
> > > > > >> >
> > > > > >> > For those who don't like the smell of Java, the source file
> > could
> > > > just
> > > > > >> be
> > > > > >> > the contents, which would then be injected into the class
> > framing
> > > > code
> > > > > >> > before compilation.
> > > > > >> >
> > > > > >> > These are just ideas.  I'm not familiar enough with Brooklyn
> in
> > its
> > > > > >> current
> > > > > >> > implementation to be able to create realistic pseudocode.
> > > > > >> >
> > > > > >> > Peter
> > > > > >> >
> > > > > >> > On Thu, Sep 1, 2022 at 9:24 AM Geoff Macartney <
> > > > > >> [email protected]>
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> > > Hi Alex,
> > > > > >> > >
> > > > > >> > > That's great, I'll be excited to hear all about it.  7th
> > September
> > > > > >> > > suits me fine; I would probably prefer 4.00 p.m. over 11.00.
> > > > > >> > >
> > > > > >> > > Cheers
> > > > > >> > > Geoff
> > > > > >> > >
> > > > > >> > > On Thu, 1 Sept 2022 at 12:41, Alex Heneveld <
> > [email protected]>
> > > > > >> wrote:
> > > > > >> > > >
> > > > > >> > > > Thanks for the excellent feedback Geoff and yes there are
> > some
> > > > very
> > > > > >> cool
> > > > > >> > > and exciting things added recently -- containers,
> conditions,
> > and
> > > > > >> terraform
> > > > > >> > > and kubernetes support, all of which make writing complex
> > > > blueprints
> > > > > >> much
> > > > > >> > > easier.
> > > > > >> > > >
> > > > > >> > > > I'd love to host a session to showcase these.
> > > > > >> > > >
> > > > > >> > > > How does Wed 7 Sept sound?  I could do 11am UK or 4pm UK
> --
> > > > > >> depending
> > > > > >> > > what time suits for people who are interested.  Please RSVP
> > and
> > > > > >> indicate
> > > > > >> > > your time preference!
> > > > > >> > > >
> > > > > >> > > > Best
> > > > > >> > > > Alex
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > On Wed, 31 Aug 2022 at 22:17, Geoff Macartney <
> > > > > >> [email protected]>
> > > > > >> > > wrote:
> > > > > >> > > >>
> > > > > >> > > >> Hi Alex,
> > > > > >> > > >>
> > > > > >> > > >> Another thought occurred to me when reading that workflow
> > > > > >> proposal. You
> > > > > >> > > wrote
> > > > > >> > > >>
> > > > > >> > > >> "and with the recent support for container-based tasks
> and
> > > > > >> declarative
> > > > > >> > > >> conditions, we have taken big steps towards enabling YAML
> > > > > >> authorship"
> > > > > >> > > >>
> > > > > >> > > >> Unfortunately over the past while I haven't been able to
> > keep
> > > > up as
> > > > > >> > > >> closely as I would like with developments in Brooklyn.
> I'm
> > just
> > > > > >> > > >> wondering if it might be possible to get together some
> > time, on
> > > > > >> Google
> > > > > >> > > >> Meet or Zoom or whatnot, if you or a colleague could
> spare
> > > > half an
> > > > > >> > > >> hour to demo some of these recent developments? But don't
> > worry
> > > > > >> about
> > > > > >> > > >> it if you're too busy at present.
> > > > > >> > > >>
> > > > > >> > > >> Adding dev@ to this in CC for the sake of Openness.
> Others
> > > > might
> > > > > >> also
> > > > > >> > > >> be interested!
> > > > > >> > > >>
> > > > > >> > > >> Cheers
> > > > > >> > > >> Geoff
> > > > > >> > >
> > > > > >>
> > > > > >
> > > >
> >
>

Re: Declarative Workflow update & shorthand/DSL

Reply via email to