Hi Alex, +1 This updated proposal looks good - I do think the list based approach will be simpler and less error prone, and the fact that you will support an optional `id` anyway, if that is desired, means it retains much of the flexibility of the map based approach. The custom workflow step looks a little like the "functions" that we discussed previously. Putting this all together will be pretty powerful.
Will try to get a look at the latest PR if I can. Cheers Geoff On Mon, 19 Sept 2022 at 17:31, Alex Heneveld <a...@cloudsoft.io> wrote: > > Geoff- Thanks. Comments addressed in #1361 along with a major addition to > support variables -- inputs/outputs/etc. > > All- One of the points Geoff makes concerns how steps are defined. I > think along with other comments that tips the balance in favour of > revisiting how steps are defined. > > I propose we switch from the OLD proposed approach -- the map of ordered > IDs -- to a NEW LIST-BASED approach. There's a lot of detail below but > in-short it's shifting from: > > steps: > 1-say-hi: log hi > 2-step-two: log step 2 > > To: > > steps: > - log hi > - log step 2 > > > Specifically, based on feedback and more hands-on experience, I propose: > > * steps now be supplied as a list (now a map) > * users are no longer required to supply an ID for each step (in the old > approach, the ID was required as the key for every step) > * users can if they wish supply an ID for any step (now as an explicit `id: > <ID>` rule) > * the default order, if no `next: <ID>` instruction is supplied, is the > order of the list (in the old approach the order was based on the ID) > > Also, the shorthand idea has evolved a little bit; instead of a "<type>: > <type-specific-shorthand-template>" single-key map, we've suggested: > > * it be a string "<type> <type-specific-shorthand-template>" > * shorthand can also be supplied in a map using the key "s" or the key > "shorthand" (to allow shorthand along with other step key values) > * custom steps can define custom shorthand templates (e.g. "${key} "=" > ${value}") > * (there is also some evolution in how custom steps are defined) > > > To illustrate: > > The OLD EXAMPLE: > > steps: > 1: > type: container > image: my/google-cloud > command: gcloud dataproc jobs submit spark --BUCKET=gs://${BUCKET} > env: > BUCKET: $brooklyn:config("bucket") > on-error: retry > 2: > set-sensor: spark-output=${1.stdout} > > Would become in the NEW proposal: > > steps: > - type: container > image: my/google-cloud > command: gcloud dataproc jobs submit spark --BUCKET=gs://${BUCKET} > env: > BUCKET: $brooklyn:config("bucket") > on-error: retry > - set-sensor spark-output = ${1.stdout} > > If we wanted to attach an `id` to the second step (e.g. for use with > "next") we could write it either as: > > # full long-hand map > - type: set-sensor > input: > sensor: spark-output > value: ${1.stdout} > id: set-spark-output > > # mixed "s" shorthand key and other fields > - s: set-sensor spark-output = ${1.stdout} > id: set-spark-output > > To explain the reasoning: > > The advantages of steps: > > * Slightly less verbose when no ID is needed on a step > * Easier to read and understand flow > * Avoids hassle of renumbering when introducing step > * Avoids risk of error where same key defined multiple time > > The advantages of OLD map-based scheme (implied disadvantages of the new > steps process): > > * Easier user-facing correlation on steps (e.g. in UI) by always having an > explicit ID for easier correlation > * Easier to extend a workflow by inserting or overriding explicit steps > > After some initial usage of the workflow, it seems these advantages of the > old approach are outweighed by the advantages of the list approach. In > particular the "correlation" can be done in other ways, and extending a > workflow is probably not so useful, whereas supplying and maintaining an ID > is a hassle, error-prone, and harder to understand. > > Finally to explain the custom steps idea, it works out nicely in the code > and we think for users to add a "compound-step" to the catalog e.g. as > follows for the workflow shown above: > > id: retryable-gcloud-dataproc-with-bucket-and-sensor > item: > type: custom-workflow-step > parameters: > bucket: > type: string > sensor_name: > type: string > default: spark-output > shorthand_definition: [ " bucket " ${bucket} ] [ " sensor " > ${sensor_name} ] > steps: > - type: container > image: my/google-cloud > command: gcloud dataproc jobs submit spark --BUCKET=gs://${BUCKET} > env: > BUCKET: ${bucket} > on-error: retry > - set-sensor ${sensor_name} = ${1.stdout} > > A user could then write a step: > > - retryable-gcloud-dataproc-with-bucket-and-sensor > > And optionally use the shorthand per the shorthand_definition, matching the > quoted string literals and inferring the indicated parameters, e.g.: > > - retryable-gcloud-dataproc-with-bucket-and-sensor bucket my-bucket sensor > my-spark-output > > They could of course also use the longhand: > > - type: retryable-gcloud-dataproc-with-bucket-and-sensor > input: > bucket: my-bucket > sensor_name: my-spark-output > > > Best > Alex > > > > On Sat, 17 Sept 2022 at 21:13, Geoff Macartney <geom...@apache.org> wrote: > > > Hi Alex, > > > > Belatedly reviewed the PR. It's looking good! And surprisingly simple > > in the end. Made a couple of minor comments on it. > > > > Cheers > > Geoff > > > > On Thu, 8 Sept 2022 at 09:35, Alex Heneveld <a...@cloudsoft.io> wrote: > > > > > > Hi team, > > > > > > An initial PR with a few types and the ability to define an effector is > > > available [1]. > > > > > > This is enough for the next steps to be parallelized, e.g. new steps > > > added. The proposal has been updated with a work plan / list of tasks > > > [2]. Any volunteers to help with some of the upcoming tasks let me know. > > > > > > Finally I've been thinking about the "shorthand syntax" and how to bring > > us > > > closer to Peter's proposal of a DSL. The original proposal allowed > > instead > > > of a map e.g. > > > > > > step_sleep: > > > type: sleep > > > duration: 5s > > > > > > or > > > > > > step_update_service_up: > > > type: set-sensor > > > sensor: > > > name: service.isUp > > > type: boolean > > > value: true > > > > > > being able to use a shorthand _map_ with a single key being the type, and > > > value interpreted by that type, so in the OLD SHORTHAND PROPOSAL the > > above > > > could be written: > > > > > > step_sleep: > > > sleep: 5s > > > > > > step_update_service_up: > > > set-sensor: service.isUp = true > > > > > > Having played with syntaxes a bit I wonder if we should instead say the > > > shorthand DSL kicks in when the step _body_ is a string (instead of a > > > single-key map), and the first word of the string being the type, and the > > > remainder interpreted by the type, and we allow it to be a bit more > > > ambitious. > > > > > > Concretely this NEW SHORTHAND PROPOSAL would look something like: > > > > > > step_sleep: sleep 5s > > > step_update_service_up: set-sensor service.isUp = true > > > # also supporting a type, ie `set-sensor [TYPE] NAME = VALUE`, eg > > > step_update_service_up: set-sensor boolean service.isUp = true > > > > > > You would still need the full map syntax whenever defining flow logic -- > > eg > > > condition, next, retry, or timeout -- or any property not supported by > > the > > > shorthand syntax. But for the (majority?) simple cases the expression > > > would be very concise. In most cases I think it would feel like a DSL > > but > > > has the virtue of a very clear translation to the actual workflow model > > and > > > the underlying (YAML) model needed for resumption and UI. > > > > > > As a final example, the example used at the start of the proposal > > > (simplified a little -- removing on-error retry and env map as those > > > wouldn't be supported by shorthand): > > > > > > brooklyn.initializers: > > > - type: workflow-effector > > > name: run-spark-on-gcp > > > steps: > > > 1: > > > type: container > > > image: my/google-cloud > > > command: gcloud dataproc jobs submit spark > > > --BUCKET=gs://$brooklyn:config("bucket") > > > 2: > > > type: set-sensor > > > sensor: spark-output > > > value: ${1.stdout} > > > > > > Could be written in this shorthand as follows: > > > > > > steps: > > > 1: container my/google-cloud command "gcloud dataproc jobs submit > > spark > > > --BUCKET=gs://${entity.config.bucket}" > > > 2: set-sensor spark-output ${1.stdout} > > > > > > Thoughts? > > > > > > Best > > > Alex > > > > > > > > > [1] https://github.com/apache/brooklyn-server/pull/1358 > > > [2] > > > > > https://docs.google.com/document/d/1u02Bi6sS8Fkf1s7UzRRMnvLhA477bqcyxGa0nJesqkI/edit#heading=h.gbadaqa2yql6 > > > > > > > > > On Wed, 7 Sept 2022 at 09:58, Alex Heneveld <a...@cloudsoft.io> wrote: > > > > > > > Hi Peter, > > > > > > > > Yes - thanks for the extra details. I did take your suggestion to be a > > > > procedural DSL not YAML, per the illustration at [1] (second code > > block). > > > > Probably where I was confusing was in saying that unlike DSLs which > > just > > > > run (and where the execution can be delegated to eg java/groovy/ruby), > > here > > > > we need to understand and display, store and resume the workflow > > progress. > > > > So I think it needs to be compiled to some representation that is well > > > > described and that new Apache Brooklyn code can reason about, both in > > the > > > > UI (JS) and backend (Java). Parsing a DSL is much harder than using > > YAML > > > > for this "reasonable" representation (as in we can reason _about_ it > > :) ), > > > > because we already have good backend processing, persistence, > > > > serialization; and frontend processing and visualization support for > > > > YAML-based models. So I think we almost definitely want a > > well-described > > > > declarative YAML model of the workflow. > > > > > > > > We might *also* want a Workflow DSL because I agree with you a DSL > > would > > > > be nicer for a user to write (if writing by hand; although if composing > > > > visually a drag-and-drop to YAML is probably easier). However it > > should > > > > probably get "compiled" into a Workflow YAML. So I'm suggesting we do > > the > > > > workflow YAML at this stage, and a DSL that compiles into that YAML > > can be > > > > designed later. (Designing a good DSL and parser and reason-about-able > > > > representation is a big task, so being able to separate it feels good > > too!) > > > > > > > > Best > > > > Alex > > > > > > > > [1] > > > > > > https://docs.google.com/document/d/1u02Bi6sS8Fkf1s7UzRRMnvLhA477bqcyxGa0nJesqkI/edit#heading=h.75wm48pjvx0h > > > > > > > > > > > > On Fri, 2 Sept 2022 at 20:17, Geoff Macartney < > > geoff.macart...@gmail.com> > > > > wrote: > > > > > > > >> Hi Peter, > > > >> > > > >> Thanks for such a detailed writeup of how you see this working. I fear > > > >> I've too little experience with this sort of thing to be able to say > > > >> anything very useful about it. My thought on the matter would be, > > > >> let's get started with the yaml based approach and see how it goes. I > > > >> think that experience would then give us a much better feel for what a > > > >> really nice and usable DSL for workflows would look like (probably to > > > >> address all the pain points of the yaml approach! :-) The outline > > > >> above will then be a good starting point, I'm sure. > > > >> > > > >> Cheers > > > >> Geoff > > > >> > > > >> On Thu, 1 Sept 2022 at 21:26, Peter Abramowitsch > > > >> <pabramowit...@gmail.com> wrote: > > > >> > > > > >> > Hi All > > > >> > I just wanted to clarify something in my comment the other day about > > > >> DSLs > > > >> > since I see that the acronym was also used in Alex's original > > document. > > > >> > Unless I misunderstood, Alex was proposing to create a DSL for > > Brooklyn > > > >> > using yaml as syntax and writing a code layer to translate between > > that > > > >> > syntax and underlying APIs which are presumably all in Java. > > > >> > > > > >> > What I was suggesting was a DSL written directly in Java (I guess) > > > >> whose > > > >> > syntax would be that language, but whose grammar would be keywords > > that > > > >> > were also Java functions. Some of these functions would be > > pre-defined > > > >> in > > > >> > the DSL, while others could be defined by the user and could use > > other > > > >> > functions of the DSL. The result would be turned into a JAR file > > (or > > > >> > equivalent in another platform) But during the compile phase, it > > > >> would be > > > >> > checked for errors, and it could be debugged line by line either > > > >> invoking > > > >> > live functionality or using a library of mock versions of the > > Brooklyn > > > >> API. > > > >> > > > > >> > In this 'native' DSL one could provide different types of workflow > > > >> > constructs as functions (In the BaseClass), taking function names as > > > >> method > > > >> > pointers, or using Lambdas. It would be a lot easier in Ruby or > > Python > > > >> > > > > >> > // linear > > > >> > brooklynRun(NamedTaskMethod, NamedTaskMethod) > > > >> > > > > >> > // chained > > > >> > TaskMethodA()TaskMethodB(). > > > >> > > > > >> > // asynchronous > > > >> > brooklynJoin(NamedTaskMethod, NamedTaskMethod,...) > > > >> > > > > >> > // conditional > > > >> > brooklynRunIf(NamedTaskMethod, NamedConditionMethod,...) > > > >> > > > > >> > // iterative > > > >> > brooklynRunWhile(NamedTaskMethod, NamedConditionMethod,...) > > > >> > brooklynRunUntil(NamedTaskMethod, NamedConditionMethod,...) > > > >> > > > > >> > // there could even be a utility to implement legacy syntax (this of > > > >> course > > > >> > would require the extra code layer I was trying to avoid) > > > >> > runYaml(Path) > > > >> > > > > >> > A basic class structure might be > > > >> > > > > >> > // where BrooklynRecipeBase implements the utility functions > > including, > > > >> > among others Join, Run, If, While, Until mentioned above > > > >> > // and the BrooklynWorkflowInterface would dictate the functional > > > >> > requirements for the mandatory aspects of the Recipe. > > > >> > class MyRecipe extends BrooklynRecipeBase implements, > > > >> > BrooklynWorkflowInterface > > > >> > { > > > >> > Initialize() > > > >> > createContext() - spin up resources > > > >> > workflow() - the main launch sequence using aspects of the DSL > > > >> > monitoring() - an asynchronous workflow used to manage sensor > > output or > > > >> for > > > >> > whatever needs to be done while the "orchestra" is plating > > > >> > shutdownHook() - called whenever shutdown is happening > > > >> > } > > > >> > > > > >> > For those who don't like the smell of Java, the source file could > > just > > > >> be > > > >> > the contents, which would then be injected into the class framing > > code > > > >> > before compilation. > > > >> > > > > >> > These are just ideas. I'm not familiar enough with Brooklyn in its > > > >> current > > > >> > implementation to be able to create realistic pseudocode. > > > >> > > > > >> > Peter > > > >> > > > > >> > On Thu, Sep 1, 2022 at 9:24 AM Geoff Macartney < > > > >> geoff.macart...@gmail.com> > > > >> > wrote: > > > >> > > > > >> > > Hi Alex, > > > >> > > > > > >> > > That's great, I'll be excited to hear all about it. 7th September > > > >> > > suits me fine; I would probably prefer 4.00 p.m. over 11.00. > > > >> > > > > > >> > > Cheers > > > >> > > Geoff > > > >> > > > > > >> > > On Thu, 1 Sept 2022 at 12:41, Alex Heneveld <a...@cloudsoft.io> > > > >> wrote: > > > >> > > > > > > >> > > > Thanks for the excellent feedback Geoff and yes there are some > > very > > > >> cool > > > >> > > and exciting things added recently -- containers, conditions, and > > > >> terraform > > > >> > > and kubernetes support, all of which make writing complex > > blueprints > > > >> much > > > >> > > easier. > > > >> > > > > > > >> > > > I'd love to host a session to showcase these. > > > >> > > > > > > >> > > > How does Wed 7 Sept sound? I could do 11am UK or 4pm UK -- > > > >> depending > > > >> > > what time suits for people who are interested. Please RSVP and > > > >> indicate > > > >> > > your time preference! > > > >> > > > > > > >> > > > Best > > > >> > > > Alex > > > >> > > > > > > >> > > > > > > >> > > > On Wed, 31 Aug 2022 at 22:17, Geoff Macartney < > > > >> geoff.macart...@gmail.com> > > > >> > > wrote: > > > >> > > >> > > > >> > > >> Hi Alex, > > > >> > > >> > > > >> > > >> Another thought occurred to me when reading that workflow > > > >> proposal. You > > > >> > > wrote > > > >> > > >> > > > >> > > >> "and with the recent support for container-based tasks and > > > >> declarative > > > >> > > >> conditions, we have taken big steps towards enabling YAML > > > >> authorship" > > > >> > > >> > > > >> > > >> Unfortunately over the past while I haven't been able to keep > > up as > > > >> > > >> closely as I would like with developments in Brooklyn. I'm just > > > >> > > >> wondering if it might be possible to get together some time, on > > > >> Google > > > >> > > >> Meet or Zoom or whatnot, if you or a colleague could spare > > half an > > > >> > > >> hour to demo some of these recent developments? But don't worry > > > >> about > > > >> > > >> it if you're too busy at present. > > > >> > > >> > > > >> > > >> Adding dev@ to this in CC for the sake of Openness. Others > > might > > > >> also > > > >> > > >> be interested! > > > >> > > >> > > > >> > > >> Cheers > > > >> > > >> Geoff > > > >> > > > > > >> > > > > > >