Re: Brooklyn Feature Proposal - Declarative and Retryable Workflow

Geoff Macartney Tue, 30 Aug 2022 15:22:29 -0700

Hi Alex et al.

Here are some thoughts on the proposal.

Cheers
Geoff

# General thoughts

1. Adding a procedural "sub-language" like this to Brooklyn could give
it a whole new level of capability, which is very exciting.
2. At the same time this capability could add a whole new dimension of
complexity and difficulty, so I think it will be very important to
make sure the implementation and tooling give users lots of support
for traceability and debugging. It's good to see the emphasis put on
this in "The Other Big Idea: Traceability and Recoverability";
hopefully the implementation will emphasise it too.
3. "A graphical workflow visualiser is not planned at this time" is
understandable, but I feel that the more support we can add for this
in the UI, the better the chances of it succeeding.
4. Peter's suggestion of a DSL could potentially simplify some of the
considerations below. What do you think of that suggestion?

# About the workflow language

1. I find the document currently confusing in terms of what sort of
language it is describing. Is it a workflow, typically expressed as a
directed graph of nodes (tasks), or a sequence of tasks (which would be
rather more like a procedural language)?
2. If it is a workflow then I'd have thought there shouldn't be any
notion of ordering ("Step References, Numero-Alphabetic Ordering and
Extensibility"). Ordering should be defined only by the graph (the
value of the "Next" field, and the ids) in this case, don't you think?
3. If it is a sequence of tasks then I think the current mechanism of
ids (the map keys) and Next is awkward. `1-2-http-request` and Next
put me in mind of BASIC, with Next playing the role of GOTO. In this
case I think it has the potential to introduce the same problems as
GOTO, and might better be done without. Rather it might be preferable
to express the workflow as an array of steps (sequencing), with
support for selection (`condition`/`if`), iteration (maybe consider
introducing `while`? see question below about iteration), and
"functions" (independently defined named workflows, either in the
catalog or elsewhere in the blueprint). Ids would be optional, only
required for steps whose results are referenced elsewhere, not as
sequencers or labels for Next:

```yaml
steps:
  - id: spark-job
    type: container
    image: my/google-cloud
    command: gcloud dataproc jobs submit spark --BUCKET=gs://${BUCKET}
    env:
      BUCKET: $brooklyn:config("bucket")
    on-error: retry
  - set-sensor: spark-output=${spark-job.stdout}
```

4. This might also work well with a slightly different markup for
`condition` (which might be nicer as `if`?), adding a `then` defining
a (sub) sequence of steps:

```yaml
steps:
  - if:
      target: ${scratch.skip_date}
      not: { equals: true }
      then:
        - ssh: echo today is `DATE`
        - <other commands in here...>
```
We could also add `else`.
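
To make the `else` idea concrete, here is a rough sketch extending the
`if` example above. The `else` key and the second `ssh` command are
purely my assumptions for illustration; nothing like this is in the
document today.

```yaml
steps:
  - if:
      target: ${scratch.skip_date}
      not: { equals: true }
      then:
        - ssh: echo today is `DATE`
      # hypothetical: sub-sequence run when the condition does not hold
      else:
        - ssh: echo skipping the date
```

Keeping `then` and `else` as sibling keys under `if` would mean both
branches are ordinary step arrays, so they could nest further `if`
blocks without any need for ids or Next.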

5. Can you add some more details about how you see iteration in the
document? There is the section "Multiple targets and looping", but it
doesn't really have examples of the latter. Could we introduce a
`while` construct?
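
As a strawman only, a `while` could reuse the condition markup from the
`if` example in point 4. The `while` and `do` keys and the variable
`scratch.more_work` are all assumptions of mine, not anything defined
in the document:

```yaml
steps:
  # hypothetical: repeat the `do` sub-sequence while the condition holds
  - while:
      target: ${scratch.more_work}
      equals: true
      do:
        - ssh: echo processing next item
        - <step that updates scratch.more_work...>
```

A loop like this would also need some answer to what happens on retry
or failover part-way through an iteration, which is partly why I would
like to see the document's own view on iteration first.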

6. How about renaming `set-workflow-variable` to a simpler `let` or
`set`? You don't actually give an example of its use; is it something
like the following? (I'm imagining a convention for a one-line
definition for convenience, with the variable name followed by the
keyword `be` to indicate assignment of what follows):

```yaml
steps:
  - let: my-scratch be false
  - if:
      target: whatever
      equals: something
      then:
        - let: my-scratch be true
```

7. Could you add some more detail about how independent workflows can
be defined, e.g. in the catalog (or as a separate workflow in the
blueprint)? Can these be parameterised, like function definitions? The
use of `parameters` in the example in section "Request/State Unique
Identifiers" isn't clear to me.
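
To illustrate the kind of thing I am asking about, here is one possible
shape for a parameterised, independently defined workflow. Every key
here other than `steps`, `if`, `then` and `ssh` (that is, `workflows`,
`parameters` as used below, `workflow` as a step type, and the
`skip_date` config key) is hypothetical and may not match what the
document intends by `parameters`:

```yaml
# hypothetical: a named workflow declared once, with declared parameters
workflows:
  print-date:
    parameters:
      skip_date: { type: boolean, default: false }
    steps:
      - if:
          target: ${skip_date}
          not: { equals: true }
          then:
            - ssh: echo today is `DATE`

# hypothetical: invoking it as a step and passing an argument
steps:
  - workflow: print-date
    parameters:
      skip_date: $brooklyn:config("skip_date")
```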

8. Would you foresee adding any support for testing all this to the
Brooklyn tests mechanism? Might be valuable.

# About how to organise implementing all this

1. Can we sequence work on this to do the simpler bits first and get
experience with how it works before proceeding to more advanced things
like nested workflows with per-target conditions ("Multiple Targets
and looping")? I would particularly like the UI side of things to
progress in step with the workflow definitions, rather than leaving
all the UI work to the end.
2. (How) could the work for this be spread across the community? I've
no doubt you're raring to go on this but it would be good if it didn't
all fall on your shoulders!

On Mon, 29 Aug 2022 at 23:51, Geoff Macartney <[email protected]> wrote:
>
> Hi Alex,
>
> I've done a first pass on the document, and it's very impressive. Adding a 
> procedural "sub-language" like this to Brooklyn could give it a whole new 
> level of capability, which is very exciting. I have some thoughts on some of 
> the details proposed which I will try to write up this week.
>
> I share the concerns about YAML which I think Peter expressed very well. His 
> suggestion of a DSL instead of YAML is interesting and I think would be worth 
> considering. I also have some reservations about some of the constructs 
> you're proposing (well, at least one of them) and some perhaps relatively 
> minor suggestions for changes in structure. My bigger concern is that adding 
> a new programming language within Blueprints like this could add a whole new 
> dimension of complexity. I'm asking myself, "how would I debug this" when 
> things go wrong. I think that's worth some discussion as much as the details 
> of the language. There are also points where I simply have questions and 
> would like some more detail.
>
> I'll try to get more detailed thoughts written up this week.
>
> Cheers
> Geoff
>
>
>
> On Sat, 27 Aug 2022 at 00:05, Peter Abramowitsch <[email protected]> 
> wrote:
>>
>> Hi Alex,
>> I haven't been involved with the Brooklyn team for a long while so take
>> this suggestion with as little or as much importance as you see at face
>> value.   Your proposal for a richer specification language to guide
>> realtime behavior is much appreciated and I think it is a great idea.
>> You've obviously thought very deeply as to how it could be applied in
>> different areas of a blueprint.
>>
>> My one comment is whether going for a declarative solution, especially one
>> based on YAML is optimal.  Sure Yaml is well known, easy to eyeball, but it
>> has two drawbacks that make me wonder if it is the best platform for your
>> idea.  The first is that it is a format-based language.  Working in large
>> infrastructure projects, small errors can have disastrous consequences, so
>> as little as a missing or extra tab could result in destroying a data
>> resource or bringing down a complex system.   The other, more philosophical
>> comment has to do with the clumsiness of describing procedural concepts in
>> a declarative language.  (anyone have fun with XSL doing anything
>> significant?)
>>
>> So my suggestion would be to look into DSLs instead of Yaml.  Very nice
>> ones can be created with little effort in Ruby, Python, JS - and even Java.
>> In addition to having the language's own interpreter check the syntax for
>> you, you get lots of freebies such as being able to do line by line
>> debugging - and of course the obvious advantage that there is no code layer
>> between the DSL and its implementation, whereas with Yaml, someone needs to
>> write the code that converts the grammar into behavior, catch errors etc.
>>
>> What do you think?
>>
>> Peter
>>
>> On Wed, Aug 24, 2022 at 8:44 AM Alex Heneveld <[email protected]> wrote:
>>
>> > Hi folks,
>> >
>> > I'd like Apache Brooklyn to allow more sophisticated workflow to be written
>> > in YAML.
>> >
>> > As many of you know, we have a powerful task framework in java, but only a
>> > very limited subset is currently exposed via YAML.  I think we could
>> > generalize this without a mammoth effort, and get a very nice way for users
>> > to write complex effectors, sensor feeds, etc, directly in YAML.
>> >
>> > At [1] please find details of the proposal.
>> >
>> > This includes the ability to branch and retry on error.  It can also give
>> > us the ability to retry/resume on an Apache Brooklyn server failover.
>> >
>> > Comments welcome!
>> >
>> > Best
>> > Alex
>> >
>> >
>> > [1]
>> >
>> > https://docs.google.com/document/d/1u02Bi6sS8Fkf1s7UzRRMnvLhA477bqcyxGa0nJesqkI/edit?usp=sharing
>> >
