[PROPOSAL] Controlling effectors concurrency

Svetoslav Neykov Wed, 11 Jan 2017 04:08:57 -0800

## Problem

Brooklyn currently executes effectors in parallel, without regard for already 
running instances of the same effector. This makes certain classes of YAML 
blueprints harder to write, namely those which need to limit the number of 
concurrent executions. Today this gets worked around on a per-blueprint basis, 
shifting the burden of synchronizing/locking to the blueprint, which has 
limited means to do it.


Some concrete examples:
  * A haproxy blueprint which needs to have at most one "update configuration" 
effector running - solved in bash by using flock: 
https://github.com/brooklyncentral/clocker/blob/9d3487198f426e8ebc6efeee94af3dc50383fa71/common/catalog/common/haproxy.bom
  * Some clusters have a limit on how many members can join at a time 
(Cassandra notably)
  * A DNS blueprint needs to make sure that updates to the records happen 
sequentially so no records get lost
  * To avoid API rate limits in certain services we need to limit how many 
operations we do at any moment - say we want to limit provisioning of entities, 
but not installing/launching them.

A first step in solving the above has been made in 
https://github.com/apache/brooklyn-server/pull/443 which adds 
"maxConcurrentChildCommands" to the DynamicCluster operations (start, resize, 
stop). This allows us to limit how many entities get created/destroyed by the 
cluster in parallel. The goal of this proposal is to extend it by making it 
possible to apply finer grained limits (say just on the launch step of the 
start effector) and to make it more general (not just start/stop in cluster but 
any effector).

## Proposed solution

Add functionality which allows external code (e.g. adjuncts) to plug into the 
lifecycle of entities **synchronously** and influence their behaviour. This 
will allow us to influence the execution of effectors on entities and, for this 
particular proposal, to block execution until some condition is met.

## Possible approaches (alternatives)

### Effector execution notifications

Provide the functionality to subscribe callbacks to be called when an effector 
is about to execute on an entity. The callback has the ability to mutate the 
effector, for example by adding a wrapper task to ensure certain concurrency 
limits. A simpler alternative would be to add pre and post execution callbacks. 
For this to be useful we need to split big effectors into smaller pieces. For 
example, the start effector would become a composition of provision, install, 
customize and launch effectors.

The reason not to work at the task level is that tasks are anonymous, so we 
can't really subscribe to them. To do that we'd need to add identifiers to 
them, which essentially turns them into effectors.
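
To make the "wrapper task" idea a bit more concrete, here is a minimal Java 
sketch. It is not an existing Brooklyn API: `EffectorInterceptor`, 
`ConcurrencyLimitingInterceptor` and `beforeExecute` are hypothetical names 
made up for illustration, and a plain `Supplier` stands in for the effector 
body. The callback receives the effector about to run and returns either the 
body unchanged or a wrapped body which holds a semaphore permit while it runs.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical callback invoked just before an effector body runs (not a Brooklyn API). */
interface EffectorInterceptor {
    <T> Supplier<T> beforeExecute(String effectorName, Supplier<T> body);
}

/** Wraps the body of one named effector so that at most `permits` invocations run at once. */
class ConcurrencyLimitingInterceptor implements EffectorInterceptor {
    private final String limitedEffector;
    private final Semaphore semaphore;

    ConcurrencyLimitingInterceptor(String limitedEffector, int permits) {
        this.limitedEffector = limitedEffector;
        this.semaphore = new Semaphore(permits);
    }

    @Override
    public <T> Supplier<T> beforeExecute(String effectorName, Supplier<T> body) {
        if (!limitedEffector.equals(effectorName)) {
            return body; // other effectors run unchanged
        }
        return () -> {
            semaphore.acquireUninterruptibly(); // blocks until a permit is free
            try {
                return body.get();
            } finally {
                semaphore.release(); // released on success and on failure
            }
        };
    }

    /** Exposed only so the sketch can be inspected; shows how many permits are free. */
    int availablePermits() {
        return semaphore.availablePermits();
    }
}
```

With one permit, `beforeExecute("launch", body)` gives sequential launches 
while every other effector is left alone. A real implementation would wrap a 
Brooklyn task rather than a `Supplier`.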

### Add hooks to the existing effectors

We could add fixed pre and post hooks to the start/stop effectors which execute 
callbacks synchronously at key points around tasks.

---

Both of the above will allow us to plug additional logic into the lifecycle of 
entities, making it possible to block execution. For clusters we'd plug into 
the members' lifecycle and provide cluster-wide limits (say a semaphore shared 
by the members). For more complex scenarios we could name the synchronising 
entity explicitly, for example to block execution until a step in a separate 
entity is complete (say registering DNS records after provisioning but before 
launch application-wide).
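
As a rough illustration of the cluster-wide limit, the following toy Java 
model has all members provision in parallel and then compete for a semaphore 
shared by the cluster before they launch. Everything here is an assumption 
for the sake of the sketch: `ClusterLaunchLimiter`, `startMember` and 
`startAll` are hypothetical names, and the sleeps merely stand in for the 
real provision and launch steps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of a cluster whose members share one semaphore for the launch stage. */
class ClusterLaunchLimiter {
    private final Semaphore launchPermits;
    private final AtomicInteger launching = new AtomicInteger();
    private final AtomicInteger peakLaunching = new AtomicInteger();

    ClusterLaunchLimiter(int maxConcurrentLaunches) {
        this.launchPermits = new Semaphore(maxConcurrentLaunches);
    }

    /** Simulated start of one member: provisioning is unrestricted, launch is limited. */
    void startMember() throws InterruptedException {
        Thread.sleep(5);              // "provision": runs in parallel for all members
        launchPermits.acquire();      // post-provisioning hook: wait for a permit
        try {
            int now = launching.incrementAndGet();
            peakLaunching.accumulateAndGet(now, Math::max); // record the highest overlap seen
            Thread.sleep(5);          // "launch"
        } finally {
            launching.decrementAndGet();
            launchPermits.release();  // always hand the permit back
        }
    }

    /** Starts every member on its own thread; returns the peak number launching at once. */
    int startAll(int members) {
        ExecutorService pool = Executors.newFixedThreadPool(members);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < members; i++) {
                futures.add(pool.submit(() -> { startMember(); return null; }));
            }
            for (Future<?> f : futures) {
                f.get();              // wait for the member and surface any failure
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return peakLaunching.get();
    }
}
```

For example, `new ClusterLaunchLimiter(2).startAll(8)` starts eight members 
but never reports more than two in the launch stage at the same time.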

## Examples

Here are some concrete examples which give you a taste of what it would look 
like (thanks Geoff for sharing these)


### Limit the number of entities starting at any moment in the cluster (but provision them in parallel)
services:
- type: cluster
  brooklyn.enrichers:
  # plugs into the callbacks provided by the lifecycle and limits how many tasks
  # can execute in parallel after provisioning the machines;
  # by convention concurrency is counted down at the last stage if not
  # explicitly defined
  - type: org.apache.brooklyn.enricher.stock.LimitGroupTasksSemaphore
    brooklyn.config:
      stage: post.provisioning
      # auto means the whole cluster; could also be an integer, e.g. 10 for 10-at-a-time
      parallel.operation.size: auto
  brooklyn.config:
    initialSize: 50
    memberSpec:
      $brooklyn:entitySpec:
        type: cluster-member



---


### Use a third entity to control the concurrency
brooklyn.catalog:
  items:
  - id: provisionBeforeInstallCluster
    version: 1.0.0
    item:
      type: cluster
      id: cluster
      brooklyn.parameters:
      - name: initial.cluster.size
        description: Initial Cluster Size
        default: 50
      brooklyn.config:
        initialSize: $brooklyn:config("initial.cluster.size")
        memberSpec:
          $brooklyn:entitySpec:
            type: cluster-member
            brooklyn.enrichers:
            - type: org.apache.brooklyn.enricher.stock.AquirePermissionToProceed
              brooklyn.config:
                stage: post.provisioning
                # Delegate the concurrency decisions to the referee entity
                authorisor: $brooklyn:entity("referee")
      brooklyn.children:
      - type: org.apache.brooklyn.entity.TaskRegulationSemaphore
        id: referee
        brooklyn.config:
          # or 1 for sequential execution
          initial.value: $brooklyn:entity("cluster").config("initial.cluster.size")


---

Some thoughts from Alex from previous discussions on what it would look like in 
YOML with initd-style effectors:

I’d like to have a semaphore on the normal nodes cluster, and for the launch 
step each node acquires that semaphore, releasing it when confirmed joined. I 
could see a task you set in YAML, e.g. if using the initd-ish idea:

035-pre-launch-get-semaphore: { acquire-semaphore: { scope: $brooklyn:parent(), name: "node-launch" } }
040-launch: { ssh: "service cassandra start" }
045-confirm-service-up: { wait: { sensor: service.inCluster, timeout: 20m } }
050-finish-release-semaphore: semaphore-release

Tasks of type acquire-semaphore would use (create if needed) a named semaphore 
against the given entity … but somehow we need to say when it should 
automatically be released (e.g. on failure) in addition to the explicit release 
(the 050 step, which assumes some scope; not sure how/if to implement that).

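One possible reading of the create-if-needed and release-on-failure 
behaviour, sketched in Java. This is only an assumption about how it could 
work, not an agreed design: `NamedSemaphores`, `get` and `runHolding` are 
hypothetical names, the scope is reduced to a string id, and the steps between 
acquire and release are modelled as plain `Runnable`s.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Hypothetical registry of named semaphores, created on first use per scope. */
class NamedSemaphores {
    private final Map<String, Semaphore> semaphores = new ConcurrentHashMap<>();
    private final int permits;

    NamedSemaphores(int permits) {
        this.permits = permits;
    }

    /** Returns the semaphore for (scope, name), creating it atomically if it does not exist. */
    Semaphore get(String scopeId, String name) {
        return semaphores.computeIfAbsent(scopeId + "/" + name, key -> new Semaphore(permits));
    }

    /** Runs the steps in order while holding a permit; releases it even if a step throws. */
    void runHolding(String scopeId, String name, List<Runnable> steps) {
        Semaphore semaphore = get(scopeId, name);
        semaphore.acquireUninterruptibly();   // the "035" acquire step
        try {
            for (Runnable step : steps) {
                step.run();                   // e.g. the "040" launch and "045" confirm steps
            }
        } finally {
            semaphore.release();              // the "050" release, also reached on failure
        }
    }
}
```

Tying the release to the block of steps, rather than to a separate explicit 
step, is one way to answer the "when should it be released automatically" 
question: the scope of the semaphore is simply the steps listed between the 
acquire and the release.
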
---

Thanks to Geoff, who shared his thoughts on the subject; this post is based on 
them.

Svet.
