Re: How to unset the ON_FIRE state?

Aled Sage Fri, 01 Aug 2014 09:25:35 -0700

FYI I've started work on automatically persisting feeds.

After that, I'll look at persisting subscriptions (but entity impls so often use anonymous inner classes that those might be very hard to persist+deserialize automatically on Brooklyn restart).


Aled


On 31/07/2014 20:12, Aled Sage wrote:

Hi Alex,

Good suggestions. A couple of questions.
Do we definitely need both SERVICE_NOT_UP_INDICATORS and SERVICE_PROBLEMS, or could we have just one? The benefit I see of both is that SERVICE_PROBLEMS could be applicable when starting/stopping, whereas SERVICE_NOT_UP_INDICATORS only matter when we are supposed to be running.
If we can get away with just one, it will keep things simpler.

---
The issue addressed in PR #101 is that some entities are written such that enrichers are getting duplicated when Brooklyn restarts: the "rebind" re-registers the enricher, but the entity also registers an enricher again when its rebind() method calls its connectSensors() method.
I agree with your points Alex in #101, especially with us adding support for persisting sensor feeds and subscriptions (though these often use anonymous inner classes in their impl which makes them hard to persist!).
The medium-to-long term solution is to fix our entity implementations.

---
> Finally for tracking enrichers, sensor feeds, subscriptions, and policies, I suggest
> we add an optional "uniqueName", the presence of which blocks the addition of
> something of the same kind with the same uniqueName. This will better solve the
> problem described in #101, and it gives us a way to allow code to find and/or
> remove some of the enrichers above if they need to customize logic.
Do you mean that addEnricher() would be overloaded to take an optional name (or perhaps part of EnricherSpec) - so addEnricher would be a no-op if there was already an enricher with that name? And similarly for addSubscription etc?
What would addEnricher return instead? By "same kind", do you mean another enricher rather than a subscription etc, rather than meaning another enricher of the same Java type?
For #101, I lean towards logging a warning instead. When the entity calls `connectSensors()` during rebind, we know it's still rebinding so could do an extra check if there is an enricher that looks the same. If there is, then warn.
Fundamentally, addEnricher should add the enricher. And we should update all our entity implementations to the new pattern of doing more in the init of the entity. If that pattern isn't right, we should figure out what is.
Aled


On 31/07/2014 16:18, Alex Heneveld wrote:
Hi Svet, All,
Good time to raise this. I've been wondering about a similar thing, and mentioned this at #101. I'd like to see a way that new requirements for service_up and service_state can easily be added by third parties.
Currently we explicitly attach a "computeServiceUp" computation at a few entities (e.g. to say service is up iff service_state = running AND REST to /foo returns 200). But this is ad hoc, and it does not easily support third party updates (in particular clearing problems cleanly, ie having independent problem detectors and clearers). A related ambiguity is in SERVICE_STATE which is a combination of expected state together with being on_fire for some problems.
I'd like to suggest:

1) We add a SERVICE_NOT_UP_INDICATORS *map* sensor
2) We attach an enricher which sets SERVICE_UP based on SERVICE_NOT_UP_INDICATORS.isEmpty()
Then up-ness is controlled by effectors and policies which add and remove SERVICE_NOT_UP_INDICATORS, keyed by an identifier unique to them. For instance all clusters and fabrics would simply subscribe to children's UP events and add such an indicator object under the "cluster.size" key if there are not a sufficient number of UP children. (Incidentally this would solve an issue where cluster health is not always cleared appropriately when nodes come back online.) The existing isRunning checks for SoftwareProcess entities would also add such an indicator if it is detected as not running.
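
A minimal sketch of the indicator map described above, assuming a plain in-memory map in place of a real sensor. The class, its method names, and the "process.running" key are hypothetical and are not Brooklyn API; only the "cluster.size" key and the isEmpty() rule come from the proposal.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for an entity's SERVICE_NOT_UP_INDICATORS map sensor; not Brooklyn API. */
final class NotUpIndicatorsSketch {
    // Key = identifier unique to the reporter, value = human-readable reason.
    private final Map<String, String> indicators = new TreeMap<>();

    /** A reporter adds (or overwrites) only the entry under its own key. */
    synchronized void report(String key, String reason) {
        indicators.put(key, reason);
    }

    /** A reporter clears only its own entry; other reporters' entries are untouched. */
    synchronized void clear(String key) {
        indicators.remove(key);
    }

    /** The enricher rule from the proposal: SERVICE_UP is true iff the map is empty. */
    synchronized boolean isServiceUp() {
        return indicators.isEmpty();
    }

    /** Cluster-style reporter: flag "cluster.size" while too few children are up. */
    void onChildrenUpChanged(int upChildren, int requiredUp) {
        if (upChildren < requiredUp) {
            report("cluster.size", upChildren + " of " + requiredUp + " required children are up");
        } else {
            clear("cluster.size");
        }
    }

    public static void main(String[] args) {
        NotUpIndicatorsSketch entity = new NotUpIndicatorsSketch();
        entity.onChildrenUpChanged(1, 3);                        // adds "cluster.size"
        entity.report("process.running", "isRunning check failed");
        entity.onChildrenUpChanged(3, 3);                        // clears only "cluster.size"
        System.out.println(entity.isServiceUp());                // false: "process.running" is still set
        entity.clear("process.running");
        System.out.println(entity.isServiceUp());                // true: no indicators remain
    }
}
```

Because each reporter touches only its own key, one reporter recovering cannot hide a problem that another reporter is still flagging.
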
And we do something similar for SERVICE_STATE:
Introduce a SERVICE_PROBLEMS *map* attribute and an enricher which sets SERVICE_STATE based on the problems being empty and the value of new sensor SERVICE_STATE_EXPECTED. SERVICE_STATE_EXPECTED is set by the lifecycle tasks, and then: if a service is expected starting or stopping that is shown as SERVICE_STATE, otherwise if !SERVICE_PROBLEMS.isEmpty() it is set as ON_FIRE, otherwise it is set based on SERVICE_STATE_EXPECTED and SERVICE_UP. Also we could have an enricher which puts a SERVICE_PROBLEM if `(SERVICE_STATE_EXPECTED==RUNNING && SERVICE_UP==false)`.
This is a touch more complicated than SERVICE_UP but I think it would be clearer and could simplify some of the "isRunning" logic checks during post-start. Where we want to wait on multiple things to determine up-ness, we can insert a SERVICE_NOT_UP_INDICATOR manually, then wait for the appropriate enricher/feed to clear it. And it could handle the case where a subscription should be responsible for the final transition to EXPECTED=RUNNING (there are a few cases where start will set RUNNING early, and a subscription comes along later and finishes the job, after sensors have been emitted). And of course it would support Svet's use case where the "abc-compliance" policy would simply add an entry { abc-compliance: "Replication violation" } to the SERVICE_PROBLEMS, and clear it if it becomes okay -- and service state is automatically updated to be ON_FIRE when there is a compliance problem.
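
The SERVICE_STATE rules above could be sketched roughly as follows. The enum and method names are hypothetical, not Brooklyn API. One assumption is labeled in the code: the proposal leaves "based on SERVICE_STATE_EXPECTED and SERVICE_UP" open, so the sketch treats "expected RUNNING but not up" as ON_FIRE, which matches the optional problem-adding enricher mentioned in the same paragraph.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical sketch of the proposed SERVICE_STATE computation; not Brooklyn API. */
final class ServiceStateSketch {
    /** Illustrative subset of lifecycle states. */
    enum State { STARTING, RUNNING, STOPPING, STOPPED, ON_FIRE }

    static State computeServiceState(State expected, Map<String, String> problems, boolean serviceUp) {
        // 1. While starting or stopping, show the expected state even if problems exist.
        if (expected == State.STARTING || expected == State.STOPPING) {
            return expected;
        }
        // 2. Outside a transition, any recorded problem means ON_FIRE.
        if (!problems.isEmpty()) {
            return State.ON_FIRE;
        }
        // 3. ASSUMPTION: expected RUNNING but not up counts as a problem (ON_FIRE).
        if (expected == State.RUNNING && !serviceUp) {
            return State.ON_FIRE;
        }
        // 4. Otherwise the expected state is the actual state.
        return expected;
    }

    public static void main(String[] args) {
        Map<String, String> none = Collections.<String, String>emptyMap();
        Map<String, String> violation =
                Collections.singletonMap("abc-compliance", "Replication violation");

        System.out.println(computeServiceState(State.RUNNING, violation, true));   // ON_FIRE
        System.out.println(computeServiceState(State.RUNNING, none, true));        // RUNNING
        System.out.println(computeServiceState(State.STARTING, violation, false)); // STARTING
    }
}
```

With this shape, clearing ON_FIRE needs no knowledge of the "previous" state: the policy removes its own entry from the problems map and the state is recomputed.
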
Finally for tracking enrichers, sensor feeds, subscriptions, and policies, I suggest we add an optional "uniqueName", the presence of which blocks the addition of something of the same kind with the same uniqueName. This will better solve the problem described in #101, and it gives us a way to allow code to find and/or remove some of the enrichers above if they need to customize logic.
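
A rough sketch of the "uniqueName" idea, assuming one registry per kind of item (one for enrichers, one for policies, and so on). All names are hypothetical and not Brooklyn API. Returning the already-registered item on a blocked add is only one possible choice; the proposal does not say what the add call should return.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical registry for one kind of item (e.g. enrichers only); not Brooklyn API. */
final class UniqueNameRegistrySketch<T> {
    private final Map<String, T> named = new LinkedHashMap<>();
    private final List<T> unnamed = new ArrayList<>();

    /**
     * Adds the item unless the uniqueName is already taken.
     * ASSUMPTION: on a clash the existing item wins and is returned,
     * so a repeated add is a no-op for the caller.
     */
    synchronized T add(String uniqueName, T item) {
        if (uniqueName == null) {          // the name is optional: unnamed items are never blocked
            unnamed.add(item);
            return item;
        }
        T existing = named.get(uniqueName);
        if (existing != null) {
            return existing;
        }
        named.put(uniqueName, item);
        return item;
    }

    /** Lets other code look up an item in order to customize or replace it. */
    synchronized T find(String uniqueName) {
        return named.get(uniqueName);
    }

    /** Returns true if an item with that name was registered and has been removed. */
    synchronized boolean remove(String uniqueName) {
        return named.remove(uniqueName) != null;
    }

    synchronized int size() {
        return named.size() + unnamed.size();
    }

    public static void main(String[] args) {
        UniqueNameRegistrySketch<String> enrichers = new UniqueNameRegistrySketch<>();
        enrichers.add("service-up", "enricher restored by rebind");
        enrichers.add("service-up", "enricher re-added by connectSensors()");  // blocked
        System.out.println(enrichers.size());                                  // 1
    }
}
```
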
Best
Alex


On 31/07/2014 05:34, Svetoslav Neykov wrote:
Hi,
It seems that there is no way to unset an ON_FIRE state previously set by my
code. First it is not clear what the new state should be and second some
other code could've set the state as well meanwhile.

Here is some background. I am developing sample policies which monitor
machines for compliance with certain rules. If the rule is broken the
machine should be set ON_FIRE. So far so good. The problem is that once the
machines are back in compliant state I need to clear the error.
The ON_FIRE state in Lifecycle seems orthogonal to the rest of the states.
Logically we can have ON_FIRE while RUNNING or STARTING. It could be a
temporary error, not a final state in the state machine.
Just as an observation, we could have an entity ON_FIRE and SERVICE_UP at
the same time.


Possible solutions to the ON_FIRE issue could be:

* Forbidding manual setting of ON_FIRE state, instead creating a
mechanism to register functions returning the state. By default it would be
SERVICE_STATE == RUNNING. The con is that it is a poll-based approach.
* Reference counting the setting of ON_FIRE. The con is that it
requires tedious housekeeping, leading to bugs.
Perhaps a combination of both approaches would be best - use the first one
with a long poll, with the ability to trigger the check manually.
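
The combined approach could look roughly like this sketch: nobody sets ON_FIRE directly, checks are registered as functions, and recheck() can be called either by a slow periodic poll or manually when a policy knows something has changed. All names are hypothetical and not Brooklyn API, and the scheduler itself is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of function-based ON_FIRE detection; not Brooklyn API. */
final class OnFireChecksSketch {
    // Each check returns true while its rule is satisfied.
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();
    private boolean onFire;

    /** Registers (or replaces) the check stored under the given name. */
    synchronized void register(String name, BooleanSupplier isOk) {
        checks.put(name, isOk);
    }

    /**
     * Re-evaluates every check. Intended to be called from a long poll
     * and also on demand. Returns true if the entity should be ON_FIRE.
     */
    synchronized boolean recheck() {
        boolean anyFailing = false;
        for (BooleanSupplier check : checks.values()) {
            if (!check.getAsBoolean()) {
                anyFailing = true;
                break;
            }
        }
        onFire = anyFailing;
        return onFire;
    }

    synchronized boolean isOnFire() {
        return onFire;
    }

    public static void main(String[] args) {
        final boolean[] compliant = {false};
        OnFireChecksSketch machine = new OnFireChecksSketch();
        machine.register("compliance-rule", () -> compliant[0]);

        System.out.println(machine.recheck());   // true: the rule is broken
        compliant[0] = true;                     // machine is back in a compliant state
        System.out.println(machine.recheck());   // false: manual trigger clears ON_FIRE
    }
}
```
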


Any thoughts?


Best,

Svet.
