Aled Sage created BROOKLYN-345:
----------------------------------
Summary: NPE on rebind of enricher
ServiceStateLogic$ComputeServiceState
Key: BROOKLYN-345
URL: https://issues.apache.org/jira/browse/BROOKLYN-345
Project: Brooklyn
Issue Type: Bug
Reporter: Aled Sage
Using Brooklyn 0.10.0-SNAPSHOT (e.g. brooklyn-server commit 66b9b1c)...
When rebinding to existing persisted state, it failed to create the enricher
{{ServiceStateLogic$ComputeServiceState}} with the NPE shown below (but then
continued, so this is otherwise benign):
{noformat}
2016-09-09 09:31:03,850 WARN o.a.b.c.m.r.RebindExceptionHandlerImpl
[brooklyn-execmanager-VtZheMDn-0]: problem adding enricher cdoy70m1hv
(ServiceFailureDetector{name=org.apache.brooklyn.policy.ha.ServiceFailureDetector,
uniqueTag=service.state.actual, running=true,
entity=VanillaSoftwareProcessImpl{id=pwz8z4pbyp}, id=cdoy70m1hv}) to entity
pwz8z4pbyp (VanillaSoftwareProcessImpl{id=pwz8z4pbyp}); continuing
java.lang.NullPointerException: null
at
org.apache.brooklyn.policy.ha.ServiceFailureDetector.setActualState(ServiceFailureDetector.java:188)
~[brooklyn-policy-0.10.0-20160907.0931.jar:0.10.0-20160907.0931]
at
org.apache.brooklyn.core.entity.lifecycle.ServiceStateLogic$ComputeServiceState.onEvent(ServiceStateLogic.java:288)
~[brooklyn-core-0.10.0-20160907.0931.jar:0.10.0-20160907.0931]
at
org.apache.brooklyn.policy.ha.ServiceFailureDetector.onEvent(ServiceFailureDetector.java:159)
~[brooklyn-policy-0.10.0-20160907.0931.jar:0.10.0-20160907.0931]
at
org.apache.brooklyn.core.entity.lifecycle.ServiceStateLogic$ComputeServiceState.setEntity(ServiceStateLogic.java:274)
~[brooklyn-core-0.10.0-20160907.0931.jar:0.10.0-20160907.0931]
at
org.apache.brooklyn.core.entity.AbstractEntity$BasicEnricherSupport.add(AbstractEntity.java:1788)
~[brooklyn-core-0.10.0-20160907.0931.jar:0.10.0-20160907.0931]
at
org.apache.brooklyn.core.mgmt.rebind.BasicEntityRebindSupport.addEnrichers(BasicEntityRebindSupport.java:145)
[brooklyn-core-0.10.0-20160907.0931.jar:0.10.0-20160907.0931]
at
org.apache.brooklyn.core.mgmt.rebind.BasicEntityRebindSupport.addEnrichers(BasicEntityRebindSupport.java:47)
[brooklyn-core-0.10.0-20160907.0931.jar:0.10.0-20160907.0931]
at
org.apache.brooklyn.core.mgmt.rebind.RebindIteration.associateAdjunctsWithEntities(RebindIteration.java:650)
[brooklyn-core-0.10.0-20160907.0931.jar:0.10.0-20160907.0931]
at
org.apache.brooklyn.core.mgmt.rebind.RebindIteration.doRun(RebindIteration.java:244)
[brooklyn-core-0.10.0-20160907.0931.jar:0.10.0-20160907.0931]
at
org.apache.brooklyn.core.mgmt.rebind.InitialFullRebindIteration.doRun(InitialFullRebindIteration.java:69)
[brooklyn-core-0.10.0-20160907.0931.jar:0.10.0-20160907.0931]
at
org.apache.brooklyn.core.mgmt.rebind.RebindIteration.run(RebindIteration.java:266)
[brooklyn-core-0.10.0-20160907.0931.jar:0.10.0-20160907.0931]
at
org.apache.brooklyn.core.mgmt.rebind.RebindManagerImpl.rebindImpl(RebindManagerImpl.java:558)
[brooklyn-core-0.10.0-20160907.0931.jar:0.10.0-20160907.0931]
at
org.apache.brooklyn.core.mgmt.rebind.RebindManagerImpl$3.call(RebindManagerImpl.java:508)
[brooklyn-core-0.10.0-20160907.0931.jar:0.10.0-20160907.0931]
at
org.apache.brooklyn.core.mgmt.rebind.RebindManagerImpl$3.call(RebindManagerImpl.java:506)
[brooklyn-core-0.10.0-20160907.0931.jar:0.10.0-20160907.0931]
at
org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519)
[brooklyn-core-0.10.0-20160907.0931.jar:0.10.0-20160907.0931]
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
[na:1.7.0_95]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_95]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_95]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_95]
{noformat}
The line throwing the NPE is executing {{setEntityOnFireTime = now +
getConfig(SERVICE_ON_FIRE_STABILIZATION_DELAY).toMilliseconds()}}. That
suggests we got null for the config value, even though there is a default value
of zero for it.
The enricher persisted state contains:
{noformat}
<serviceOnFire.stabilizationDelay>
<org.apache.brooklyn.camp.brooklyn.spi.dsl.methods.DslComponent_-DslConfigSupplier>
<component>
<componentId></componentId>
<scope>THIS</scope>
</component>
<keyName>swarm.recovery.stabilizationDelay</keyName>
</org.apache.brooklyn.camp.brooklyn.spi.dsl.methods.DslComponent_-DslConfigSupplier>
</serviceOnFire.stabilizationDelay>
{noformat}
And the owning entity has:
{noformat}
brooklyn.parameters:
- name: swarm.recovery.stabilizationDelay
label: Stabilization Delay
description: |
Time period for which the service must be consistently in the same
state to trigger an action
# A restart shouldn't trigger failure
type: org.apache.brooklyn.util.time.Duration
default: 5m
{noformat}
There are a few things we should think about fixing:
1. guard against the NPE in {{ServiceFailureDetector.setActualState}} (i.e.
handle when the config returns null) - but unclear what value it should then
default to.
2. avoid calling this code on rebind. for example, instead of
{{ComputeServiceState.setEntity}} immediately calling {{onEvent(null)}}, it
could subscribe with "notifyOfInitialValue" so that it gets a callback (in the
right thread, at the right time).
3. investigate further why the config lookup returned null - e.g. is it because
entity wasn't fully initialised, or because the DSL didn't find the default
value defined in brooklyn.parameters?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)