Github user aledsage commented on a diff in the pull request:

    https://github.com/apache/brooklyn-server/pull/82#discussion_r57459946
  
    --- Diff: 
core/src/main/java/org/apache/brooklyn/entity/group/DynamicMultiGroupImpl.java 
---
    @@ -119,7 +125,36 @@ public void rebind() {
             super.rebind();
     
             if (rescan == null) {
    -            connectScanner();
    +            // The rescan can (in a different thread) cause us to remove 
the empty groups - i.e. remove 
    +            // it as a child, and unmanage it. That is dangerous during 
rebind, because the rebind-thread 
    +            // may concurrently (or subsequently) be initialising that 
child entity. It has caused 
    +            // rebind errors where the child's entity-rebind complains in 
setParent() that it was
    +            // "previouslyOwned". Therefore we defer registering/executing 
our scanner until rebind is
    +            // complete, so all entities are reconstituted.
    +            // We don't worry about other managementNodeStates, such as 
standby: if we were told to
    +            // rebind then we are free to fully initialise ourselves. But 
we do double-check that we
    +            // are still managed before trying to execute.
    +            
    +            getExecutionContext().execute(new Runnable() {
    +                @Override public void run() {
    +                    LOG.debug("Deferring scanner for {} until management 
context initialisation complete", DynamicMultiGroupImpl.this);
    +                    while (!isRebindComplete()) {
    +                        Time.sleep(100); // avoid thrashing
    +                    }
    +                    LOG.debug("Connecting scanner for {}", 
DynamicMultiGroupImpl.this);
    +                    connectScanner();
    +                }
    +                private boolean isRebindComplete() {
    +                    // TODO Want to determine if finished rebinding 
(either success or fail is fine).
    +                    // But not a clean way to do this that works for both 
unit tests and live server?!
    +                    //  * In RebindTestFixtureWithApp tests, 
mgmt.getHighAvailabilityManager().getNodeState()
    +                    //    always returns INITIALIZING.
    +                    //  * The rebind metrics is a hack, and feels very 
risky for HOT_STANDBY nodes that 
    +                    //    may have executed the rebind code multiple times.
    --- End diff --
    
    Ah - the `noteStartupComplete()` is what is missing I think. All the 
sensible looking methods returned "initialising" in the unit tests. Longer 
term, it would be good to ensure the test code is more like the production code 
path!
    
    I'll merge as-is. We can look at this again next week. Thanks for the fast 
review @neykov !


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to