Github user aledsage commented on the issue:
https://github.com/apache/brooklyn-server/pull/671
@ahgittin (cc @neykov) your fix looks good to me, with a few (longer term)
caveats:
* Exposing `abstractEntity.getAttributesSynchObjectInternal()` and
`attributeMap. getSynchObjectInternal()` feels scary, like it's going in the
wrong direction - it's exposing the internal synchronization which could make
it harder to reason about and subsequent simplify (e.g. what else will call
it?!). However, I agree it's the right thing to do for now.
* I agree about having a "sync object" at the `AbstractEntity` level, used
for everything that the entity is doing (including children, groups and
sensors).
* Things like `LocalSubscriptionManager.submitPublishEvent` are scary. It
calls `subscription.eventFilter.apply()` in the current thread (thus
potentially holding locks). This is an example of calling "alien code". In your
example, the filter comes from `subscribeToMembers`, so does
`parent.getMembers().contains(input.getSource())`. This has led back to a
different entity's synchronization block (for group membership) while holding
the `AttributeMap`'s lock on the first entity.
* I worry there is still a serious risk of deadlock in more complex
use-cases. For example, if involving multiple entities that interact via their
group membership and their sensor notifications/subscriptions. We need a proper
review (and simplification!) of this.
* Part of the complexity is in our parent/child relationship stuff. We can
massively simplify that part once we delete the entity-constructor way of
creating entities. We can assume that the entity's parent is set from the very
start. That has been deprecated since 0.5.0, but is still used in too many
tests for us to be able to actually delete it yet!
In conclusion, I'm happy to (tentatively!) merge this. I can see how it
fixes this particular deadlock and makes other deadlocks less likely.
In future PRs, I'll delete the entity-construction, simplify the
parent-child code, and I'll also think more about how to simplify (and
document) our threading/synchronization model. It definitely deserves attention!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---