ahgittin commented on PR #1307: URL: https://github.com/apache/brooklyn-server/pull/1307#issuecomment-1165356164
This seems to be triggering a race condition where the DMG is being destroyed while a policy is still running and trying to create a child bucket. For me it is non-deterministic, happening in about 1 in 4 runs. It would rarely occur IRL: basically you need to destroy a DMG in the same sub-second that it is trying to add a bucket. Fortunately it is easier to re-create in a test, because the test can destroy the DMG right after creating the entity which triggers the bucket, while the policy is still running.

It would still be nice to fix though! I think there are several ways we could do this:

(1) If persisting a BrooklynObject (the source BO, eg an entity or policy) which has a reference to a target BO which is being destroyed or has been destroyed, warn during persistence. This will at least help us track down the cause if we subsequently see a rebind problem - EASY but NOT A FIX

(1') As (1), but either omit the source BO from persistence or destroy it when one is detected with a missing target - FIX but MESSY, as it's basically a clean-up process rather than a fix for the root cause, and RISKY if there are ever valid cases for persisting something that might have a dangling reference (but I don't think there are)

(2) When creating a source BO, check whether its parent/entity is no longer managed, and fail at that point, before it is recorded, so that it doesn't end up in persisted state - FIX but NEED TO BE CAREFUL of races and deadlocks

(3) Where target BOs create source BOs (entities, policies, etc) directly, set a flag and check it in `onManagementStopping` to block until such creation has completed - FIX but PER ENTITY, so tedious and easy to miss, and it BLOCKS DELETION, which is not ideal

I think (2) will be the best, maybe also with (1), keeping (1') in mind for the future. For (2), I think we currently set a flag on the target BO early in the management stopping process, before it tells the persistence store the item is being deleted, so even if the two are racing, having the source BO check for that flag should work.
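To make the ordering in (2) concrete, here is a rough toy model in plain Java. It is not Brooklyn code: `Store`, `Target`, `createChild` and `destroy` are hypothetical names made up for illustration, and the "persistence store" is just a set of ids. It only shows the sequence being relied on: the stopping flag is set before persistence is told anything, and a new child is referenced from the target before it is persisted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class StoppingFlagSketch {

    /** Hypothetical stand-in for the persistence store: the set of persisted BO ids. */
    static class Store {
        final Set<String> ids = ConcurrentHashMap.newKeySet();
    }

    /** Hypothetical target BO (e.g. the group) which creates source BOs (e.g. buckets). */
    static class Target {
        private final String id;
        private final Store store;
        private final List<String> childIds = new ArrayList<>();
        private volatile boolean stopping = false;

        Target(String id, Store store) {
            this.id = id;
            this.store = store;
            store.ids.add(id);
        }

        /** Option (2): fail before the child is recorded if the target is already stopping. */
        void createChild(String childId) {
            if (stopping) {
                throw new IllegalStateException(
                        id + " is no longer managed; not creating " + childId);
            }
            // Add the reference to the target first, so a later destroy() sees the child.
            synchronized (childIds) {
                childIds.add(childId);
            }
            store.ids.add(childId);
        }

        /** Sets the flag early, then tells the store to delete the target and known children. */
        void destroy() {
            stopping = true;
            List<String> known;
            synchronized (childIds) {
                known = new ArrayList<>(childIds);
            }
            store.ids.removeAll(known);
            store.ids.remove(id);
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        Target dmg = new Target("dmg", store);
        dmg.createChild("bucket-1");
        dmg.destroy();
        System.out.println("persisted after destroy: " + store.ids);
        try {
            dmg.createChild("bucket-2"); // arrives after stopping has begun
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that the check in `createChild` and the later writes are not atomic in this sketch, so a small window remains if `destroy()` runs in between them. Closing it fully would need a lock shared with `destroy()`, which is where the races-and-deadlocks caveat on (2) comes in.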
Even if the flag is unset at that check, the reference to the source BO will immediately be added to the target BO, giving "plenty of time" (milliseconds) for that to complete before the target BO moves from management stopping to informing persistence of the delete. That means by the time the target BO tells the persistence store it is gone, it will almost certainly have the reference to the source BO, and so both will be deleted.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
