Hi,

* Thomas Thrainer <[email protected]> [2013-11-29 10:40:59 +0100]:

> LGTM, thanks!
> But as I was in involved quite a bit in the bug / discussion, maybe it
> might make sense that somebody else takes a look too...
> 

I think this patch does not actually fixes the issue 621 (explained in #3).

The reported corner case was when LUInstanceRemove() and
LUNetworkDisconnect() were running concurently.

A workaround that we have tried is upon LUNetworkDisconnect() and
LUNetworkConnect() try to acquire all cluster's instances in shared mode.
By that _LS_ACQUIRE_ALL mode is set and not _LS_ACQUIRE_EXACT and thus the
deleted lock does cause any problem.

Still I don't think this is the proper way to go. Maybe exporting the locking
mode up to the LU level just like oportunistic/share locking option
(as chris suggested during an internal discussion) is a rational decision.

What do you think?

Besides that thanks for the patch,
dimara


> Thomas
> 
> 
> On Thu, Nov 28, 2013 at 3:48 PM, Petr Pudlak <[email protected]> wrote:
> 
> > This is required to prevent race conditions such as removing a network
> > from a group and adding an instance at the same time. (See issue 621#2.)
> >
> > Signed-off-by: Petr Pudlak <[email protected]>
> > ---
> >  lib/cmdlib/instance.py | 34 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 34 insertions(+)
> >
> > diff --git a/lib/cmdlib/instance.py b/lib/cmdlib/instance.py
> > index 12340e8..5d58544 100644
> > --- a/lib/cmdlib/instance.py
> > +++ b/lib/cmdlib/instance.py
> > @@ -559,6 +559,22 @@ class LUInstanceCreate(LogicalUnit):
> >      self.needed_locks[locking.LEVEL_NODE_RES] = \
> >        CopyLockList(self.needed_locks[locking.LEVEL_NODE])
> >
> > +    # Optimistically acquire shared group locks (we're reading the
> > +    # configuration).  We can't just call GetInstanceNodeGroups, because
> > the
> > +    # instance doesn't exist yet. Therefore we lock all node groups of all
> > +    # nodes we have.
> > +    if self.needed_locks[locking.LEVEL_NODE] == locking.ALL_SET:
> > +      # In the case we lock all nodes for opportunistic allocation, we
> > have no
> > +      # choice than to lock all groups, because they're allocated before
> > nodes.
> > +      # This is sad, but true. At least we release all those we don't
> > need in
> > +      # CheckPrereq later.
> > +      self.needed_locks[locking.LEVEL_NODEGROUP] = locking.ALL_SET
> > +    else:
> > +      self.needed_locks[locking.LEVEL_NODEGROUP] = \
> > +        list(self.cfg.GetNodeGroupsFromNodes(
> > +          self.needed_locks[locking.LEVEL_NODE]))
> > +    self.share_locks[locking.LEVEL_NODEGROUP] = 1
> > +
> >    def DeclareLocks(self, level):
> >      if level == locking.LEVEL_NODE_RES and \
> >        self.opportunistic_locks[locking.LEVEL_NODE]:
> > @@ -846,6 +862,21 @@ class LUInstanceCreate(LogicalUnit):
> >      """Check prerequisites.
> >
> >      """
> > +    # Check that the optimistically acquired groups are correct wrt the
> > +    # acquired nodes
> > +    owned_groups = frozenset(self.owned_locks(locking.LEVEL_NODEGROUP))
> > +    owned_nodes = frozenset(self.owned_locks(locking.LEVEL_NODE))
> > +    cur_groups = list(self.cfg.GetNodeGroupsFromNodes(owned_nodes))
> > +    if not owned_groups.issuperset(cur_groups):
> > +      raise errors.OpPrereqError("New instance %s's node groups changed
> > since"
> > +                                 " locks were acquired, current groups
> > are"
> > +                                 " are '%s', owning groups '%s'; retry
> > the"
> > +                                 " operation" %
> > +                                 (self.op.instance_name,
> > +                                  utils.CommaJoin(cur_groups),
> > +                                  utils.CommaJoin(owned_groups)),
> > +                                 errors.ECODE_STATE)
> > +
> >      self._CalculateFileStorageDir()
> >
> >      if self.op.mode == constants.INSTANCE_IMPORT:
> > @@ -957,6 +988,9 @@ class LUInstanceCreate(LogicalUnit):
> >      ReleaseLocks(self, locking.LEVEL_NODE, keep=keep_locks)
> >      ReleaseLocks(self, locking.LEVEL_NODE_RES, keep=keep_locks)
> >      ReleaseLocks(self, locking.LEVEL_NODE_ALLOC)
> > +    # Release all unneeded group locks
> > +    ReleaseLocks(self, locking.LEVEL_NODEGROUP,
> > +                 keep=self.cfg.GetNodeGroupsFromNodes(keep_locks))
> >
> >      assert (self.owned_locks(locking.LEVEL_NODE) ==
> >              self.owned_locks(locking.LEVEL_NODE_RES)), \
> > --
> > 1.8.4.1
> >
> >
> 
> 
> -- 
> Thomas Thrainer | Software Engineer | [email protected] |
> 
> Google Germany GmbH
> Dienerstr. 12
> 80331 München
> 
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
> Geschäftsführer: Graham Law, Christine Elizabeth Flores

Attachment: signature.asc
Description: Digital signature

Reply via email to