See http://blogs.sun.com/tdh/entry/debugging_a_refcnt and 
http://blogs.sun.com/tdh/entry/more_on_that_refcnt_debugging
for more than you want to know about this bug.

Basically, the last couple of paragraphs of the second article explain 
it all:

> Okay, I took my first advice to add more comments and I have a new 
> piece of code to question in nfs4_ephemeral_umount 
> <http://src.opensolaris.org/source/s?refs=nfs4_ephemeral_umount>:
>
>    1763       int                     is_recursed = FALSE;
>    1764       int                     was_locked = FALSE;
> ...
>    1770       *pmust_unlock = FALSE;
> ...
>    1817       if (!is_recursed) {
> ...
>    1845                       was_locked = TRUE;
>    1846               } else {
>    1847                       net->net_refcnt++;
>    1848                       ASSERT(net->net_refcnt != 0);
> ...
>    1859       if (was_locked == FALSE) {
> ...
>    1866               if (mutex_tryenter(&net->net_tree_lock)) {
>    1867                       *pmust_unlock = TRUE;
>    1868               } else {
>   
>
> So, if *!is_recursed*, then we can end up on line 1847 to bump the 
> refcnt. At this point, **pmust_unlock* is FALSE and *was_locked* is 
> also FALSE. If we grab the *net->net_tree_lock* right away on 1866, 
> then crud, it is okay.
>
> But the other case is not. Assume that *is_recursed* is TRUE. That 
> means we do not bump the refcnt and at 1859, the following values 
> hold: *was_locked* is FALSE and **pmust_unlock* is FALSE. Now if we 
> grab the mutex right away, we end up with **pmust_unlock* as TRUE. In 
> my examination of the code, that then implies we *must* drop the 
> refcnt at some exit point. But we just saw that we never grabbed it in 
> this scenario.
>
> This would account for us trying to rele a refcnt we never held. We 
> return TRUE to nfs4_free_mount 
> <http://src.opensolaris.org/source/s?refs=nfs4_free_mount> and it 
> sends this into nfs4_ephemeral_umount_activate 
> <http://src.opensolaris.org/source/s?refs=nfs4_ephemeral_umount_activate> 
> and thus into nfs4_ephemeral_umount_unlock 
> <http://src.opensolaris.org/source/s?refs=nfs4_ephemeral_umount_unlock>.
>
> I'm going to introduce more state in a *pmust_rele to keep track of 
> whether I need to rele or not.
>
> ------------------------------------------------------------------------

Testing has shown that we were dropping the refcnt when we had never 
held it in the first place.

I solve the problem by passing in a boolean to keep track of whether or 
not we held the
lock and thus we need to rele it.

As shown in the blog entries, I did some other changes as well:

>    1. Introduced a incr function to make sure that all bumps are correct:
>          291  static void
>          292  nfs4_ephemeral_tree_incr(nfs4_ephemeral_tree_t *net)
>          293  {
>          294          ASSERT(mutex_owned(&net->net_cnt_lock));
>          295          net->net_refcnt++;
>          296          ASSERT(net->net_refcnt != 0);
>          297  }
>          298
>          299  static void
>          300  nfs4_ephemeral_tree_hold(nfs4_ephemeral_tree_t *net)
>          301  {
>          302          mutex_enter(&net->net_cnt_lock);
>          303          nfs4_ephemeral_tree_incr(net);
>          304          mutex_exit(&net->net_cnt_lock);
>          305  }
>       ...
>         1859                  } else {
>         1860                          nfs4_ephemeral_tree_incr(net);
>         1861                  }
>             
>    2. Fixed up the one locking issue:
>          734          } else {
>          735                  net = mi->mi_ephemeral_tree;
>          736                  nfs4_ephemeral_tree_hold(net);
>          737
>          738                  mutex_exit(&mi->mi_lock);
>             
>    3. And I still have the change which I thought would fix the bug:
>         2006                  if (was_locked == FALSE)
>         2007                          mutex_exit(&net->net_tree_lock);
>             
>

The webrev can be found here:

http://cr.opensolaris.org/~tdh/6751438/

Reply via email to