On Tue, 09 Jun 2009 11:12:22 +0200, pavel filipensky <Pavel.Filipensky at 
sun.com> wrote:

> I have created separated webrevs for the bugs - to make the review more
> comfortable and allow the reviewers to review only a subset of the 3 CRs:

Hi Pavel, first of all, if you expect outside reviewers you probably should 
put the lot of analyse done in the bugs comments/evaluation section
to either the bugs description so it appears on bugs.opensolaris.org, or even
better, store a text file containing them at cr.opensolaris.org along
with the webrevs, for those 3 bugs below, it would have been really helpfull
for outside observers to better understand how you arrived here...

6500269 assertion failed: sp->mntinfo4_list == mi, file: 
../../common/fs/nfs/nfs4_vfsops.c, line: 3258
> http://cr.opensolaris.org/~pavelf/6500269     <<< I have added some

overall looks good, comments:

http://cr.opensolaris.org/~pavelf/6500269/usr/src/uts/common/nfs/nfs4_clnt.h.udiff.html

1)
  *      The mntinfo4_t::mi_recovlock protects the following fields:
  *              mi_srvsettime
  *
+ * The mntinfo4_t::mi_recovlock protects also mi_srv.

I'd prefer this to be instead:

  *      The mntinfo4_t::mi_recovlock protects the following fields:
! *              mi_srvsettime, mi_srv

2)
+ * If changing mi_srv from one  server to different server, mi_recovlock
+ * as RW_WRITER is needed - happens in recov_newserver() and recov_clientid().
+ * If mi_srv is set for the first time - happens in nfs4_mount()
+ * and nfs4_mountroot(), mi_recovlock is held as RW_READER.
+ * This means that holding mi_recovlock as RW_READER is enough to protect
+ * mi_srv from being changed from one server to another.

this is a bit confusing...I'd prefer this to be changes like below, the 
functions may change
over time...and the latter part is redundant

* Changing mi_srv from one nfs4_server_t to a different one requires 
* holding the mi_recovlock as RW_WRITER.
* Exception: setting mi_srv the first time in mount/mounroot is done
* holding the mi_recovlock as RW_READER.

http://cr.opensolaris.org/~pavelf/6500269/usr/src/uts/common/fs/nfs/nfs4_vfsops.c.udiff.html

3) find_nfs4_server_all()

3809 find_nfs4_server_all(mntinfo4_t *mi, int all)
3810 {
3811         nfs4_server_t *np = mi->mi_srv;
3812 
3813         ASSERT(nfs_rw_lock_held(&mi->mi_recovlock, RW_READER) ||
3814             nfs_rw_lock_held(&mi->mi_recovlock, RW_WRITER));
3815         
3816         if (np && (np->s_thread_exit != NFS4_THREAD_EXIT || all)) {
3817                 mutex_enter(&np->s_lock);
3818                 np->s_refcnt++;
3819                 return (np);
3820         }
3821         return (NULL);
3822 }

now, that looks suspicious, you use the nfs4_server_t 'np' prior having a lock 
on it, that is
different from the old find_nfs4_server_all() impl. as well. since we're also
no longer holding the nfs4_server_lst_lock I have doubts we're properly 
protected here.

Btw, not your changes but still worth mentioning it:

3809 find_nfs4_server_all(mntinfo4_t *mi, int all)
3810 {
3811         nfs4_server_t *np = mi->mi_srv;
3812 
3813         ASSERT(nfs_rw_lock_held(&mi->mi_recovlock, RW_READER) ||
3814             nfs_rw_lock_held(&mi->mi_recovlock, RW_WRITER));

btw, this assert is not sufficient in the context of find_nfs4_server_all()
it only assert that _someone_ is holding the lock:

   4883 nfs_rw_lock_held(nfs_rwlock_t *l, krw_t rw)
   4884 {
   4885 
   4886         if (rw == RW_READER)
   4887                 return (l->count > 0);
   4888         ASSERT(rw == RW_WRITER);
   4889         return (l->count < 0);
   4890 }
   
in the context of find_nfs4_server_all()/find_nfs4_server() we
really want to know that the _current caller_ is holding the lock as
called out in the comment:

   3791  * The caller should be holding mi->mi_recovlock, and it should 
continue to
   3792  * hold the lock until done with the returned nfs4_server_t.  Once
   3793  * mi->mi_recovlock is released, there is no guarantee that the returned
   3794  * mi->nfs4_server_t will continue to correspond to mi.
   3795  */
   3796 nfs4_server_t *
   3797 find_nfs4_server(mntinfo4_t *mi)
   
=================================

6707722 nfs4_start_fop() uses gethrestime_sec() with 1 second granularity to 
detect server change
> http://cr.opensolaris.org/~pavelf/6707722

looks good, but comments:

http://cr.opensolaris.org/~pavelf/6707722/usr/src/uts/common/fs/nfs/nfs4_recovery.c.udiff.html

+        int droplock_cnt;

I'd prefer this to be a uint_t.

http://cr.opensolaris.org/~pavelf/6707722/usr/src/uts/common/nfs/nfs4_clnt.h.udiff.html
+        int mi_srvset_cnt; /* increment whenever changing the nfs4_server_t */

I'd prefer mi_srvset_cnt to be a uint_t.

=========================================

6721281 panic: assertion failed: np != 0L, file: 
../../common/fs/nfs/nfs4_callback.c, line: 1309
> http://cr.opensolaris.org/~pavelf/6721281

this looks good to me.

=========================================

Btw, since you touch nfs4_clnt.h anyways would ya mind fixing this along with 
it ?

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6507213

cheers
frankB

Reply via email to