On Tue, 09 Jun 2009 11:12:22 +0200, pavel filipensky <Pavel.Filipensky at sun.com> wrote:
> I have created separated webrevs for the bugs - to make the review more > comfortable and allow the reviewers to review only a subset of the 3 CRs: Hi Pavel, first of all, if you expect outside reviewers you probably should put the lot of analyse done in the bugs comments/evaluation section to either the bugs description so it appears on bugs.opensolaris.org, or even better, store a text file containing them at cr.opensolaris.org along with the webrevs, for those 3 bugs below, it would have been really helpfull for outside observers to better understand how you arrived here... 6500269 assertion failed: sp->mntinfo4_list == mi, file: ../../common/fs/nfs/nfs4_vfsops.c, line: 3258 > http://cr.opensolaris.org/~pavelf/6500269 <<< I have added some overall looks good, comments: http://cr.opensolaris.org/~pavelf/6500269/usr/src/uts/common/nfs/nfs4_clnt.h.udiff.html 1) * The mntinfo4_t::mi_recovlock protects the following fields: * mi_srvsettime * + * The mntinfo4_t::mi_recovlock protects also mi_srv. I'd prefer this to be instead: * The mntinfo4_t::mi_recovlock protects the following fields: ! * mi_srvsettime, mi_srv 2) + * If changing mi_srv from one server to different server, mi_recovlock + * as RW_WRITER is needed - happens in recov_newserver() and recov_clientid(). + * If mi_srv is set for the first time - happens in nfs4_mount() + * and nfs4_mountroot(), mi_recovlock is held as RW_READER. + * This means that holding mi_recovlock as RW_READER is enough to protect + * mi_srv from being changed from one server to another. this is a bit confusing...I'd prefer this to be changes like below, the functions may change over time...and the latter part is redundant * Changing mi_srv from one nfs4_server_t to a different one requires * holding the mi_recovlock as RW_WRITER. * Exception: setting mi_srv the first time in mount/mounroot is done * holding the mi_recovlock as RW_READER. http://cr.opensolaris.org/~pavelf/6500269/usr/src/uts/common/fs/nfs/nfs4_vfsops.c.udiff.html 3) find_nfs4_server_all() 3809 find_nfs4_server_all(mntinfo4_t *mi, int all) 3810 { 3811 nfs4_server_t *np = mi->mi_srv; 3812 3813 ASSERT(nfs_rw_lock_held(&mi->mi_recovlock, RW_READER) || 3814 nfs_rw_lock_held(&mi->mi_recovlock, RW_WRITER)); 3815 3816 if (np && (np->s_thread_exit != NFS4_THREAD_EXIT || all)) { 3817 mutex_enter(&np->s_lock); 3818 np->s_refcnt++; 3819 return (np); 3820 } 3821 return (NULL); 3822 } now, that looks suspicious, you use the nfs4_server_t 'np' prior having a lock on it, that is different from the old find_nfs4_server_all() impl. as well. since we're also no longer holding the nfs4_server_lst_lock I have doubts we're properly protected here. Btw, not your changes but still worth mentioning it: 3809 find_nfs4_server_all(mntinfo4_t *mi, int all) 3810 { 3811 nfs4_server_t *np = mi->mi_srv; 3812 3813 ASSERT(nfs_rw_lock_held(&mi->mi_recovlock, RW_READER) || 3814 nfs_rw_lock_held(&mi->mi_recovlock, RW_WRITER)); btw, this assert is not sufficient in the context of find_nfs4_server_all() it only assert that _someone_ is holding the lock: 4883 nfs_rw_lock_held(nfs_rwlock_t *l, krw_t rw) 4884 { 4885 4886 if (rw == RW_READER) 4887 return (l->count > 0); 4888 ASSERT(rw == RW_WRITER); 4889 return (l->count < 0); 4890 } in the context of find_nfs4_server_all()/find_nfs4_server() we really want to know that the _current caller_ is holding the lock as called out in the comment: 3791 * The caller should be holding mi->mi_recovlock, and it should continue to 3792 * hold the lock until done with the returned nfs4_server_t. Once 3793 * mi->mi_recovlock is released, there is no guarantee that the returned 3794 * mi->nfs4_server_t will continue to correspond to mi. 3795 */ 3796 nfs4_server_t * 3797 find_nfs4_server(mntinfo4_t *mi) ================================= 6707722 nfs4_start_fop() uses gethrestime_sec() with 1 second granularity to detect server change > http://cr.opensolaris.org/~pavelf/6707722 looks good, but comments: http://cr.opensolaris.org/~pavelf/6707722/usr/src/uts/common/fs/nfs/nfs4_recovery.c.udiff.html + int droplock_cnt; I'd prefer this to be a uint_t. http://cr.opensolaris.org/~pavelf/6707722/usr/src/uts/common/nfs/nfs4_clnt.h.udiff.html + int mi_srvset_cnt; /* increment whenever changing the nfs4_server_t */ I'd prefer mi_srvset_cnt to be a uint_t. ========================================= 6721281 panic: assertion failed: np != 0L, file: ../../common/fs/nfs/nfs4_callback.c, line: 1309 > http://cr.opensolaris.org/~pavelf/6721281 this looks good to me. ========================================= Btw, since you touch nfs4_clnt.h anyways would ya mind fixing this along with it ? http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6507213 cheers frankB