Here is a public webrev: http://cr.opensolaris.org/~pavelf/6500269-6707722-6721281/
Thanks, Pavel On 06/08/09 16:34, Pavel Filipensky wrote: > Hi, > > here is a webrev for > > 6707722 nfs4_start_fop() uses gethrestime_sec() with 1 second > granularity to detect server change > 6721281 panic: assertion failed: np != 0L, file: > ../../common/fs/nfs/nfs4_callback.c, line: 1309 > 6500269 assertion failed: sp->mntinfo4_list == mi, file: > ../../common/fs/nfs/nfs4_vfsops.c, line: 3258 > > > http://cte-www.uk/net/tb3.uk/tank/u/pf199842/6500269-6707722-6721281/webrev/ > > > > Background info: > > > All three CRs are related to the nfs4_server_t <-> mntinfo4_t > relationship. Not to reinvent the wheel, I will use Sam Falkner's > description of that relationship, taken from: > > *6667873 System panics in nfs4_recov_thread > <http://monaco.sfbay.sun.com/detail.jsf?cr=6667873>* > > <snip> > > The data structures go roughly like this, where the arrows mean "you > can go in this direction intuitively". > > vfs_t > > ^ > | > v > > mntinfo4_t <- nfs4_server_t > > | > v > > servinfo4_t > > Note that it's relatively easy to get from nfs4_server_t to > mntinfo4_t, but not vice-versa. mntinfo4_t is > at the heart of things, and the client code needs to get from > mntinfo4_t to nfs4_server_t sometimes; > notably, during unmount and failover. In order to get from mntinfo4_t > to nfs_server_t, we do the following: > > get the current servinfo4_t for our mntinfo4_t > for every nfs4_server_t in the kernel: > if the connection address in servinfo4_t > matches the address in nfs4_server_t > success > > </snip> > > Here is a short description and a fix for each CR: > > ---- > 6500269: in this CR the MOUNT code in nfs4_mount() is interrupted. > During the cleanup code, the lookup of nfs4_server_t for mntinfo4_t is > mismatched. > The code tries to remove mntinfo4_t from wrong nfs4_server_t. We get a > panic. > > FIX : introduce a direct pointer from mntinfo4_t to nfs4_server_t > > ---- > 6707722: This bug was never reported on a live system - just spotted > during code reading. As synopsis says: > "nfs4_start_fop() uses gethrestime_sec() with 1 second granularity to > detect server change" > If a failover of nfs4_server happens, nfs4_start_fop() might not > notice that. > > FIX: use an incrementing counter for detection of a server change > > ---- > 6721281: In this CR, the recovery thread is not able to find the > corresponding nfs4_server_t. The situation is different here: > - by design neither mntinfo4_t nor recovery thread gets a reference > count (s_refcnt) on nfs4_server_t; > so the code cannot assume that nfs4_server_t exists. > - however nfs4delegreturn_cleanup_impl() always assumes that > nfs4_server_t exists; this is a regression of 6632214 > > FIX: > if r_deleg_type == OPEN_DELEGATE_NONE; do not lookup nfs4_server_t > > > ===== > > Thanks, > Pavel >