Hi,

here is a webrev for

6707722 nfs4_start_fop() uses gethrestime_sec() with 1 second 
granularity to detect server change
6721281 panic: assertion failed: np != 0L, file: 
../../common/fs/nfs/nfs4_callback.c, line: 1309
6500269 assertion failed: sp->mntinfo4_list == mi, file: 
../../common/fs/nfs/nfs4_vfsops.c, line: 3258


http://cte-www.uk/net/tb3.uk/tank/u/pf199842/6500269-6707722-6721281/webrev/


Background info:


All three CRs are related to the nfs4_server_t <-> mntinfo4_t 
relationship. Not to reinvent  the wheel, I will use Sam Falkner's 
description of that relationship, taken from:

*6667873 System panics in nfs4_recov_thread 
<http://monaco.sfbay.sun.com/detail.jsf?cr=6667873>*

<snip>

The data structures go roughly like this, where the arrows mean "you can go in 
this direction intuitively".

vfs_t

  ^
  |
  v

mntinfo4_t <- nfs4_server_t

  |
  v

servinfo4_t

Note that it's relatively easy to get from nfs4_server_t to mntinfo4_t, but not 
vice-versa.  mntinfo4_t is
at the heart of things, and the client code needs to get from mntinfo4_t to 
nfs4_server_t sometimes;
notably, during unmount and failover.  In order to get from mntinfo4_t to 
nfs_server_t, we do the following:

get the current servinfo4_t for our mntinfo4_t
for every nfs4_server_t in the kernel:
    if the connection address in servinfo4_t
    matches the address in nfs4_server_t
        success

</snip>

Here is a short description and a fix for each CR:

----
6500269: in this CR the MOUNT code in nfs4_mount() is interrupted. 
During the cleanup code, the lookup of nfs4_server_t for mntinfo4_t is 
mismatched.
The code tries to remove mntinfo4_t from wrong nfs4_server_t. We get a 
panic.

FIX : introduce a direct pointer from mntinfo4_t to nfs4_server_t

----
6707722: This bug was never reported on a live system - just spotted 
during code reading. As synopsis says:
"nfs4_start_fop() uses gethrestime_sec() with 1 second granularity to 
detect server change"
If a failover of nfs4_server happens, nfs4_start_fop() might not notice 
that.

FIX: use an incrementing  counter for detection of a  server change

----
6721281: In this CR, the recovery thread is not able to find the 
corresponding nfs4_server_t. The situation is different here:
- by design neither mntinfo4_t nor recovery thread gets a reference 
count (s_refcnt) on nfs4_server_t;
  so the code cannot assume that nfs4_server_t  exists.
- however nfs4delegreturn_cleanup_impl() always assumes that 
nfs4_server_t  exists; this is a regression of  6632214

FIX:
if  r_deleg_type == OPEN_DELEGATE_NONE; do not lookup nfs4_server_t


=====

Thanks,
Pavel

Reply via email to