Here is a public webrev:

http://cr.opensolaris.org/~pavelf/6500269-6707722-6721281/


Thanks,
Pavel

On 06/08/09 16:34, Pavel Filipensky wrote:
> Hi,
>
> here is a webrev for
>
> 6707722 nfs4_start_fop() uses gethrestime_sec() with 1 second 
> granularity to detect server change
> 6721281 panic: assertion failed: np != 0L, file: 
> ../../common/fs/nfs/nfs4_callback.c, line: 1309
> 6500269 assertion failed: sp->mntinfo4_list == mi, file: 
> ../../common/fs/nfs/nfs4_vfsops.c, line: 3258
>
>
> http://cte-www.uk/net/tb3.uk/tank/u/pf199842/6500269-6707722-6721281/webrev/ 
>
>
>
> Background info:
>
>
> All three CRs are related to the nfs4_server_t <-> mntinfo4_t 
> relationship. Not to reinvent  the wheel, I will use Sam Falkner's 
> description of that relationship, taken from:
>
> *6667873 System panics in nfs4_recov_thread 
> <http://monaco.sfbay.sun.com/detail.jsf?cr=6667873>*
>
> <snip>
>
> The data structures go roughly like this, where the arrows mean "you 
> can go in this direction intuitively".
>
> vfs_t
>
>  ^
>  |
>  v
>
> mntinfo4_t <- nfs4_server_t
>
>  |
>  v
>
> servinfo4_t
>
> Note that it's relatively easy to get from nfs4_server_t to 
> mntinfo4_t, but not vice-versa.  mntinfo4_t is
> at the heart of things, and the client code needs to get from 
> mntinfo4_t to nfs4_server_t sometimes;
> notably, during unmount and failover.  In order to get from mntinfo4_t 
> to nfs_server_t, we do the following:
>
> get the current servinfo4_t for our mntinfo4_t
> for every nfs4_server_t in the kernel:
>    if the connection address in servinfo4_t
>    matches the address in nfs4_server_t
>        success
>
> </snip>
>
> Here is a short description and a fix for each CR:
>
> ----
> 6500269: in this CR the MOUNT code in nfs4_mount() is interrupted. 
> During the cleanup code, the lookup of nfs4_server_t for mntinfo4_t is 
> mismatched.
> The code tries to remove mntinfo4_t from wrong nfs4_server_t. We get a 
> panic.
>
> FIX : introduce a direct pointer from mntinfo4_t to nfs4_server_t
>
> ----
> 6707722: This bug was never reported on a live system - just spotted 
> during code reading. As synopsis says:
> "nfs4_start_fop() uses gethrestime_sec() with 1 second granularity to 
> detect server change"
> If a failover of nfs4_server happens, nfs4_start_fop() might not 
> notice that.
>
> FIX: use an incrementing  counter for detection of a  server change
>
> ----
> 6721281: In this CR, the recovery thread is not able to find the 
> corresponding nfs4_server_t. The situation is different here:
> - by design neither mntinfo4_t nor recovery thread gets a reference 
> count (s_refcnt) on nfs4_server_t;
>  so the code cannot assume that nfs4_server_t  exists.
> - however nfs4delegreturn_cleanup_impl() always assumes that 
> nfs4_server_t  exists; this is a regression of  6632214
>
> FIX:
> if  r_deleg_type == OPEN_DELEGATE_NONE; do not lookup nfs4_server_t
>
>
> =====
>
> Thanks,
> Pavel
>


Reply via email to