Robert Thurlow wrote:
> Mike Gerdts wrote:
>
>> When would it be desirable for umountall to unmount file systems in
>> all zones?
>
> If we're bringing the system-as-a-whole down, we try to stop
> zones, but if one of them fails to shut down, it is better
> to try to unmount the filesystems mounted in them to free
> resources on the servers. This is nice behaviour on a
> network; without it, other clients will be unable to get
> locks and state until lease periods expire.
>
>> During system shutdown, all zones should be down before the autofs and
>> nfs client services stop in the global zones. In the event that some
>> zone is not shut down, this means that it is likely stuck in a
>> shutting down state and any calls to unmount "stuck" nfs mounts in
>> that zone will result in a hung system call and an SMF stop method
>> timeout.
>
> It's easy, especially during development, to get zones that
> won't shut down - all you need is one vnode refcount wrong,
> for example. We're not ever going to be able to guarantee
> that such a refcount leak won't escape into the wild (and
> several have done so). I have also seen global zone threads
> able to do the unmount logic successfully on behalf of a
> zone - because that work is not affected by refcounts.
>
> In summary, I like Pavel's idea and code changes here. I
> would not object to the inverse flag to get back the
> "unmount everything" semantics. I'm not really willing to
> lose those semantics.
>
>> It seems to me that there is a real chance[1] that the RPC calls would
>> not even be routable to an NFS server. See, for example,
>> http://bugs.opensolaris.org/view_bug.do?bug_id=6476438.
>
> I believe there is a window during shutdown where we could
> usefully attempt an all-zones umountall. I'd want to have
> that be after zone shutdown, but I think that's already the
> case.
There is no longer a 'window during shutdown where we could usefully
attempt an all-zones umountall'. The fix for "6675447 NFSv4 client hangs
on shutdown if server is down beforehand" added the '-l' flag (limit
actions to local file systems) to the umountall call in svc.startd:

    system("/sbin/umountall -l");

With this fix, we no longer unmount NFS there.

Summary of where we try to unmount non-global zones' NFS mounts during
system shutdown, in chronological order:

1) stop method of system/zones, called from the global zone:
   ---> /lib/svc/method/svc-zones
   ----> zlogin -S $zone /sbin/init 0
   -----> the zone's own instance of svc.startd calls the zone's stop
          method of nfs/client

2) stop method of nfs/client, called from the global zone:
   currently does cross-zone unmounting via umountall -F nfs, but this
   will be removed

3) svc.startd in the global zone, after it kills all the processes,
   calls:
   (void) system("/sbin/umountall -l");

4) vfs_unmountall()

#1 ... can unmount non-global zone mounts? YES. Shutdown of zones can
       fail, but the list of zones that failed to shut down is printed.
#2 ... can unmount non-global zone mounts? NO (currently YES); we
       propose to remove it in this code review.
#3 ... can unmount non-global zone mounts? NO. Removed in 6675447.
#4 ... can unmount non-global zone mounts? NO. Never goes over the
       wire (OTW).

===============

Let's make a decision.

REQUIREMENTS:

Rob:
a) the ability to do an 'all-zones unmount from the global zone'
b) the 'all-zones unmount from the global zone' should be part of the
   regular system shutdown

Mike:
c) wants the nfs/client stop method to work on the current zone
d) wants umountall(1M) to work on the current zone by default

The requirements have these priorities:

c) - highest
a)
b)
d) - lowest

It looks like the most difficult one to implement is b), 'the all-zones
unmount from the global zone should be part of the regular system
shutdown'. We can add code to /lib/svc/method/svc-zones which would, at
the very end of the stop method, unmount all the non-global zone mounts.
This would require new semantics for the -Z option - see option 5).

OPTIONS:

1) default behavior: don't limit action(s) to the current zone
   available options: none

2) default behavior: don't limit action(s) to the current zone
   available options: -z ... limit action(s) to the current zone

3) default behavior: limit action(s) to the current zone
   available options: none

4) default behavior: limit action(s) to the current zone
   available options: -Z ... apply action(s) to all zones

5) default behavior: limit action(s) to the current zone
   available options: -Z ... apply action(s) to all *non-global* zones

So I propose to implement 5). Rob, Mike, does this work for you?

Since we are introducing a new option, we must go through a PSARC case,
and I expect more opinions and discussion to come up there.

Thanks,
Pavel
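To compare the five options side by side, here is a minimal C sketch (not part of the proposed umountall change) of the decision each one implies: would umountall, run in a given zone, act on a mount owned by some zone? The names acts_on, zone_id, SKETCH_GLOBAL_ZONE and the proposal enum are hypothetical; real code would presumably use Solaris's zoneid_t, GLOBAL_ZONEID and getzoneid() instead. The sketch assumes a non-global zone only sees its own mounts, and it reads the -Z of option 5) as covering only mounts owned by non-global zones.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for Solaris zoneid_t / GLOBAL_ZONEID. */
typedef int zone_id;
#define SKETCH_GLOBAL_ZONE 0

/* The five options from the mail (hypothetical enum). */
enum umountall_proposal {
    PROPOSAL_1 = 1, /* default: all zones; no options */
    PROPOSAL_2,     /* default: all zones; -z limits to current zone */
    PROPOSAL_3,     /* default: current zone; no options */
    PROPOSAL_4,     /* default: current zone; -Z applies to all zones */
    PROPOSAL_5      /* default: current zone; -Z applies to all non-global zones */
};

/*
 * Returns true if umountall, running in zone `caller` under `proposal`,
 * would act on a mount owned by zone `owner`. `flag` is true when the
 * option's -z or -Z was given; it is ignored for options 1) and 3).
 */
static bool acts_on(enum umountall_proposal proposal, bool flag,
                    zone_id caller, zone_id owner)
{
    /* Assumption: a non-global zone can only see its own mounts. */
    if (caller != SKETCH_GLOBAL_ZONE)
        return owner == caller;

    switch (proposal) {
    case PROPOSAL_1:
        return true;
    case PROPOSAL_2:
        return flag ? owner == caller : true;
    case PROPOSAL_3:
        return owner == caller;
    case PROPOSAL_4:
        return flag ? true : owner == caller;
    case PROPOSAL_5:
        /* -Z: only mounts left behind by non-global zones. */
        return flag ? owner != SKETCH_GLOBAL_ZONE : owner == caller;
    }
    return false;
}
```

For example, in the global zone under 5), acts_on(PROPOSAL_5, true, 0, 3) is true while acts_on(PROPOSAL_5, true, 0, 0) is false (zone ID 3 is made up). That matches the step proposed for the end of the svc-zones stop method: clean up what the zones left behind without touching the global zone's own mounts.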