Robert Thurlow wrote:
> Mike Gerdts wrote:
>
>> When would it be desirable for umountall to unmount file systems in
>> all zones?
>
> If we're bringing the system-as-a-whole down, we try to stop
> zones, but if one of them fails to shut down, it is better
> to try to unmount the filesystems mounted in them to free
> resources on the servers. This is nice behaviour on a
> network; without it, other clients will be unable to get
> locks and state until lease periods expire.
>
>> During system shutdown, all zones should be down before the autofs and
>> nfs client services stop in the global zones. In the event that some
>> zone is not shut down, this means that it is likely stuck in a
>> shutting down state and any calls to unmount "stuck" nfs mounts in
>> that zone will result in a hung system call and an SMF stop method
>> timeout.
>
> It's easy, especially during development, to get zones that
> won't shut down - all you need is one vnode refcount wrong,
> for example. We're not ever going to be able to guarantee
> that such a refcount leak won't escape into the wild (and
> several have done so). I have also seen global zone threads
> able to do the unmount logic successfully on behalf of a
> zone - because that work is not affected by refcounts.
>
> In summary, I like Pavel's idea and code changes here. I
> would not object to the inverse flag to get back the
> "unmount everything" semantics. I'm not really willing to
> lose those semantics.
>
>> It seems to me that there is a real chance[1] that the RPC calls would
>> not even be routable to an NFS server. See, for example,
>> http://bugs.opensolaris.org/view_bug.do?bug_id=6476438.
>
> I believe there is a window during shutdown where we could
> usefully attempt an all-zones umountall. I'd want to have
> that be after zone shutdown, but I think that's already the
> case.

There is no longer a 'window during shutdown where we could usefully
attempt an all-zones umountall'. The fix for "6675447 NFSv4 client hangs
on shutdown if server is down beforehand" added the '-l' flag (limit
actions to local file systems) to the umountall call in svc.startd:

    (void) system("/sbin/umountall -l");

With this fix, we no longer unmount NFS file systems there.

Summary of where we try to unmount non-global zones' NFS mounts during
system shutdown, in chronological order:

1) stop method of system/zones, called from the global zone:
   ---> /lib/svc/method/svc-zones
   ----> zlogin -S $zone /sbin/init 0
   -----> the zone's own instance of svc.startd calls the zone's
          nfs/client stop method
2) stop method of nfs/client, called from the global zone:
   currently does cross-zone unmounting via umountall -F nfs,
   but this will be removed
3) svc.startd in the global zone, after it kills all the processes,
   calls:
   (void) system("/sbin/umountall -l");
4) vfs_unmountall()

#1 ... can unmount non-global zone mounts? YES. Shutdown of a zone can
       fail, but the list of zones that failed to shut down is printed.
#2 ... can unmount non-global zone mounts? NO (currently YES). We
       propose to remove it in this code review.
#3 ... can unmount non-global zone mounts? NO. Removed in 6675447.
#4 ... can unmount non-global zone mounts? NO. Never goes over the
       wire (OTW).
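
To make the consequence of the proposal easier to see, here is a small sketch that transcribes the four stages and their YES/NO answers into data and lists which stages can still reach non-global zone NFS mounts. It is only a model of this summary, not code from svc.startd or any SMF method; the names Stage, STAGES and stages_reaching_zone_mounts are invented for the illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    number: int
    what: str
    reaches_now: bool       # can unmount non-global zone NFS mounts today
    reaches_proposed: bool  # same question if this code review is accepted


# Transcribed from the summary above; the booleans are its YES/NO answers.
STAGES = [
    Stage(1, "system/zones stop method (zlogin -S $zone /sbin/init 0)", True, True),
    Stage(2, "nfs/client stop method in the global zone (umountall -F nfs)", True, False),
    Stage(3, 'svc.startd: system("/sbin/umountall -l")', False, False),
    Stage(4, "vfs_unmountall()", False, False),
]


def stages_reaching_zone_mounts(proposed: bool) -> List[int]:
    """Return the stage numbers that can unmount non-global zone NFS mounts."""
    return [s.number for s in STAGES
            if (s.reaches_proposed if proposed else s.reaches_now)]


if __name__ == "__main__":
    print("today:   ", stages_reaching_zone_mounts(proposed=False))
    print("proposed:", stages_reaching_zone_mounts(proposed=True))
```

With the proposed change only stage 1 remains, so a zone that fails to shut down would keep its NFS mounts; that is the gap requirement b) below is concerned with.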
===============
Let's make a decision.

REQUIREMENTS:

Rob:
a) the ability to do an 'all-zones unmount from the global zone'
b) the 'all-zones unmount from the global zone' should be part of the
   regular system shutdown
Mike:
c) wants the nfs/client stop method to work on the current zone
d) wants umountall(1M) to work on the current zone by default

The requirements have these priorities:
c) - highest
a)
b)
d) - lowest

The most difficult requirement to implement appears to be b), 'all-zones
unmount from the global zone should be part of the regular system
shutdown'. We can add code to /lib/svc/method/svc-zones that, at the very
end of the stop method, unmounts all the non-global zone mounts.
This would require new semantics for the -Z option; see option #5 below.

OPTIONS:

1) default behavior: don't limit action(s) to the current zone
   available options: none
2) default behavior: don't limit action(s) to the current zone
   available options: -z ... limit action(s) to the current zone
3) default behavior: limit action(s) to the current zone
   available options: none
4) default behavior: limit action(s) to the current zone
   available options: -Z ... apply action(s) to all zones
5) default behavior: limit action(s) to the current zone
   available options: -Z ... apply action(s) to all *non-global* zones
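
The difference between the five options is easiest to compare on a concrete mount list. The following is a minimal sketch of the selection rule each option implies, assuming the caller runs in the global zone and that "-Z" under option 5 skips the global zone's own mounts (my reading of "all *non-global* zones"). It is not umountall code: Mount, select_mounts and the sample paths are hypothetical, and "-z"/"-Z" are the flags proposed in this thread, not existing ones.

```python
from typing import List, NamedTuple, Optional

GLOBAL = "global"


class Mount(NamedTuple):
    zone: str        # zone that owns the mount
    mountpoint: str


# Flags each numbered option would accept (None means "no flag given").
ALLOWED_FLAGS = {
    1: {None},
    2: {None, "-z"},
    3: {None},
    4: {None, "-Z"},
    5: {None, "-Z"},
}


def select_mounts(mounts: List[Mount], current_zone: str,
                  option: int, flag: Optional[str] = None) -> List[Mount]:
    """Return the mounts umountall would act on under the given option."""
    if option not in ALLOWED_FLAGS:
        raise ValueError("unknown option %r" % (option,))
    if flag not in ALLOWED_FLAGS[option]:
        raise ValueError("option %d does not define flag %r" % (option, flag))

    if option in (1, 2):
        # Default is "all zones"; -z (option 2 only) narrows to the current zone.
        if flag == "-z":
            return [m for m in mounts if m.zone == current_zone]
        return list(mounts)

    # Options 3, 4, 5: default is "current zone only".
    if flag is None:
        return [m for m in mounts if m.zone == current_zone]
    if option == 4:
        return list(mounts)                          # -Z: every zone
    return [m for m in mounts if m.zone != GLOBAL]   # option 5, -Z: non-global only


if __name__ == "__main__":
    sample = [
        Mount(GLOBAL, "/net/home"),
        Mount("zone1", "/zones/zone1/root/mnt"),
        Mount("zone2", "/zones/zone2/root/mnt"),
    ]
    for opt, flg in [(4, "-Z"), (5, None), (5, "-Z")]:
        print(opt, flg, [m.mountpoint for m in select_mounts(sample, GLOBAL, opt, flg)])
```

Under option 5, a plain call from the global zone touches only global-zone mounts, while "-Z" touches only the non-global ones, which is what a final step in the svc-zones stop method would need.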

So I propose to implement 5). Rob, Mike, does that work for you?

Since we are introducing a new option, we must go through a PSARC case,
and I expect more opinions and discussion to come up there.

Thanks,
Pavel