On Tue, 24 May 2016, Morris, Kevin J. (RET-DAY) wrote:
> We have ~20 zLinux systems running in their own LPARs (no zVM) all on RHEL
> 6.4 without any issues over the past year. All systems were rebooted weekly,
> again without any issues.
Is the reboot being triggered internally (a cron job, or an
external process sshing in and sending the reboot sequence),
or via an external signal? I am wondering whether a too-fast
shutdown, before HyperPAV state has synced, is in play.
> We are in the process of migrating them to ClefOS 6.7 and have started to
> experience random errors with the file system check at reboot essentially
> causing the box to hang until someone manually intervenes. The following
> messages are displayed on the console:
> *** An error occurred during the file system check.
This usually comes from finding a /.autofsck file left over
(not removed) from the prior session. Trace the shutdown and
startup code in /etc/init.d/ via:
cd /etc/init.d
grep fsck *
to see the scripts in play under the ClefOS 6 series. When that
file doesn't get removed, or the filesystem isn't synced so that
the update reaches disk, spurious fscks get requested. It is my
understanding that, 'under the hood', the HyperPAV facility seeks
to anticipate such operations, and there may simply not be enough
settling time.
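As a quick way to check for such remnants before a reboot, here is a
minimal sketch. The function name list_fsck_markers is hypothetical, and
including /forcefsck alongside /.autofsck is my assumption about which
marker files the RHEL 6 style init scripts look at; the grep above will
show what your scripts actually test for.

```shell
#!/bin/sh
# Hypothetical helper: print any fsck marker files found under a root
# directory (default "/"). The list of names is an assumption; adjust it
# to whatever 'grep fsck *' in /etc/init.d shows on your system.
list_fsck_markers() {
    root="${1:-/}"
    for name in .autofsck forcefsck; do
        # ${root%/} strips one trailing slash, so "/" yields "/.autofsck"
        if [ -e "${root%/}/$name" ]; then
            echo "${root%/}/$name"
        fi
    done
}
```

Running "list_fsck_markers /" just before the reboot is issued, and again
from a rescue environment against the mounted root, would show whether the
marker survived the shutdown.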
If it is a non-root (i.e., not "/") partition which is being
fsck'ed, it may be worth adding a K99 step which blocks on a
umount of that partition. It would certainly be harmless to add
in any case.
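A rough sketch of what such a blocking step might look like follows. The
function names, the retry count, and the one-second pause are all
illustrative assumptions rather than anything taken from the ClefOS init
scripts, and MOUNTS_FILE exists only so the check can be pointed at a
test file.

```shell
#!/bin/sh
# Where to read the mount table from; overridable for testing.
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"

# Succeeds if the given mount point appears in the mount table.
is_mounted() {
    awk -v m="$1" '$2 == m { found = 1 } END { exit !found }' "$MOUNTS_FILE"
}

# Hypothetical K99-style step: sync, then retry umount until the mount
# point is gone or the retries are used up. Returns 0 once unmounted.
block_on_umount() {
    mp="$1"
    tries="${2:-10}"
    while is_mounted "$mp" && [ "$tries" -gt 0 ]; do
        sync
        umount "$mp" 2>/dev/null || sleep 1
        tries=$((tries - 1))
    done
    ! is_mounted "$mp"
}
```

Called as, say, "block_on_umount /data 30" from a late kill script, this
would hold the shutdown until the partition is really unmounted, or give
up after roughly 30 seconds; /data here is only a placeholder.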
https://bugzilla.redhat.com/show_bug.cgi?id=738870 suggests
that detection of un-synced states has been a topic before.
> If I remove the PAV aliases from /etc/dasd.conf, the reboot problems appear
> to go away -- I can reboot 50 times without an issue.
That points to HyperPAV quite strongly.
> Obviously, we want to use HyperPAV, so simply removing the aliases isn't
> really a "fix".
> Can anyone offer any advice on how solve this problem or further debug the
> issue.
> Something apparently changed between RHEL 6.4 and ClefOS (RHEL) 6.7.
That is quite a big step in terms of time between 6.4 and 6.7, but
the suggestions above, along with perhaps adding remote
syslogging to capture early 'dmesg' content and possibly late
messages, come to mind as places to start.
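For the remote syslogging part, a minimal sketch, assuming the stock
rsyslog of the 6 series with its legacy "@@host:port" TCP forwarding
syntax. The function name and the loghost name in the usage note are
placeholders, not real values.

```shell
#!/bin/sh
# Hypothetical helper: append a "forward everything over TCP" rule to an
# rsyslog config file, unless the identical line is already present.
add_remote_syslog() {
    conf="$1"
    host="$2"
    rule="*.* @@${host}:514"
    # -x: match the whole line, -F: treat the rule as a fixed string
    grep -qxF -- "$rule" "$conf" 2>/dev/null || printf '%s\n' "$rule" >> "$conf"
}
```

Something like "add_remote_syslog /etc/rsyslog.conf loghost.example.com"
followed by "service rsyslog restart" would turn it on. Messages emitted
after rsyslog stops during shutdown will still not reach the remote host,
so the console remains the place to look for the very last ones.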
Thanks for trying ClefOS -- I appreciate the report
-- Russ herrold