Hi all,

Over a month has gone by since I reported this boot failure (see below) to
SUSE and so far, nothing.  They did say they found a problem with the dracut
command but didn't say what.  So I decided to look closer, and I think I
have found something.  I suspect this is not limited to upgrades from SLES12
to 15 and that it affects all systems.

It seems that there may be a problem with, at the least, the new s390-tools
chzdev command.  Consider the following situation.  You have a Linux guest
with the root file system on device 150, /usr on 151, and /var on 152.  Then
you make a copy of these disks (DDR, FlashCopy, etc.), say because you are
cloning the guest.  The copied disks are now called F50, F51 and F52
respectively.  Now suppose you boot from your normal 15x disks while the F5x
disks are linked and defined to the virtual machine.  Then you enable any of
the cloned disks, say F50 (the clone of root), either with the chzdev command
or via the Yast DASD tool.  So far so good.  But if you then try to disable
the same disk (F50) using chzdev (or Yast DASD), chzdev complains with:

Warning: ECKD DASD 0.0.0f50 is in use!
         The following resources may be affected:
          - Mount point /
Continue with operation? (yes/no)

Well, it is neither in use (not mounted anywhere, just enabled), nor is it
the root mount point.  The same is true for F51 and F52: chzdev states that
they are in use and affect mount points /usr and /var.  This does not happen
when the disk is not a clone.

Apparently, chzdev detects the partition UUID for each disk.  Since the F5x
disks are clones of the 15x disks in active use, their UUIDs match those of
the active disks, and apparently this confuses things.  If you simply change
the UUID of the cloned disk, this does not happen.
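
One way to spot such a clash before touching chzdev or the boot loader is to
look for UUIDs that show up on more than one block device.  The following is
only a rough sketch: it assumes the key="value" output format of
`lsblk -P -o NAME,UUID,MOUNTPOINT`, and the function names and the sample
device names and UUIDs are made up for illustration.

```python
import shlex
from collections import defaultdict


def parse_lsblk_pairs(text):
    """Parse `lsblk -P` style lines (KEY="value" pairs) into a list of dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        row = {}
        # shlex handles the quoting, so NAME="dasda1" becomes NAME=dasda1
        for token in shlex.split(line):
            key, _, value = token.partition("=")
            row[key] = value
        rows.append(row)
    return rows


def duplicate_uuids(rows):
    """Return {uuid: [device names]} for every UUID seen on more than one device."""
    by_uuid = defaultdict(list)
    for row in rows:
        uuid = row.get("UUID", "")
        if uuid:  # skip devices without a file system UUID
            by_uuid[uuid].append(row["NAME"])
    return {uuid: names for uuid, names in by_uuid.items() if len(names) > 1}


# Hypothetical sample: dasdd1 stands in for the F50 clone of the root disk.
SAMPLE = '''
NAME="dasda1" UUID="aaaa-1111" MOUNTPOINT="/"
NAME="dasdb1" UUID="bbbb-2222" MOUNTPOINT="/usr"
NAME="dasdd1" UUID="aaaa-1111" MOUNTPOINT=""
'''

if __name__ == "__main__":
    for uuid, names in duplicate_uuids(parse_lsblk_pairs(SAMPLE)).items():
        print("UUID", uuid, "is shared by", ", ".join(names))
```

On a live system the text would come from the lsblk command itself rather
than from the sample; any UUID reported here is a candidate for the
confusion described above.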

Either the above problem or something related to it causes the grub2 tools
and/or dracut to get confused.  So, if you happen to activate/enable a DASD
that has the same UUID as one of your root file system disks, and then
happen to run dracut/grub2-mkconfig/grub2-install on your live system, your
system will fail at the next reboot because it cannot find one of the
devices needed for boot (150, 151), even though they are there.  The F5x
device with the identical UUID causes this failure.  It also doesn't matter
whether the F5x disk is present at the next boot or not.

I am hoping someone can shed some light on this.  I am not getting anywhere
with SUSE, and I feel that it is not unreasonable for the z/VM community to
have cloned disks with identical UUIDs on different servers.  Granted, what
I described above is not a very likely scenario, but it is still possible
and it did happen to me.  I also don't ever change the UUID of disks that I
copy/clone.  Perhaps I should?

Also, hopefully someone from IBM is listening here and can say whether the
chzdev behavior is valid or not.  Note that you can take any disk, even an
empty one, and as long as you change its UUID to match the UUID of one of
your active system disks, you will have the same problems, both with chzdev
and with booting.

Thanks,
Aria






-----Original Message-----
From: Linux on 390 Port <[email protected]> On Behalf Of Marcy Cortes
Sent: Thursday, August 19, 2021 6:35 PM
To: [email protected]
Subject: Re: Warning if upgrading SLES12 to SLES15 SP3

Thanks for the heads up Aria!
We have a lot of them that upgraded from 12 to SP1, then SP2, and now SP3 is
being tested.  Many have the 41's and 51's.
We'll look for this and report it if it happens here too.



-----Original Message-----
From: Linux on 390 Port <[email protected]> On Behalf Of Aria Bamdad
Sent: Thursday, August 19, 2021 3:18 PM
To: [email protected]
Subject: [LINUX-390] Warning if upgrading SLES12 to SLES15 SP3

Hi,

Yesterday I found a problem which could result in a server failing to boot
once it is upgraded from SLES 12 to 15 and then to the recently released
SP3.  I thought I should warn those here just in case.


My environment runs under z/VM.  The servers use defined minidisks for
/, /usr and /var, plus two swap partitions defined as vdisks and formatted
using swapgen prior to boot.



What I found is that if you have a SLES 12 server and upgrade it to SLES 15
GA, SP1 or SP2, all works fine.  However, if you then upgrade the same
server to SP3, or if you upgrade directly from SLES12SP5 to SLES15SP3, you
will encounter this problem.  You will not see this problem on a fresh
install of SLES 15 that is later upgraded to SP3.



The problem only surfaces when you go to SP3.  The upgrade to SP3 appears to
cause a change in the udev definitions for the DASD defined on the server
(in /etc/udev/rules.d).  For SLES 12 systems, these rules are
51-dasd-0.0.0xxx.rules files, but it seems that for SLES15 they change
format and number to 41-dasd-eckd-0.0.0xxx.rules.  However, on a SLES12
system upgraded to SLES15, this change does not happen until you upgrade to
SLES15SP3.  Once you upgrade to SP3, the old 51-dasd rules are renamed with
the '.legacy' extension and new 41- rules are created.  The rules for the
swap vdisks are left alone, and thus after the upgrade the swap partitions
are no longer activated.  There is more to this but I will not go into
detail.
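
To see which devices were left behind by the rename, the file names in
/etc/udev/rules.d can be compared.  This is only a sketch under stated
assumptions: that '.legacy' is appended after '.rules', that the new files
follow the 41-dasd-<type>-<busid>.rules pattern seen for eckd, and that the
function name is made up for illustration.

```python
import re

# Old style rule, either untouched or already renamed with '.legacy' (assumed suffix position).
LEGACY_RULE = re.compile(
    r"^51-dasd-(?P<bus>[0-9a-f]\.[0-9a-f]\.[0-9a-f]{4})\.rules(\.legacy)?$"
)
# New style rule; the type segment (e.g. eckd) is assumed to be a lowercase word.
NEW_RULE = re.compile(
    r"^41-dasd-[a-z]+-(?P<bus>[0-9a-f]\.[0-9a-f]\.[0-9a-f]{4})\.rules$"
)


def devices_without_new_rule(filenames):
    """Return sorted bus IDs that have an old 51- rule but no new 41- rule."""
    legacy, new = set(), set()
    for name in filenames:
        match = LEGACY_RULE.match(name)
        if match:
            legacy.add(match.group("bus"))
            continue
        match = NEW_RULE.match(name)
        if match:
            new.add(match.group("bus"))
    return sorted(legacy - new)
```

Calling devices_without_new_rule(os.listdir("/etc/udev/rules.d")) after
importing os would list the bus IDs, such as those of the swap vdisks, that
still only have an old style rule.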


However, that's not the problem.  At this point, if you attempt to make any
change to the bootloader or run mkinitrd, for instance if you use Yast to
update DASD and 'Activate' the swap disks (or any disk, for that matter),
which causes mkinitrd as well as grub2-install to run, this will write a
faulty boot loader and the system will no longer boot the next time it is
rebooted.  You can use a rescue system to fix the broken boot, but if you
run the above process again, the same will happen.  This is risky because
you may not reboot for 6 months and only then find out that this is the
case.



I have reported this to SUSE but have had no response as of now.



Thanks,

Aria




----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390 or
visit
http://www2.marist.edu/htbin/wlvindex?LINUX-390
