On 9/20/21 10:12 PM, Chuck Zmudzinski wrote:
On 9/20/21 6:29 PM, Chuck Zmudzinski wrote:
On 9/20/21 1:43 PM, Chuck Zmudzinski wrote:

On 9/20/21 12:27 AM, Elliott Mitchell wrote:
On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:

I suspect the following patch is the culprit for problems
shutting down on the amd64 architecture:

0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
This patch does affect amd64 acpi code, and is probably causing
the problem on my amd64 system, so my build of the xen-4.14
hypervisor without this patch fixed the problem.
Of the ones listed that is the only one which has any overlap with x86
code.  The next reproduction step is `apt-get source xen &&
patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
&& dpkg-buildpackage -b`.  Then try the resulting build to confirm that
this patch is what does it.

The thing is, that delta is rather small.  I don't have a simulator to
check with, but it seems too small to be the culprit.

I just tested the build with
patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
applied before building the package, and I can confirm that this is the
patch causing the trouble for dom0 poweroff on x86/amd64. Reverting this
patch fixes it on my amd64 system. But this would probably break the arm
build.

I think one possible fix would require modifying
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
so it only applies at runtime to the arm architecture. I will try some
modifications to the patch instead of removing it, and if I get something
that works on amd64 and also might work on arm, I will post it
for Elliott to try.

I have an encouraging result. I found a very simple patch
to xen/arch/x86/acpi/lib.c that fixes the dom0 poweroff
bug on my system and it should not affect the arm patches
at all:
--------------------------------------------------------------
This patch partially reverts previous patch
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch

This hopefully fixes #991967

--- a/xen/arch/x86/acpi/lib.c    2021-09-20 16:49:08.000000000 -0400
+++ b/xen/arch/x86/acpi/lib.c    2021-09-20 16:25:05.572038000 -0400
@@ -46,10 +46,6 @@
     if ((phys + size) <= (1 * 1024 * 1024))
         return __va(phys);

-    /* No further arch specific implementation after early boot */
-    if (system_state >= SYS_STATE_boot)
-        return NULL;
-
     offset = phys & (PAGE_SIZE - 1);
     mapped_size = PAGE_SIZE - offset;
     set_fixmap(FIX_ACPI_END, phys);
----------------------------------------------------------------------
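
To make the effect of the removed lines concrete, here is a minimal
stand-alone model in C of the branch order visible in the hunk above. It is
only a sketch and not Xen code: the names model_state, map_path,
model_map_path and model_self_check are hypothetical, and the enum is an
assumed stand-in for system_state where only the ordering (early boot
before SYS_STATE_boot) matters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Xen's system_state values; only the ordering
 * (early boot comes before boot) matters for this model. */
enum model_state { MODEL_EARLY_BOOT, MODEL_BOOT, MODEL_ACTIVE };

/* Which branch of the hunk the modelled function would take. */
enum map_path {
    PATH_LOW_1MB,   /* corresponds to: return __va(phys); */
    PATH_NULL,      /* corresponds to: return NULL; (the removed guard) */
    PATH_FIXMAP     /* corresponds to: falling through to set_fixmap() */
};

/* Mirrors the order of the checks shown in the diff. guard_present selects
 * between the code before the patch (true) and after it (false). */
static enum map_path model_map_path(uint64_t phys, uint64_t size,
                                    enum model_state state, bool guard_present)
{
    if ((phys + size) <= (1 * 1024 * 1024))
        return PATH_LOW_1MB;
    if (guard_present && state >= MODEL_BOOT)
        return PATH_NULL;
    return PATH_FIXMAP;
}

/* Walks through the four cases of interest. */
void model_self_check(void)
{
    const uint64_t above_1mb = 2 * 1024 * 1024;

    /* Below 1 MiB the guard is never reached, patched or not. */
    assert(model_map_path(0x1000, 0x100, MODEL_ACTIVE, true) == PATH_LOW_1MB);
    /* During early boot the guard does not trigger. */
    assert(model_map_path(above_1mb, 0x100, MODEL_EARLY_BOOT, true) == PATH_FIXMAP);
    /* After early boot, the unpatched code refuses the mapping. */
    assert(model_map_path(above_1mb, 0x100, MODEL_ACTIVE, true) == PATH_NULL);
    /* With the guard removed, the same request reaches the fixmap path. */
    assert(model_map_path(above_1mb, 0x100, MODEL_ACTIVE, false) == PATH_FIXMAP);
}
```

The model only shows which branch is reached for an address above 1 MiB
once early boot is over. It says nothing about what the caller does with a
NULL return or why that matters for poweroff.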



Further testing with this patch revealed a problem. Although
this simple patch lets dom0 power off when shutting
down, on the next reboot the system dropped to a single-user
shell because it mixed up my SSD and my hard disk. Normally
the system assigns my SSD as /dev/sda and my hard disk
as /dev/sdb. But on the first reboot after running the Xen
hypervisor, the system reversed them so my SSD was /dev/sdb
and my hard disk was /dev/sda. The EFI partition, which
is a vfat partition, is on the SSD, and /etc/fstab mounts
it from /dev/sda1. After the swap it is at /dev/sdb1, and
the first partition on the hard disk is not a vfat partition, so
the system drops to a root shell for system maintenance.

This switching of the devices on the subsequent reboot is
another symptom of this bug I have seen in the past, and
usually the ordinary behavior is restored on the next reboot
or after resetting and powering off or unplugging from power.
So this patch does not really fix the bug reliably.

To clarify things, I saw this strange behavior of the system
switching the disk devices with this patch under the following
conditions:

1) Boot using this simple patch - dom0 shuts down properly

2) Boot using Elliott's suggested patch in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=991967#94

3) It was when booting using Elliott's suggested patch that
I saw the drop to single-user root for system maintenance.
Moreover, Elliott's suggested patch did not fix the dom0
power off bug.

So it might be the case that this simple patch would work
for both amd64 and arm devices nicely, but Elliott refuses
to test it with his arm devices. Sigh.
