On 9/19/2021 10:56 AM, Elliott Mitchell wrote:
On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:
On Sat, 11 Sep 2021 13:29:12 +0200 Salvatore Bonaccorso
<car...@debian.org> wrote:
  >
  > On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote:
  > > An experiment led to a potential alternative explanation for #991967.
  > > The issue may be ACPI (non-UEFI) powerdown/reset was broken at
  > > 4.19.194-3. Presence of Xen on the system may be unrelated.
  > >
  > > Failing that, it could be that Xen on non-UEFI systems is affected. (Xen
  > > was tried on a UEFI system and the issue wasn't observed.)
  >
  > Following up on https://bugs.debian.org/991967#12
  >
  > Did you succeed in bisecting the issue, as you seem to be able to
  > reproduce it?

I have noticed this bug on bullseye ever since I started
running bullseye as a dom0, but my testing indicates
there is no problem with src:linux; the problem
appeared in src:xen with the 4.14 version of Xen on
bullseye.

Elliott, are you only seeing the problem on Debian's
xen-4.14 hypervisor? Also, which architecture, arm or
amd64? I only see the problem on the Debian xen-4.14
hypervisor, and I have only tested on amd64. I
have found a fix for my amd64 system, which is as
follows:

Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015,
with a Haswell CPU (core i5-4590S)

xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64

linux kernel version: 5.10.46-4 (the current amd64 kernel
for bullseye)

Nope.  As per the report, the problem appeared with kernel 4.19.194-3,
and at the time the system was using Xen 4.11.

The kernel you're listing is rather more recent, which might suggest a
patch that had been backported from 5.x to 4.19.

I could believe a Xen security update was the trigger, though (I don't
recall there being one at the right time, but I wouldn't rule it out).


Boot system: EFI, without Secure Boot, booting the Xen
hypervisor and the bullseye dom0 with the grub-efi package for
bullseye; it boots the xen-4.14-amd64.gz file, not
the xen-4.14-amd64.efi file.

I also tested a buster dom0 with the 4.19 series kernel
on the xen-4.14 hypervisor from bullseye and saw the
problem, but I did not see the problem with either
a buster (linux 4.19) or bullseye (linux 5.10) dom0 on
the xen-4.11 hypervisor, so I think the problem is
with the Debian version of the xen-4.14 hypervisor,
not with src:linux.

Just to make sure, the kernel you were testing was 4.19.194-3?  The
issue didn't manifest with kernels earlier than that.

I will check again with a buster dom0 when I get a chance,
probably late tonight or tomorrow. I think it was 4.19.194-3,
if that is the latest buster kernel, because I don't think there
has been an update to the buster kernel since I tested it.

Could be we're seeing distinct bugs.

I would agree if the problem shows up on my system
with a 4.19.194-3 dom0 kernel on xen-4.11; if not,
then it is probably the same bug, one that is in src:xen,
not src:linux.


This patch does affect amd64 ACPI code and is probably causing
the problem on my amd64 system: my build of the xen-4.14
hypervisor without this patch fixed the problem.

While that commit modifies the code path the processor takes, the
modified path appears functionally identical to the original.


I would also ask the Debian Xen Team why they
are backporting patches from the upstream Xen unstable
branch into Debian's 4.14 package, which is currently shipping
in Debian stable (bullseye). IMHO, the aforementioned
patches that are not in the upstream stable 4.14 branch
should not be included in the Xen package for Debian stable.

Some people are asking for those.  They are bug fixes for an extremely
popular device that panics on boot without the patches.

The raspberry pi, I presume.


Meanwhile, it turned out that between 5.10.0 and 5.10.30 the ARM64
device-trees were modified in a way that broke Xen 4.14 on ARM64.  The
change violated Linux's own standards for device-trees, yet it still
appeared in a stable branch.

In other news, device-trees and ACPI tables are hardly comparable.
99% of ACPI tables work for all versions of all
OSes.  Any given device-tree is only likely to work for a single version
of a single OS.  While a useful abstraction for portions of kernel code,
device-trees are utter garbage compared to ACPI tables.



Well, now we are at Debian stable with 5.10.x for Linux and 4.14.x for Xen,
so we are kind of stuck with these versions on Debian stable. I am all
for tweaking the Debian stable packages to support the Raspberry Pi and
amd64. The question is: what is the quickest and least disruptive way to
fix it now?

All the best,

Chuck
