On Tue, Aug 12, 2025 at 11:17:04AM +0100, Marc Zyngier wrote:
On Tue, 12 Aug 2025 11:09:12 +0100,
Coiby Xu <c...@redhat.com> wrote:

On Mon, Aug 11, 2025 at 03:52:04PM +0100, Marc Zyngier wrote:
> On Mon, 11 Aug 2025 14:03:21 +0100,
> Thomas Gleixner <t...@linutronix.de> wrote:
>>
>> On Mon, Aug 11 2025 at 15:02, Thomas Gleixner wrote:
>>
>> CC+ Marc
>>
>> > On Mon, Aug 11 2025 at 11:23, Coiby Xu wrote:
>> >> Recently I hit an issue where, on certain virtual machines, the kdump
>> >> kernel fails to get a DHCP IP address most of the time, starting from
>> >> 6.11-rc2. Git bisection shows commit b5712bf89b4b ("irqchip/gic-v3-its:
>> >> Provide MSI parent for PCI/MSI[-X]") is the 1st bad commit,
>> >>
>> >>      # good: [7d189c77106ed6df09829f7a419e35ada67b2bd0] PCI/MSI: Provide
>> >>      # MSI_FLAG_PCI_MSI_MASK_PARENT
>> >>      git bisect good 7d189c77106ed6df09829f7a419e35ada67b2bd0
>> >>      # good: [48f71d56e2b87839052d2a2ec32fc97a79c3e264] irqchip/gic-v3-its:
>> >>      # Provide MSI parent infrastructure
>> >>      git bisect good 48f71d56e2b87839052d2a2ec32fc97a79c3e264
>> >>      # good: [8c41ccec839c622b2d1be769a95405e4e9a4cb20] irqchip/irq-msi-lib:
>> >>      # Prepare for PCI MSI/MSIX
>> >>      git bisect good 8c41ccec839c622b2d1be769a95405e4e9a4cb20
>> >>      # first bad commit: [b5712bf89b4bbc5bcc9ebde8753ad222f1f68296]
>> >>      # irqchip/gic-v3-its: Provide MSI parent for PCI/MSI[-X]
>> >
>> > There were follow up fixes on this, so isolating this one is not really
>> > conclusive.
>> >
>> > Is the problem still there on v6.16 and v6.17-rc1?
>
> Yeah, there are way too many things that have been addressed since.
> kdump is also a particularly nasty case, as it tends to rely on the
> redistributor tables programmed by the previous kernel.

Thanks for providing a clue. This may also explain why I fail
to reproduce this issue against the 1st kernel even with the same cmdline as
the kdump kernel.

I'm not sure that's a clue. It's only an indication that things are
not necessarily easy to spot.

Has it ever been reproduced on bare metal? Have you tried v6.16 as
instructed?

Thanks for replying so quickly!

No, I haven't reproduced it on a bare metal machine and our QE engineers
haven't noticed this issue on any bare metal machine either.
I can confirm this issue still happens on 6.16.0-200.fc42.aarch64
and 6.17.0-0.rc1.17.fc43.aarch64 on the type of KVM VMs (QEMU PnP device
PNP0c02) where the issue was found.



>
> Also, this says "virtual machines". What's the hypervisor?

I'll contact the lab administrator. What kind of info should I collect
to help you narrow down the issue?

Surely you know what hypervisor you're running on, right?

Yes, the hypervisor is KVM. Sorry, I thought merely naming the
hypervisor wouldn't be sufficient, and I misread your request as asking
for more details on the host machine.



> How hard is it to reproduce?

It can be reproduced reliably on certain machines. But as of writing I
haven't reproduced it on other KVM virtual machines on three different
host machines.

Which machines? I'm sorry, but if you want help on this, you'll have
to provide actual information.

Sorry, I didn't mean to be vague. I thought your question was about how
reproducible this issue is and that there was no need to provide details
on the machines where I can't reproduce it. Since you explicitly
requested them, I'm glad to share the details.

I just grabbed three arbitrary bare metal machines with Fedora 42
installed and launched some KVM VMs to see if this issue can be
reproduced easily. Two of the host machines are as follows (sorry, I can't
find the info for the 3rd one):
- GIGABYTE PnP device PNP0c02, ARMv8 (M128-30)
- LTHPCSR112 (01234567890123456789AB), ARMv8 (Q80-30)

The virtual machine image is downloaded from
https://download.fedoraproject.org/pub/fedora/linux/releases/42/Cloud/aarch64/images/Fedora-Cloud-Base-Generic-42-1.1.aarch64.qcow2.
I tried different vCPU counts (2, 4), different RAM sizes (4G, 35G) and two
different UEFI firmware builds (the default one and one from the
edk2-experimental package) but haven't reproduced this issue so far.


Thanks,

        M.

--
Without deviation from the norm, progress is not possible.


--
Best regards,
Coiby
