[Kernel-packages] [Bug 2036239] Re: Intel E810-XXV - NETDEV WATCHDOG: (ice): transmit queue timed out

2024-02-01 Thread Christian Rohmann
Thx a lot Heitor! With no mention of some new package fixing this I did not correlate that to any patch to the kernel. Will this be fixed in the HWE kernel as well then? -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.

[Kernel-packages] [Bug 2036239] Re: Intel E810-XXV - NETDEV WATCHDOG: (ice): transmit queue timed out

2024-01-31 Thread Christian Rohmann
@Robert thanks for keeping this bug alive and updated! 1) More debug info required? @Robert, reading your post https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036239/comments/50 again, I am wondering if you asked me to provide more debug info with NVM 4.4 on my E810 NICs? Would this

[Kernel-packages] [Bug 2050032] Re: mpt3sas causes kernel stack trace

2024-01-25 Thread Christian Rohmann
We see the same issue with lots of references to arrays within mpt3sas_scsih.c: ``` UBSAN: array-index-out-of-bounds in /build/linux-hwe-6.5-q7NZ0T/linux-hwe-6.5-6.5.0/drivers/scsi/mpt3sas/mpt3sas_scsih.c:4667:12 UBSAN: array-index-out-of-bounds in

[Kernel-packages] [Bug 2051232] [NEW] kernel: BUG: Bad page state in process kworker

2024-01-25 Thread Christian Rohmann
Public bug reported: Similar to the bug https://bugs.launchpad.net/ubuntu/+source/linux-hwe-6.5/+bug/2051123 where traces were shown, we observed a "BUG" being reported on yet another machine of the same make / model (Asus RS720A-E11-RS24U using dual socket AMD EPYC Milan CPUs): ``` [...] Jan

[Kernel-packages] [Bug 2051123] Re: Kernel traces leading to crash - refcount_t: underflow; use-after-free and refcount_t: saturated; leaking memory -- lib/refcount.c

2024-01-25 Thread Christian Rohmann
** Attachment added: "lspci output of the machine type showing the traces" https://bugs.launchpad.net/ubuntu/+source/linux-hwe-6.5/+bug/2051123/+attachment/5742209/+files/lspci.txt

[Kernel-packages] [Bug 2051123] Re: Kernel traces leading to crash - refcount_t: underflow; use-after-free and refcount_t: saturated; leaking memory -- lib/refcount.c

2024-01-25 Thread Christian Rohmann
We just observed this issue on another machine of the same make and model. Kernel log of the boot up to the crash is attached. This machine had NO virtual machines running though. We saw side effects such as hanging processes but were able to log in and reboot the machine. ** Summary changed:

[Kernel-packages] [Bug 2036239] Re: Intel E810-XXV - NETDEV WATCHDOG: (ice): transmit queue timed out

2024-01-25 Thread Christian Rohmann
@Stefan Could you kindly elaborate on the "Fix Committed"? Was there any change to the kernel that would fix this issue? Is this fixed with 4.40 NVM from Intel? Reading Robert's post (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036239/comments/50) again, it seems that he is only guessing

[Kernel-packages] [Bug 2051123] [NEW] Kernel traces and crash on KVM hypervisor - refcount_t: underflow; use-after-free and refcount_t: saturated; leaking memory -- lib/refcount.c

2024-01-24 Thread Christian Rohmann
Public bug reported: A few hours after upgrading a machine serving as VM hypervisor running OpenStack Nova + libvirt from linux kernel 6.2.0-37-generic to 6.5.0-14-generic we observed kernel traces and quick disintegration of the system and its various processes. While the TCP connection itself

[Kernel-packages] [Bug 2051114] [NEW] Kernel trace in arch/x86/kvm/mmu/mmu.c:6362 during KVM live migration

2024-01-24 Thread Christian Rohmann
Public bug reported: We observed a kernel trace on a KVM hypervisor server while live migrating an instance: ``` [...] Jan 23 10:58:53 fra-az1-comp-22 kernel: [ cut here ] Jan 23 10:58:53 fra-az1-comp-22 kernel: WARNING: CPU: 75 PID: 1082578 at

[Kernel-packages] [Bug 2036239] Re: Intel E810-XXV - NETDEV WATCHDOG: (ice): transmit queue timed out

2024-01-04 Thread Christian Rohmann
@Robert, first thanks a lot for pursuing this issue! 1) I certainly can provide the debugging info. May I ask if ... a) the system in question would need to have an active LAG (LACP) for this to be helpful? We did switch to active-backup on all our machines due to this very issue. b) this

[Kernel-packages] [Bug 2036239] Re: Intel E810-XXV - NETDEV WATCHDOG: (ice): transmit queue timed out

2023-12-07 Thread Christian Rohmann
FWIW, we updated our NICs to 4.30 as they were individually purchased and not part of pre-built servers and also have this issue. So in essence the issue also exists with the latest firmware.

[Kernel-packages] [Bug 2036239] Re: Intel E810-XXV - NETDEV WATCHDOG: (ice): transmit queue timed out

2023-11-29 Thread Christian Rohmann
1) Andre, after I switched to active-backup the issue is gone (so far). But yeah, we are looking for a reproducer as well. It's hard to narrow down some random issue - also likely for Intel. 2) But I just received an email from an Intel developer with a suggested change to the driver to narrow

[Kernel-packages] [Bug 2036239] Re: Intel E810-XXV - NETDEV WATCHDOG: (ice): transmit queue timed out

2023-11-21 Thread Christian Rohmann
I ran into this issue on 22.04 LTS (using HWE kernel 6.2) on a 100G dual-port E810 NIC. Also with LACP only, active-backup works without issues. To bring this more to the attention of the driver devs, I posted to the intel-wired-lan ML: https://lists.osuosl.org/pipermail/intel-wired-