Your message dated Sat, 10 Aug 2024 14:20:05 +0200 (CEST)
with message-id <[email protected]>
and subject line Closing this bug (BTS maintenance for src:linux bugs)
has caused the Debian Bug report #1037223,
regarding Possible bug causing I/O hangs
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
1037223: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1037223
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: linux-image-amd64
Version: 5.10.178-3


Hi all,

I do not usually report kernel bugs so hopefully this is the right place!

We recently updated the kernel on our Debian 11 servers, and since then a number
of servers (both VMs and bare metal) have suffered from I/O hangs.
We can access the affected server only through a console from which I cannot
copy text, so I have attached a screenshot of the message we see in dmesg.

We initially thought this was related to the ext4 fast_commit feature, which we
have enabled. The issue does seem to occur less often with fast_commit
disabled, but disabling it does not appear to solve the problem completely.

Searching for this error, I first found
https://github.com/flatcar/Flatcar/issues/847, which led me to this thread:
https://www.spinics.net/lists/linux-ext4/msg86261.html
It mentions this fix: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/fs/ext4?h=linux-5.15.y&id=5bc0b2fda4b47c86278f7c6d30c211f425bf51cf
I believe this fix is currently not present in the 5.10 kernel available for 
Debian 11.

However, the linked fix also mentions:
> This bug has been around for many years, but it became *much* easier
> to hit after commit 65f8b80053a1 ("ext4: fix race when reusing xattr
> blocks").

The changelog
https://metadata.ftp-master.debian.org/changelogs//main/l/linux-signed-amd64/linux-signed-amd64_5.10.178+3_changelog
shows that "ext4: fix race when reusing xattr blocks" was added in 5.10.178-1.
This is why we believe we are now hitting this bug.

Does this seem plausible? If so, could the fix I linked also be backported to
the Debian 11 kernel?

We could upgrade to the bullseye-backports kernel instead. However, this issue
makes the affected system essentially unusable and we hit it every few days on
one of our servers, so it may be more widespread and worth fixing in the
regular bullseye kernel as well.

Thank you!
Best regards,


Niels Hendriks

--- End Message ---
--- Begin Message ---
Hi

This bug was filed for a very old kernel or the bug is old itself
without resolution.

If you can reproduce it with

- the current version in unstable/testing
- the latest kernel from backports

please reopen the bug, see https://www.debian.org/Bugs/server-control
for details.

Regards,
Salvatore

--- End Message ---
