Package: glusterfs-server
Version: 11.1-6
Severity: important
X-Debbugs-Cc: [email protected]
Dear Maintainer,

The Debian package of glusterfs-server carries a patch that changes the
systemd service KillMode from "process" to "control-group" (see
https://sources.debian.org/patches/glusterfs/11.2-3/02-systemd-service.diff/).

It looks like this was done because killing glusterd does not kill its
child glusterfsd processes. That is a deliberate design decision in
glusterfs, however: glusterfsd processes should be left running when
glusterd is restarted, and the daemon will automatically reconnect to
them by looking for pidfiles. In fact, killing the entire control group
can cause catastrophic timeouts for clients!

When the glusterfsd processes are sent SIGTERM, they stop serving I/O
and then try to "sign out" by contacting glusterd. But with
KillMode=control-group, glusterd will be dead, so the glusterfsd
processes hang. Crucially, they hang without terminating the socket
connection with a FIN/RST. The glusterfs client is designed to handle
socket terminations, but it depends on the FIN/RST (at which point it
would send the I/O request to a different brick), so clients hang as
well.

The client has a default network timeout of 42s (network.ping-timeout),
which is longer than the 30s virtual disk timeout of, for example, a
Debian guest OS on QEMU. So if glusterd gets restarted, the client waits
42s before switching bricks, but by then the Debian guest has already
hit a disk timeout error and, if the disk is the root filesystem,
remounted it read-only. This causes all kinds of problems.

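As a local workaround until the packaged unit changes, a drop-in along
the following lines should restore KillMode=process. This is only a
sketch: it assumes the unit is named glusterd.service, and the path
shown is the one "systemctl edit glusterd.service" would normally
create.

```ini
# /etc/systemd/system/glusterd.service.d/override.conf  (assumed path)
[Service]
# Signal only the main glusterd process on stop/restart and leave the
# glusterfsd brick processes in the control group running.
KillMode=process
```

If the file is created by hand rather than through "systemctl edit", a
"systemctl daemon-reload" is needed before it takes effect.
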
Fedora handles this better by leaving glusterd with KillMode=process
and shipping an additional oneshot service to deal with the glusterfsd
processes. See
https://src.fedoraproject.org/rpms/glusterfs/blob/rawhide/f/glusterfsd.service

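A unit of that kind could look roughly like the sketch below. It is
written from the description above rather than copied from the Fedora
file, so the description text, the use of killall (from psmisc) and the
ordering are assumptions, not Fedora's exact unit.

```ini
# glusterfsd.service (hypothetical sketch, not the Fedora original)
[Unit]
Description=GlusterFS brick processes (stopping only)
# Ordered after glusterd, so at shutdown this unit is stopped first,
# while glusterd is still running and can answer the bricks' sign-out.
After=glusterd.service

[Service]
Type=oneshot
# glusterd starts the bricks itself; this unit only needs to count as
# active so that its ExecStop runs when the unit is stopped.
ExecStart=/bin/true
RemainAfterExit=yes
# Send SIGTERM to all glusterfsd processes and wait for them to exit;
# do not fail if none are running.
ExecStop=/bin/sh -c "killall --wait glusterfsd || true"

[Install]
WantedBy=multi-user.target
```

With something like this, restarting glusterd.service alone leaves the
bricks untouched, and the bricks are only stopped when this second unit
is stopped, for example at system shutdown.
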
I'd suggest doing the same for the upcoming 11.2 release, as the problem
still exists in the 11.2-3 package.

Thanks,
Aram

-- System Information:
Debian Release: 13.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.17.13-2-pve (SMP w/6 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages glusterfs-server depends on:
ii  glusterfs-cli     11.1-6
ii  glusterfs-client  11.1-6
ii  glusterfs-common  11.1-6

Versions of packages glusterfs-server recommends:
ii  nfs-common  1:2.8.3-1

glusterfs-server suggests no packages.