Package: glusterfs-server
Version: 11.1-6
Severity: important
X-Debbugs-Cc: [email protected]

Dear Maintainer,

The Debian package of glusterfs-server carries a patch that changes the systemd service KillMode from "process" to "control-group" (see https://sources.debian.org/patches/glusterfs/11.2-3/02-systemd-service.diff/).

It looks like this was done because killing glusterd does not kill its child glusterfsd processes. However, this is a deliberate design decision in glusterfs: the glusterfsd processes should be left running when glusterd is restarted, and the daemon automatically reconnects to them by looking for their pidfiles. Killing the entire control group, on the other hand, can cause catastrophic timeouts for clients!

When the glusterfsd processes are sent SIGTERM, they stop serving I/O and then try to "sign out" by contacting glusterd. But with KillMode=control-group, glusterd will be dead, so the glusterfsd processes hang. Crucially, they hang without terminating the socket connection with a FIN/RST. The glusterfs client is designed to handle socket terminations, but it depends on that FIN/RST (on receiving it, the client would send the I/O request to a different brick), so clients hang as well.

The client has a default network timeout of 42s, which is longer than, for example, the 30s virtual disk timeout of a Debian guest OS on QEMU. So if glusterd gets restarted, the client waits 42s before switching bricks, but by then the Debian guest has already hit a disk timeout error and, if the disk holds the root filesystem, remounts it read-only. This causes all kinds of problems.

Fedora handles this in a better way by leaving glusterd with KillMode=process, and shipping an additional oneshot service to deal with glusterfsd processes. See https://src.fedoraproject.org/rpms/glusterfs/blob/rawhide/f/glusterfsd.service

I'd suggest doing the same for the upcoming 11.2 release, as the problem still exists in the 11.2-3 package.
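
Until the packaging changes, a local drop-in along the following lines might restore the upstream behaviour. This is an untested sketch: it assumes the Debian unit is named glusterd.service, and the drop-in path and file name are only illustrative.

```ini
# Hypothetical path: /etc/systemd/system/glusterd.service.d/override.conf
# Assumes the packaged unit is glusterd.service.
[Service]
# Signal only the main glusterd process on stop/restart,
# leaving the glusterfsd brick processes running.
KillMode=process
```

After creating the file, "systemctl daemon-reload" is needed for the override to take effect. Note that this alone does not stop the glusterfsd processes at shutdown, which is the job the additional oneshot service does in the Fedora package.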

Thanks,

Aram

-- System Information:
Debian Release: 13.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.17.13-2-pve (SMP w/6 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages glusterfs-server depends on:
ii  glusterfs-cli     11.1-6
ii  glusterfs-client  11.1-6
ii  glusterfs-common  11.1-6

Versions of packages glusterfs-server recommends:
ii  nfs-common  1:2.8.3-1

glusterfs-server suggests no packages.
