Package: mdadm
Version: 4.4-11
We noticed a memory leak in mdmonitor.service on all of our newly installed
trixie machines.
The worst-affected machine accumulated about 20GB of leaked memory in 13 days of uptime.
After some debugging we found the issue.
At mdadm-4.4/udev.c:153 (fetched using apt source):
...
if (udev_monitor_receive_device(udev_monitor))
return UDEV_STATUS_SUCCESS; /* event detected */
...
According to libudev docs:
On success, udev_monitor_receive_device() returns a pointer to a newly
referenced device that was received via the monitor. The caller is responsible
to drop this reference when done.
As you can see, the reference to the device never gets dropped.
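To make the ownership rule concrete, here is a toy model of the two call patterns. It does not use libudev at all: every toy_* name, the fixed-size pool and the event_seen_* helpers are made up for illustration, and the real library allocates devices dynamically rather than from a pool. It only shows why testing the returned pointer for non-NULL and then discarding it leaves one reference outstanding per event.

```c
#include <stddef.h>

#define POOL_SIZE 8 /* arbitrary size for the illustration */

/* Toy stand-in for a reference-counted device; NOT struct udev_device. */
struct toy_device {
	int refcount;
};

static struct toy_device pool[POOL_SIZE];

/* Number of devices that still hold at least one reference. */
static int toy_live_devices(void)
{
	int live = 0;

	for (size_t i = 0; i < POOL_SIZE; i++)
		if (pool[i].refcount > 0)
			live++;
	return live;
}

/* Hands out a new reference owned by the caller, or NULL if the pool is full. */
static struct toy_device *toy_receive_device(void)
{
	for (size_t i = 0; i < POOL_SIZE; i++) {
		if (pool[i].refcount == 0) {
			pool[i].refcount = 1;
			return &pool[i];
		}
	}
	return NULL;
}

/* Drops one reference; the slot becomes reusable when the count hits zero. */
static void toy_device_unref(struct toy_device *dev)
{
	if (dev && dev->refcount > 0)
		dev->refcount--;
}

/* Same shape as the unpatched check: the pointer is tested, then lost. */
static int event_seen_leaky(void)
{
	if (toy_receive_device())
		return 1;
	return 0;
}

/* Same shape as the patched check: keep the pointer, drop the reference. */
static int event_seen_fixed(void)
{
	struct toy_device *dev = toy_receive_device();

	if (dev) {
		toy_device_unref(dev);
		return 1;
	}
	return 0;
}
```

In this model, calling event_seen_fixed() any number of times leaves toy_live_devices() at 0, while each call to event_seen_leaky() permanently occupies one more slot until the pool runs out.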
We put together a quick patch, which seems to have fixed the issue or at least
substantially reduced the amount of leaks.
Since we have no experience with the codebase, we have no idea about the
implications of these changes.
We would appreciate it if someone took a closer look.
Additionally, as a workaround, setting MDADM_NO_UDEV=1 in mdadm's environment
stops the leaks as well, since it bypasses the leaking code path.
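One possible way to set that variable for the service persistently is a systemd drop-in. This is only a sketch: it assumes the unit is named mdmonitor.service as above, and the drop-in file name is an arbitrary choice.

```ini
# /etc/systemd/system/mdmonitor.service.d/no-udev.conf
# (hypothetical file name; any *.conf in this directory is read)
[Service]
Environment=MDADM_NO_UDEV=1
```

After creating the file, "systemctl daemon-reload" followed by "systemctl restart mdmonitor.service" should make the running service pick it up.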
--- udev.old.c 2025-01-14 13:13:50.000000000 +0100
+++ udev.c 2025-09-17 15:42:15.932836197 +0200
@@ -149,9 +149,13 @@
tv.tv_sec = seconds;
tv.tv_usec = 0;
- if (select(fd + 1, &readfds, NULL, NULL, &tv) > 0 && FD_ISSET(fd, &readfds))
- if (udev_monitor_receive_device(udev_monitor))
+ if (select(fd + 1, &readfds, NULL, NULL, &tv) > 0 && FD_ISSET(fd, &readfds)) {
+ struct udev_device *dev = udev_monitor_receive_device(udev_monitor);
+ if (dev) {
+ udev_device_unref(dev);
return UDEV_STATUS_SUCCESS; /* event detected */
+ }
+ }
return UDEV_STATUS_TIMEOUT;
}
#endif