Package: mdadm
Version: 4.4-11

We noticed a memory leak in mdmonitor.service on all of our newly installed 
trixie machines.
The worst-affected machine leaked about 20GB over 13 days of uptime.

After some debugging we found the issue.
At mdadm-4.4/udev.c:153 (fetched using apt source):
...
if (udev_monitor_receive_device(udev_monitor))
        return UDEV_STATUS_SUCCESS; /* event detected */
...

According to libudev docs:
On success, udev_monitor_receive_device() returns a pointer to a newly 
referenced device that was received via the monitor. The caller is responsible 
to drop this reference when done.

As you can see, the returned pointer is only tested for NULL and then 
discarded, so the reference to the device never gets dropped.
We put together a quick patch, which seems to have fixed the issue or at least 
substantially reduced the amount of leaks.

Since we have no experience with the codebase, we have no idea about the 
implications of these changes.
We would appreciate it if someone took a closer look.

Additionally, as a workaround, setting MDADM_NO_UDEV=1 in mdadm's environment 
stops the leaks as well, since it bypasses the leaking code path.

--- udev.old.c  2025-01-14 13:13:50.000000000 +0100
+++ udev.c      2025-09-17 15:42:15.932836197 +0200
@@ -149,9 +149,13 @@
        tv.tv_sec = seconds;
        tv.tv_usec = 0;

-       if (select(fd + 1, &readfds, NULL, NULL, &tv) > 0 && FD_ISSET(fd, &readfds))
-               if (udev_monitor_receive_device(udev_monitor))
+       if (select(fd + 1, &readfds, NULL, NULL, &tv) > 0 && FD_ISSET(fd, &readfds)) {
+               struct udev_device *dev = udev_monitor_receive_device(udev_monitor);
+               if (dev) {
+                       udev_device_unref(dev);
                        return UDEV_STATUS_SUCCESS; /* event detected */
+               }
+       }
        return UDEV_STATUS_TIMEOUT;
 }
 #endif
