On 3/2/26 9:09 PM, John Garry wrote:
On 02/03/2026 12:31, Nilay Shroff wrote:

+#define MPATH_HEAD_DISK_LIVE             0
+
  struct mpath_head {
      struct srcu_struct    srcu;
      struct list_head    dev_list;    /* list of all mpath_devs */
@@ -17,12 +34,36 @@ struct mpath_head {
      struct kref        ref;
+    unsigned long        flags;
      struct mpath_device __rcu         *current_path[MAX_NUMNODES];
+    const struct mpath_head_template    *mpdt;
      void            *drvdata;
  };
Not sure why we don't have back reference to struct mpath_disk
from struct mpath_head here. Does it make sense to have this?

We can get away without it.

Some more background info .. so the concept of separate mpath_head and mpath_disk is driven by SCSI, which has scsi_device and scsi_disk classes. The scsi_disk driver (sd.c) controls the per-path gendisk and the mpath_disk, and these internals are hidden from the scsi_core (which controls the scsi_device). SCSI having this layered approach makes things more complicated. This is unlike NVMe, where the core driver controls the NS gendisk also.



+static inline struct mpath_disk *mpath_bd_device_to_disk(struct device *dev)
+{
+    return dev_get_drvdata(dev);
+}
+
+static inline struct mpath_disk *mpath_gendisk_to_disk(struct gendisk *disk)
+{
+    return mpath_bd_device_to_disk(disk_to_dev(disk));
+}
+
  int mpath_get_head(struct mpath_head *mpath_head);
  void mpath_put_head(struct mpath_head *mpath_head);
  struct mpath_head *mpath_alloc_head(void);
+void mpath_put_disk(struct mpath_disk *mpath_disk);
+void mpath_remove_disk(struct mpath_disk *mpath_disk);
+void mpath_unregister_disk(struct mpath_disk *mpath_disk);
+struct mpath_disk *mpath_alloc_head_disk(struct queue_limits *lim,
+            int numa_node);
+void mpath_device_set_live(struct mpath_disk *mpath_disk,
+            struct mpath_device *mpath_device);
+void mpath_unregister_disk(struct mpath_disk *mpath_disk);
+static inline bool is_mpath_head(struct gendisk *disk)
+{
+    return disk->fops == &mpath_ops;
+}
  #endif // _LIBMULTIPATH_H
diff --git a/lib/multipath.c b/lib/multipath.c
index 15c495675d729..88efb0ae16acb 100644
--- a/lib/multipath.c
+++ b/lib/multipath.c
@@ -32,6 +32,135 @@ void mpath_put_head(struct mpath_head *mpath_head)
  }
  EXPORT_SYMBOL_GPL(mpath_put_head);
+static void mpath_free_disk(struct kref *ref)
+{
+    struct mpath_disk *mpath_disk =
+        container_of(ref, struct mpath_disk, ref);
+    struct mpath_head *mpath_head = mpath_disk->mpath_head;
+
+    put_disk(mpath_disk->disk);
+    mpath_put_head(mpath_head);
+    kfree(mpath_disk);
+}
+

The mpath_alloc_head_disk() doesn't get a reference to the
mpath_head object but here while freeing mpath_disk we put
the reference to mpath_head. Would that create a reference
imbalance?

I think that what I done can be improved. If you check nvme_mpath_alloc_disk(), when we alloc the head the ref is 1, and then we rely on the disk release to release that head reference.


Yes we got a reference to mpath_head while
allocating it but then these are two (alloc mpath_disk and
alloc mpath_head) disjoint operations. In that case, can't
we have both mpath_disk and mpath_head allocated under one
libmultipath API?

I would like to have something simpler (like mainline NVMe code), but I have it this way because of SCSI, as above.

I understand the intended lifetime model due to SCSI, but the current flow is somewhat confusing.

In nvme_mpath_alloc_disk(), mpath_disk and mpath_head are allocated
separately. However, during teardown, both objects are ultimately
released through mpath_free_disk(), which drops the reference to
mpath_head via mpath_put_head().

Since the allocation of mpath_disk and mpath_head happens independently,
it is not immediately obvious why their lifetime is tied together and
why they are not freed independently when the NVMe head node is removed.
This coupling makes the ownership and reference flow harder to reason
about.

Additionally, I noticed that nvme_remove_head() has been removed in the
NVMe code that integrates with libmultipath. IMO, It might be clearer to
retain this function and make the teardown sequence explicit (after
removing mpath_put_head() from mpath_free_disk()).
For example:

nvme_remove_head():
    mpath_unregister_disk();  /* removes mpath_disk and drops its ref */
    mpath_put_head();         /* drops mpath_head reference */
    nvme_put_ns_head();       /* drops NVMe namespace head reference */

Does the above example makes sense?

Thanks,
--Nilay

Reply via email to