On Wed, Nov 27, 2024 at 10:48:03PM +0100, Jann Horn wrote:
> On Wed, Nov 27, 2024 at 10:36 PM Kent Overstreet
> <[email protected]> wrote:
> > When invoked from aio, mm_struct is guaranteed to outlive the request
> > since its lifetime is tied to the io_context - but that's not the case
> > for io_uring: a process can be killed and its mm_struct freed while a
> > request is still in flight.
> >
> > So if we're submitting the rest of the IO asynchronously, we may need
> > a ref on the mm_struct.
> >
> > Per Jens, this is not actually a bug because we're not yet flipping on
> > FMODE_NOWAIT, meaning io_uring will do the submission from an io_worker
> > kthread - but this patch is necessary for safely flipping on
> > FMODE_NOWAIT for more efficient submissions in the future.
> 
> Ah, one thing to look out for is that mm_struct has two types of
> refcounting, and this patch mixes them up - I don't think my
> explanation of that was very clear:
> 
>  - mmgrab()/mmdrop() prevent the mm_struct itself from being freed,
> but don't keep the mappings in it alive
>  - mmget()/mmput() fully keep the MM alive
> 
> An mmget() reference automatically also holds an mmgrab() reference, 
> basically.
> 
> For kthread_use_mm(), you need an mmget() reference; so the most
> straightforward option would be to mmget() when creating the
> asynchronous work and mmput() when you're done. But that has the
> disadvantage that the reference keeps the entire mm_struct with all
> the mappings inside it alive, and it even indirectly holds a reference
> to all the files mapped in the MM; though I don't know if that's
> really a problem here.
> 
> If you want to avoid keeping the MM of an exiting process alive while
> the I/O is running, the neater pattern is to use mmgrab()/mmdrop() for
> your long-term reference, and then use mmget_not_zero()/mmput() around
> kthread_use_mm()/kthread_unuse_mm() (and bail if mmget_not_zero()
> fails). You can see this pattern in places like
> vfio_iommu_type1_dma_rw_chunk().

gotcha...
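
So the pattern, as a minimal sketch - with a made-up "struct my_work"
and helpers, not actual bcachefs code - is a long-term mmgrab()
reference for the lifetime of the work item, plus
mmget_not_zero()/mmput() around kthread_use_mm()/kthread_unuse_mm()
when actually touching userspace memory:

#include <linux/kthread.h>
#include <linux/sched/mm.h>

struct my_work {
	struct mm_struct	*mm;
};

/* At submission time, in the context of the submitting process: */
static void my_work_init(struct my_work *w)
{
	w->mm = current->mm;
	mmgrab(w->mm);	/* pins the mm_struct itself, not the mappings */
}

/* Later, from a kthread: */
static int my_work_run(struct my_work *w)
{
	/* If mm_users already hit zero, the mappings are gone - bail: */
	if (!mmget_not_zero(w->mm))
		return -ESRCH;

	kthread_use_mm(w->mm);
	/* ... access userspace memory via w->mm here ... */
	kthread_unuse_mm(w->mm);
	mmput(w->mm);
	return 0;
}

/* When the work item is freed: */
static void my_work_exit(struct my_work *w)
{
	mmdrop(w->mm);	/* drop the long-term reference */
}

Which is what the patch below does, with the mmget_not_zero() check in
bch2_dio_write_continue() and the mmdrop() in bch2_dio_write_done():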

From 3ef3067690d838dfc9a12686caa015e85e217f01 Mon Sep 17 00:00:00 2001
From: Kent Overstreet <[email protected]>
Date: Wed, 27 Nov 2024 16:32:26 -0500
Subject: [PATCH] bcachefs: dio write: Take ref on mm_struct when used
 asynchronously

When invoked from aio, mm_struct is guaranteed to outlive the request
since its lifetime is tied to the io_context - but that's not the case
for io_uring: a process can be killed and its mm_struct freed while a
request is still in flight.

So if we're submitting the rest of the IO asynchronously, we may need
a ref on the mm_struct.

Per Jens, this is not actually a bug because we're not yet flipping on
FMODE_NOWAIT, meaning io_uring will do the submission from an io_worker
kthread - but this patch is necessary for safely flipping on
FMODE_NOWAIT for more efficient submissions in the future.

Reported-by: Jann Horn <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>

diff --git a/fs/bcachefs/fs-io-direct.c b/fs/bcachefs/fs-io-direct.c
index 2089c36b5866..8da0577411e8 100644
--- a/fs/bcachefs/fs-io-direct.c
+++ b/fs/bcachefs/fs-io-direct.c
@@ -226,6 +226,7 @@ struct dio_write {
        struct mm_struct                *mm;
        const struct iovec              *iov;
        unsigned                        loop:1,
+                                       have_mm_ref:1,
                                        extending:1,
                                        sync:1,
                                        flush:1;
@@ -390,6 +391,9 @@ static __always_inline long bch2_dio_write_done(struct dio_write *dio)
 
        kfree(dio->iov);
 
+       if (dio->have_mm_ref)
+               mmdrop(dio->mm);
+
        ret = dio->op.error ?: ((long) dio->written << 9);
        bio_put(&dio->op.wbio.bio);
 
@@ -529,9 +533,24 @@ static __always_inline long bch2_dio_write_loop(struct dio_write *dio)
 
                if (unlikely(dio->iter.count) &&
                    !dio->sync &&
-                   !dio->loop &&
-                   bch2_dio_write_copy_iov(dio))
-                       dio->sync = sync = true;
+                   !dio->loop) {
+                       /*
+                        * Rest of write will be submitted asynchronously -
+                        * unless copying the iov fails:
+                        */
+                       if (likely(!bch2_dio_write_copy_iov(dio))) {
+                               /*
+                                * aio guarantees that mm_struct outlives the
+                                * request, but io_uring does not
+                                */
+                               if (dio->mm) {
+                                       mmgrab(dio->mm);
+                                       dio->have_mm_ref = true;
+                               }
+                       } else {
+                               dio->sync = sync = true;
+                       }
+               }
 
                dio->loop = true;
                closure_call(&dio->op.cl, bch2_write, NULL, NULL);
@@ -559,15 +578,25 @@ static __always_inline long bch2_dio_write_loop(struct dio_write *dio)
 
 static noinline __cold void bch2_dio_write_continue(struct dio_write *dio)
 {
-       struct mm_struct *mm = dio->mm;
+       struct mm_struct *mm = dio->have_mm_ref ? dio->mm : NULL;
 
        bio_reset(&dio->op.wbio.bio, NULL, REQ_OP_WRITE);
 
-       if (mm)
+       if (mm) {
+               if (unlikely(!mmget_not_zero(mm))) {
+                       /* process exited */
+                       dio->op.error = -ESRCH;
+                       bch2_dio_write_done(dio);
+                       return;
+               }
+
                kthread_use_mm(mm);
+       }
        bch2_dio_write_loop(dio);
-       if (mm)
+       if (mm) {
                kthread_unuse_mm(mm);
+               mmput(mm);
+       }
 }
 
 static void bch2_dio_write_loop_async(struct bch_write_op *op)
