On Mon, 24 Jun 2019 16:41:48 +0200
Frederic Barrat <[email protected]> wrote:

> If an ocxl device is unbound through sysfs at the same time its AFU is
> being opened by a user process, the open code may dereference freed
> stuctures, which can lead to kernel oops messages. You'd have to hit a
> tiny time window, but it's possible. It's fairly easy to test by
> making the time window bigger artificially.
> 
> Fix it with a combination of 2 changes:
> - when an AFU device is found in the IDR by looking for the device
> minor number, we should hold a reference on the device until after the
> context is allocated. A reference on the AFU structure is kept when
> the context is allocated, so we can release the reference on the
> device after the context allocation.
> - with the fix above, there's still another even tinier window,
> between the time the AFU device is found in the IDR and the reference
> on the device is taken. We can fix this one by removing the IDR entry
> earlier, when the device setup is removed, instead of waiting for the
> 'release' device callback. With proper locking around the IDR.
> 
> Fixes: 75ca758adbaf ("ocxl: Create a clear delineation between ocxl backend & 
> frontend")
> Signed-off-by: Frederic Barrat <[email protected]>
> ---
> mpe: this fixes a commit merged in v5.2-rc1. It's late, and I don't think 
> it's that important. If it's for the next merge window, I would add:
> Cc: [email protected]      # v5.2
> 
> 
> drivers/misc/ocxl/file.c | 23 +++++++++++------------
>  1 file changed, 11 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
> index 2870c25da166..4d1b44de1492 100644
> --- a/drivers/misc/ocxl/file.c
> +++ b/drivers/misc/ocxl/file.c
> @@ -18,18 +18,15 @@ static struct class *ocxl_class;
>  static struct mutex minors_idr_lock;
>  static struct idr minors_idr;
>  
> -static struct ocxl_file_info *find_file_info(dev_t devno)
> +static struct ocxl_file_info *find_and_get_file_info(dev_t devno)
>  {
>       struct ocxl_file_info *info;
>  
> -     /*
> -      * We don't declare an RCU critical section here, as our AFU
> -      * is protected by a reference counter on the device. By the time the
> -      * info reference is removed from the idr, the ref count of
> -      * the device is already at 0, so no user API will access that AFU and
> -      * this function can't return it.
> -      */
> +     mutex_lock(&minors_idr_lock);
>       info = idr_find(&minors_idr, MINOR(devno));
> +     if (info)
> +             get_device(&info->dev);
> +     mutex_unlock(&minors_idr_lock);
>       return info;
>  }
>  
> @@ -58,14 +55,16 @@ static int afu_open(struct inode *inode, struct file 
> *file)
>  
>       pr_debug("%s for device %x\n", __func__, inode->i_rdev);
>  
> -     info = find_file_info(inode->i_rdev);
> +     info = find_and_get_file_info(inode->i_rdev);
>       if (!info)
>               return -ENODEV;
>  
>       rc = ocxl_context_alloc(&ctx, info->afu, inode->i_mapping);
> -     if (rc)
> +     if (rc) {
> +             put_device(&info->dev);

You could have a single call site for put_device() since it's
needed for both branches. No big deal.

>               return rc;
> -
> +     }
> +     put_device(&info->dev);
>       file->private_data = ctx;
>       return 0;
>  }
> @@ -487,7 +486,6 @@ static void info_release(struct device *dev)
>  {
>       struct ocxl_file_info *info = container_of(dev, struct ocxl_file_info, 
> dev);
>  
> -     free_minor(info);
>       ocxl_afu_put(info->afu);
>       kfree(info);
>  }
> @@ -577,6 +575,7 @@ void ocxl_file_unregister_afu(struct ocxl_afu *afu)
>  
>       ocxl_file_make_invisible(info);
>       ocxl_sysfs_unregister_afu(info);
> +     free_minor(info);

Since the IDR entry is added by ocxl_file_register_afu(), it seems to make
sense to undo that in ocxl_file_unregister_afu(). Out of curiosity, was there
any historical reason to do this in info_release() in the first place ?

Reviewed-by: Greg Kurz <[email protected]>

>       device_unregister(&info->dev);
>  }
>  

Reply via email to