On Fri, Jun 05, 2020 at 10:04:51PM -0700, Scott Branden wrote:
> -int kernel_read_file(struct file *file, void **buf, loff_t *size,
> -                  loff_t max_size, enum kernel_read_file_id id)
> -{
> -     loff_t i_size, pos;
> +int kernel_pread_file(struct file *file, void **buf, loff_t *size,
> +                   loff_t pos, loff_t max_size,
> +                   enum kernel_pread_opt opt,
> +                   enum kernel_read_file_id id)
> +{
> +     loff_t alloc_size;
> +     loff_t buf_pos;
> +     loff_t read_end;
> +     loff_t i_size;
>       ssize_t bytes = 0;
>       int ret;
>  

Look, it's not your fault, but this is a great example of how we end
up with atrocious interfaces.  Someone comes along and implements a
simple DWIM interface that solves their problem.  Then somebody else
adds a slight variant that solves their problem, and so on and so on,
and we end up with this bonkers API where the arguments literally change
meaning depending on other arguments.

> @@ -950,21 +955,31 @@ int kernel_read_file(struct file *file, void **buf, loff_t *size,
>               ret = -EINVAL;
>               goto out;
>       }
> -     if (i_size > SIZE_MAX || (max_size > 0 && i_size > max_size)) {
> +
> +     /* Default read to end of file */
> +     read_end = i_size;
> +
> +     /* Allow reading partial portion of file */
> +     if ((opt == KERNEL_PREAD_PART) &&
> +         (i_size > (pos + max_size)))
> +             read_end = pos + max_size;
> +
> +     alloc_size = read_end - pos;
> +     if (i_size > SIZE_MAX || (max_size > 0 && alloc_size > max_size)) {
>               ret = -EFBIG;
>               goto out;

... like that.
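
To make that concrete, here is a rough userspace model of the size check
in the hunk above.  It is only a sketch: model_alloc_size() and the
MODEL_PREAD_* names are made up for illustration (the quoted patch only
shows KERNEL_PREAD_PART, so the "whole file" value is an assumption), and
the SIZE_MAX check is left out.  It shows the same max_size acting as a
hard limit in one mode and as a read length in the other.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's enum kernel_pread_opt. */
enum model_pread_opt {
	MODEL_PREAD_WHOLE,	/* assumed name for "read the whole file" */
	MODEL_PREAD_PART,	/* mirrors KERNEL_PREAD_PART in the patch */
};

/*
 * Model of the allocation-size logic in the quoted hunk.
 * Returns the number of bytes that would be allocated, or -EFBIG.
 */
int64_t model_alloc_size(int64_t i_size, int64_t pos, int64_t max_size,
			 enum model_pread_opt opt)
{
	int64_t read_end = i_size;	/* default: read to end of file */
	int64_t alloc_size;

	/* In PART mode, max_size is the length to read from pos. */
	if (opt == MODEL_PREAD_PART && i_size > pos + max_size)
		read_end = pos + max_size;

	alloc_size = read_end - pos;

	/* Otherwise max_size is a limit, and 0 means "no limit". */
	if (max_size > 0 && alloc_size > max_size)
		return -EFBIG;

	return alloc_size;
}
```

With a 100-byte file, pos 0 and max_size 50, the "whole" mode fails with
-EFBIG while the "part" mode quietly returns 50.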

I think what we actually want is:

ssize_t vmap_file_range(struct file *, loff_t start, loff_t end, void **bufp);
void vunmap_file_range(struct file *, void *buf);

If end > i_size, limit the allocation to i_size.  Returns the number
of bytes allocated, or a negative errno.  Writes the pointer allocated
to *bufp.  Internally, it should use the page cache to read in the pages
(taking appropriate reference counts).  Then it maps them using vmap()
instead of copying them to a private vmalloc() array.
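
As a sketch of the range semantics only (not the page cache or vmap()
part), the clamping described above might look like the following in
plain C.  file_range_len() is a hypothetical helper, and returning 0 for
a range that starts beyond i_size is an assumption; the proposal does not
say what should happen in that case.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: number of bytes covered by
 * [start, end) once end is limited to the file size.
 * Returns the byte count, or a negative errno for a malformed range.
 */
int64_t file_range_len(int64_t start, int64_t end, int64_t i_size)
{
	if (start < 0 || i_size < 0 || end < start)
		return -EINVAL;

	/* If end > i_size, limit the range to i_size. */
	if (end > i_size)
		end = i_size;

	/* Assumption: a range entirely past EOF maps nothing. */
	if (start > end)
		return 0;

	return end - start;
}
```

The point of a start/end pair is that each argument has one meaning
regardless of the others; there is no mode flag to reinterpret them.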

kernel_read_file() can be converted to use this API.  The users will
need to be changed to call kernel_read_end(struct file *file, void *buf)
instead of vfree() so it can call allow_write_access() for them.

vmap_file_range() has a lot of potential uses.  I'm surprised we don't
have it already, to be honest.
