On Fri, May 3, 2024 at 6:40 AM Charan Teja Kalla
<quic_chara...@quicinc.com> wrote:
>
> Thanks Christian/TJ for the inputs!!
>
> On 4/18/2024 12:16 PM, Christian König wrote:
> > As far as I can see the EPOLL holds a reference to the files it
> > contains. So it is perfectly valid to add the file descriptor to EPOLL
> > and then close it.
> >
> > In this case the file won't be closed, but be kept alive by it's
> > reference in the EPOLL file descriptor.
>
> I am not seeing that adding a 'fd' into the epoll monitoring list will
> increase its refcount.  Confirmed by writing a testcase that just do
> open + EPOLL_CTL_ADD and see the /proc/../fdinfo of epoll fd(Added
> file_count() info to the output)
> # cat /proc/136/fdinfo/3
> pos:    0
> flags:  02
> mnt_id: 14
> ino:    1041
> tfd:        4 events:       19 data:                4  pos:0 ino:81 sdev:5
> refcount: 1<-- The fd added to the epoll monitoring is still 1(same as
> open file refcount)
>
> From the code too, I don't see a file added in the epoll monitoring list
> will have an extra refcount but momentarily (where it increases the
> refcount of target file, add it to the rbtree of the epoll context and
> then decreasing the file refcount):
> do_epoll_ctl():
>         f = fdget(epfd);
>         tf = fdget(fd);
>         epoll_mutex_lock(&ep->mtx)
>         EPOLL_CTL_ADD:
>                 ep_insert(ep, epds, tf.file, fd, full_check); // Added to the 
> epoll
> monitoring rb tree list as epitem.
>         mutex_unlock(&ep->mtx);
>         fdput(tf);
>         fdput(f);
>
>
> Not sure If i am completely mistaken what you're saying here.
>
> > The fs layer which calls dma_buf_poll() should make sure that the file
> > can't go away until the function returns.
> >
> > Then inside dma_buf_poll() we add another reference to the file while
> > installing the callback:
> >
> >                         /* Paired with fput in dma_buf_poll_cb */
> >                         get_file(dmabuf->file); No, exactly that can't
> > happen either.
> >
>
> I am not quite comfortable with epoll internals but I think below race
> is possible where "The 'file' passed to dma_buf_poll() is proper but
> ->f_count == 0, which is indicating that a parallel freeing is
> happening". So, we should check the file->f_count value before taking
> the refcount.
>
> (A 'fd' registered for the epoll monitoring list is maintained as
> 'epitem(epi)' in the rbtree of 'eventpoll(ep, called as epoll context).
>
> epoll_wait()                                __fput()(as file->f_count=0)
> ------------------------------------------------------------------------
> a) ep_poll_callback():
>      Is the waitqueue function
>    called when signaled on the
>    wait_queue_head_t of the registered
>    poll() function.
>
>    1) It links the 'epi' to the ready list
>       of 'ep':
>        if (!ep_is_linked(epi))
>          list_add_tail_lockless(&epi->rdllink,
>                 &ep->rdllist)
>
>    2) wake_up(&ep->wq);
>         Wake up the process waiting
>         on epoll_wait() that endup
>         in ep_poll.
>
>
> b) ep_poll():
>     1) while (1) {
>         eavail = ep_events_available(ep);
>         (checks ep->rdlist)
>         ep_send_events(ep);
>         (notify the events to user)
>     }
>     (epoll_wait() calling process gets
>      woken up from a.2 and process the
>      events raised added to rdlist in a.1)
>
>    2) ep_send_events():
>         mutex_lock(&ep->mtx);
>         (** The sync lock is taken **)
>         list_for_each_entry_safe(epi, tmp,
>                         &txlist, rdllink) {
>             list_del_init(&epi->rdllink);
>             revents = ep_item_poll(epi, &pt, 1)
>             (vfs_poll()-->...f_op->poll(=dma_buf_poll)
>         }
>         mutex_unlock(&ep->mtx)
>         (**release the lock.**)
>
>         (As part of procession of events,
>          each epitem is removed from rdlist
>          and call the f_op->poll() of a file
>          which will endup in dma_buf_poll())
>
>    3) dma_buf_poll():
>         get_file(dmabuf->file);
>         (** f_count changed from 0->1 **)
>         dma_buf_poll_add_cb(resv, true, dcb):
>         (registers dma_buf_poll_cb() against fence)
>
>
>                                 c) eventpoll_release_file():
>                                    mutex_lock(&ep->mtx);
>                                    __ep_remove(ep, epi, true):
>                                    mutex_unlock(&ep->mtx);
>                                   (__ep_remove() will remove the
>                                    'epi' from rbtree and if present
>                                    from rdlist as well)
>
>                                 d) file_free(file), free the 'file'.
>
> e) dma_buf_poll_cb:
>  /* Paired with get_file in dma_buf_poll */
>  fput(dmabuf->file);
>  (f_count changed from 1->0, where
>   we try to free the 'file' again
>   which is UAF/double free).
>
>
>
> In the above race, If c) gets called first, then the 'epi' is removed
> from both rbtree and 'rdlink' under ep->mtx lock thus b.2 don't end up
> in calling the ->poll() as it don't see this event in the rdlist.
>
> Race only exist If b.2 executes first, where it will call dma_buf_poll
> with __valid 'struct file' under ep->mtx but its refcount is already
> could have been zero__. Later When e) is executed, it turns into double
> free of the 'file' structure.
>
> If you're convinced with the above race, should the fix here will be
> this simple check:
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 8fe5aa67b167..e469dd9288cc
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -240,6 +240,10 @@ static __poll_t dma_buf_poll(struct file *file,
> poll_table *poll)
>         struct dma_resv *resv;
>         __poll_t events;
>
> +       /* Check parallel freeing of file */
> +       if (!file_count(file))
> +               return 0;
> +
>
> Thanks,
> Charan

Hi Charan,

It looks like a similar conclusion about epoll was reached at:
https://lore.kernel.org/all/a87d7ef8-2c59-4dc5-ba0a-b821d1eff...@amd.com/

I agree with Christian that it should not be possible for the file to
be freed while inside dma_buf_poll. Aside from causing problems in
dma_buf_poll, ep_item_poll itself would have issues dereferencing the
freed file pointer.

Christian's patch there makes me wonder about what the epoll man page
says about closing files.
"Will closing a file descriptor cause it to be removed from all epoll
interest lists?" Answer: Yes
https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/man/man7/epoll.7#n442

It looks like eventpoll_release_file is responsible for that, but if
the epitem is changed to hold a reference then that can't be true
anymore because __fput will never call eventpoll_release_file (until
EPOLL_CTL_DEL). But how do you call EPOLL_CTL_DEL if you've closed all
your references to the file in userspace? So I think epoll needs a
slightly more complicated fix.

Reply via email to