Hello, Den.
Who could help with preparing an RK (ready-kernel) patch for this fix?
________________________________
From: Alexey Kuznetsov <kuz...@virtuozzo.com>
Sent: May 8, 2025 14:15
To: Kui Liu <kui....@virtuozzo.com>
Cc: devel@openvz.org <devel@openvz.org>; Andrey Zaitsev <azait...@virtuozzo.com>; Konstantin Khorenko <khore...@virtuozzo.com>
Subject: Re: [PATCH VZ9] fs/fuse kio: fix hang rpc over rdma io

Hello!

Can we turn this into a ready-kernel patch for 5.14.0-427.33.1.vz9.72.5?
I promised the customer in https://virtuozzo.atlassian.net/browse/ASUP-1425
to fix this issue without a reboot.

On Thu, May 8, 2025 at 7:11 PM Alexey Kuznetsov <kuz...@virtuozzo.com> wrote:
>
> Ack
>
> On Thu, May 8, 2025 at 12:26 PM Liu Kui <kui....@virtuozzo.com> wrote:
> >
> > When a large msg is being sent over RDMA and is waiting for the read ack
> > from the peer, it is moved from rio->write_queue to rio->active_txs.
> > However, msgs in rio->active_txs are not checked by pcs_rdma_next_timeout(),
> > so it does not return a correct timeout to the rpc and, as a result, the rpc
> > timer is not started. If the peer becomes unresponsive, the msg in
> > rio->active_txs can be stuck waiting for the read ack forever, because it
> > cannot be killed by the calendar timer while it is under network I/O. The
> > rpc can then hang forever without detecting the stuck msg in the underlying
> > RDMA I/O.
> >
> > Therefore, pcs_rdma_next_timeout() should also return the next timeout
> > based on the first msg in rio->active_txs.
> >
> > Fixes: #VSTOR-105982
> > https://virtuozzo.atlassian.net/browse/VSTOR-105982
> >
> > Signed-off-by: Liu Kui <kui....@virtuozzo.com>
> > ---
> >  fs/fuse/kio/pcs/pcs_rdma_io.c | 15 +++++++++++----
> >  1 file changed, 11 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/fuse/kio/pcs/pcs_rdma_io.c b/fs/fuse/kio/pcs/pcs_rdma_io.c
> > index 2755b13fb8a5..6fa38338ad0c 100644
> > --- a/fs/fuse/kio/pcs/pcs_rdma_io.c
> > +++ b/fs/fuse/kio/pcs/pcs_rdma_io.c
> > @@ -1668,14 +1668,21 @@ static unsigned long pcs_rdma_next_timeout(struct pcs_netio *netio)
> >         struct pcs_rdmaio *rio = rio_from_netio(netio);
> >         struct pcs_rpc *ep = netio->parent;
> >         struct pcs_msg *msg;
> > +       struct rio_tx *tx;
> >
> >         BUG_ON(!mutex_is_locked(&ep->mutex));
> >
> > -       if (list_empty(&rio->write_queue))
> > -               return 0;
> > +       if (!list_empty(&rio->active_txs)) {
> > +               tx = list_first_entry(&rio->active_txs, struct rio_tx, list);
> > +               return tx->msg->start_time + rio->send_timeout;
> > +       }
> >
> > -       msg = list_first_entry(&rio->write_queue, struct pcs_msg, list);
> > -       return msg->start_time + rio->send_timeout;
> > +       if (!list_empty(&rio->write_queue)) {
> > +               msg = list_first_entry(&rio->write_queue, struct pcs_msg, list);
> > +               return msg->start_time + rio->send_timeout;
> > +       }
> > +
> > +       return 0;
> >  }
> >
> >  static int pcs_rdma_sync_send(struct pcs_netio *netio, struct pcs_msg *msg)
> > --
> > 2.39.5 (Apple Git-154)
_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
