Hello! Can we turn this into a ready-kernel patch for 5.14.0-427.33.1.vz9.72.5? I promised the customer in https://virtuozzo.atlassian.net/browse/ASUP-1425 that we would fix this issue without a reboot.

On Thu, May 8, 2025 at 7:11 PM Alexey Kuznetsov <kuz...@virtuozzo.com> wrote:
>
> Ack
>
> On Thu, May 8, 2025 at 12:26 PM Liu Kui <kui....@virtuozzo.com> wrote:
> >
> > When a large msg is being sent out over rdma and in the stage of waiting
> > for read ack from peer, it is moved from rio->write_queue to
> > rio->active_txs.
> > However the msg in rio->active_txs is not checked by pcs_rdma_next_timeout()
> > to return a correct timeout back to rpc, as a result the rpc timer is not
> > started. When the peer somehow becomes unresponsive, the msg at
> > rio->active_txs can be stuck at waiting for read ack stage forever, because
> > it can't be killed by the calendar timer since it's now under network I/O.
> > As a result, the rpc can hang forever without detecting the stuck msg at
> > underlying rdma io.
> >
> > Apparently pcs_rdma_next_timeout should return the next timeout based on
> > first msg in rio->active_txs.
> >
> > Fixes: #VSTOR-105982
> > https://virtuozzo.atlassian.net/browse/VSTOR-105982
> >
> > Signed-off-by: Liu Kui <kui....@virtuozzo.com>
> > ---
> >  fs/fuse/kio/pcs/pcs_rdma_io.c | 15 +++++++++++----
> >  1 file changed, 11 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/fuse/kio/pcs/pcs_rdma_io.c b/fs/fuse/kio/pcs/pcs_rdma_io.c
> > index 2755b13fb8a5..6fa38338ad0c 100644
> > --- a/fs/fuse/kio/pcs/pcs_rdma_io.c
> > +++ b/fs/fuse/kio/pcs/pcs_rdma_io.c
> > @@ -1668,14 +1668,21 @@ static unsigned long pcs_rdma_next_timeout(struct pcs_netio *netio)
> >  	struct pcs_rdmaio *rio = rio_from_netio(netio);
> >  	struct pcs_rpc *ep = netio->parent;
> >  	struct pcs_msg *msg;
> > +	struct rio_tx *tx;
> >
> >  	BUG_ON(!mutex_is_locked(&ep->mutex));
> >
> > -	if (list_empty(&rio->write_queue))
> > -		return 0;
> > +	if (!list_empty(&rio->active_txs)) {
> > +		tx = list_first_entry(&rio->active_txs, struct rio_tx, list);
> > +		return tx->msg->start_time + rio->send_timeout;
> > +	}
> >
> > -	msg = list_first_entry(&rio->write_queue, struct pcs_msg, list);
> > -	return msg->start_time + rio->send_timeout;
> > +	if (!list_empty(&rio->write_queue)) {
> > +		msg = list_first_entry(&rio->write_queue, struct pcs_msg, list);
> > +		return msg->start_time + rio->send_timeout;
> > +	}
> > +
> > +	return 0;
> >  }
> >
> >  static int pcs_rdma_sync_send(struct pcs_netio *netio, struct pcs_msg *msg)
> > --
> > 2.39.5 (Apple Git-154)
_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel