The commit is pushed to "branch-rh9-5.14.0-427.44.1.vz9.80.x-ovz" and will appear at g...@bitbucket.org:openvz/vzkernel.git
after rh9-5.14.0-427.44.1.vz9.80.33
------>
commit bac90074d56fb3242cfd4800dee7ac3d1192219a
Author: Liu Kui <kui....@virtuozzo.com>
Date:   Thu May 8 12:17:24 2025 +0800

    fs/fuse kio: fix hang rpc over rdma io

    When a large msg is being sent out over rdma and in the stage of
    waiting for read ack from peer, it is moved from rio->write_queue
    to rio->active_txs. However the msg in rio->active_txs is not
    checked by pcs_rdma_next_timeout() to return a correct timeout
    back to rpc, as a result the rpc timer is not started.

    When the peer somehow becomes unresponsive, the msg at
    rio->active_txs can be stuck at waiting for read ack stage forever,
    because it can't be killed by the calendar timer since it's now
    under network I/O. As a result, the rpc can hang forever without
    detecting the stuck msg at underlying rdma io.

    Apparently pcs_rdma_next_timeout should return the next timeout
    based on first msg in rio->active_txs.

    Fixes: 8a3ab7d6c2963 ("fs/fuse kio: implement support RDMA transport")
    https://virtuozzo.atlassian.net/browse/VSTOR-105982

    Signed-off-by: Liu Kui <kui....@virtuozzo.com>
    Acked-by: Alexey Kuznetsov <kuz...@virtuozzo.com>

    Feature: vStorage
---
 fs/fuse/kio/pcs/pcs_rdma_io.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/kio/pcs/pcs_rdma_io.c b/fs/fuse/kio/pcs/pcs_rdma_io.c
index 1d9e648d2636e..1c464e0e60f3e 100644
--- a/fs/fuse/kio/pcs/pcs_rdma_io.c
+++ b/fs/fuse/kio/pcs/pcs_rdma_io.c
@@ -1683,14 +1683,21 @@ static unsigned long pcs_rdma_next_timeout(struct pcs_netio *netio)
 	struct pcs_rdmaio *rio = rio_from_netio(netio);
 	struct pcs_rpc *ep = netio->parent;
 	struct pcs_msg *msg;
+	struct rio_tx *tx;
 
 	BUG_ON(!mutex_is_locked(&ep->mutex));
 
-	if (list_empty(&rio->write_queue))
-		return 0;
+	if (!list_empty(&rio->active_txs)) {
+		tx = list_first_entry(&rio->active_txs, struct rio_tx, list);
+		return tx->msg->start_time + rio->send_timeout;
+	}
 
-	msg = list_first_entry(&rio->write_queue, struct pcs_msg, list);
-	return msg->start_time + rio->send_timeout;
+	if (!list_empty(&rio->write_queue)) {
+		msg = list_first_entry(&rio->write_queue, struct pcs_msg, list);
+		return msg->start_time + rio->send_timeout;
+	}
+
+	return 0;
 }
 
 static int pcs_rdma_sync_send(struct pcs_netio *netio, struct pcs_msg *msg)
_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
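
For illustration only, the timeout selection described in the commit message can be sketched as a small userspace C model. This is not the kernel code: the sketch_msg and sketch_rio types and the sketch_next_timeout() function are hypothetical stand-ins, the kernel list_head queues are reduced to plain head pointers, and start_time is assumed to be an opaque tick count. It only shows the ordering the patch introduces: a message waiting for a read ack in active_txs takes priority, then the head of write_queue, and 0 means no timer is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct pcs_msg. */
struct sketch_msg {
	unsigned long start_time;	/* when the msg was queued (assumed ticks) */
};

/* Hypothetical, simplified stand-in for struct pcs_rdmaio. */
struct sketch_rio {
	const struct sketch_msg *active_txs;	/* oldest msg waiting for read ack, or NULL */
	const struct sketch_msg *write_queue;	/* oldest msg not yet sent, or NULL */
	unsigned long send_timeout;		/* allowed send time, same unit as start_time */
};

/*
 * Return the absolute deadline of the message that should arm the rpc
 * timer, or 0 when nothing is pending. Messages in active_txs are
 * checked first, mirroring the order used in the patch.
 */
static unsigned long sketch_next_timeout(const struct sketch_rio *rio)
{
	assert(rio != NULL);

	if (rio->active_txs)
		return rio->active_txs->start_time + rio->send_timeout;

	if (rio->write_queue)
		return rio->write_queue->start_time + rio->send_timeout;

	return 0;
}
```

In this model, the behaviour before the patch corresponds to skipping the active_txs check: with an empty write_queue the function would return 0 even while a message is still waiting for its read ack, so no timer would be armed for it.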