On 3/30/21 3:20 AM, Hu, Jiayu wrote:
> Hi Maxime,
>
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coque...@redhat.com>
>> Sent: Monday, March 29, 2021 11:19 PM
>> To: Hu, Jiayu <jiayu...@intel.com>; dev@dpdk.org
>> Cc: Xia, Chenbo <chenbo....@intel.com>; Wang, Yinan
>> <yinan.w...@intel.com>; Jiang, Cheng1 <cheng1.ji...@intel.com>; Pai G,
>> Sunil <sunil.pa...@intel.com>
>> Subject: Re: [PATCH 3/4] vhost: avoid deadlock on async register
>>
>>
>>
>> On 3/17/21 1:56 PM, Jiayu Hu wrote:
>>> Users register async copy device when vhost queue is enabled.
>>> However, if VHOST_USER_F_PROTOCOL_FEATURES is not supported,
>>> a deadlock occurs inside rte_vhost_async_channel_register(),
>>> as vhost_user_msg_handler() already takes vq->access_lock
>>> before processing VHOST_USER_SET_VRING_KICK message.
>>>
>>> This patch removes calling vring_state_changed() in
>>> vhost_user_set_vring_kick() to avoid deadlock on async register.
>>>
>>> Signed-off-by: Jiayu Hu <jiayu...@intel.com>
>>> ---
>>>  lib/librte_vhost/vhost_user.c | 3 ---
>>>  1 file changed, 3 deletions(-)
>>>
>>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>>> index 399675c..a319c1c 100644
>>> --- a/lib/librte_vhost/vhost_user.c
>>> +++ b/lib/librte_vhost/vhost_user.c
>>> @@ -1919,9 +1919,6 @@ vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg,
>>>  	 */
>>>  	if (!(dev->features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
>>>  		vq->enabled = 1;
>>> -		if (dev->notify_ops->vring_state_changed)
>>> -			dev->notify_ops->vring_state_changed(
>>> -				dev->vid, file.index, 1);
>>
>> That looks very wrong, as:
>> 1. The apps want to receive this notification. It looks like breaking
>> existing apps in order to support the experimental async datapath. E.g.
>> OVS needs it to start polling the queues when protocol features is not
>> negotiated.
>
> IMHO, if protocol feature is not negotiated, vring_state_changed will also
> be called in vhost_user_msg_handler. In the case you mentioned,
> vq->enabled is set to true in set_vring_kick, and in vhost_user_msg_handler,
> "cur_ready != (vq && vq->ready)" is true, as vq->ready is false when init. So
> vhost_user_msg_handler will call vhost_user_notify_queue_state, which
> calls vring_state_changed inside.

OK, I agree, we can drop this one. But it is not enough, as
vhost_user_notify_queue_state() is called at several places with the
lock taken.

> In addition, calling vring_state_changed in set_vring_kick is protected by
> lock, but it's not in vhost_user_msg_handler. It looks confusing to me. Is
> there any special reason for this design?

I think we need the lock held every time the callback is called, to
avoid the case where an application calls a Vhost API that would modify
the vq struct. We could get undefined behavior if it happened.

>
>>
>> 2. The fix in your case seems to indicate that your app's
>> vring_state_changed callback called rte_vhost_async_channel_register.
>> And your fix consists in no more calling the callback, and so no more
>> calling rte_vhost_async_channel_register?
>
> rte_vhost_async_channel_register is recommended to call in
> vring_state_changed, and vring_state_changed will be called
> by vhost_user_msg_handler.

You might want to schedule a thread to call channel registration. Maybe
using rte_set_alarm?

Regards,
Maxime

>
> Thanks,
> Jiayu
>>
>>>  	}
>>>
>>>  	if (vq->ready) {
>>>
>
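
To make the deferral idea concrete, here is a minimal sketch in plain C with
pthreads rather than DPDK. Everything named toy_* is hypothetical and exists
only for illustration. A pthread mutex stands in for vq->access_lock, and a
plain thread stands in for whatever scheduling mechanism the application
would use (in DPDK the alarm call is, as far as I know, rte_eal_alarm_set()
from rte_alarm.h, which may be what "rte_set_alarm" refers to). The sketch
only shows why registering inside the callback cannot work while the handler
holds the lock, and why handing the work to another context can.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vhost virtqueue; not the DPDK structure. */
struct toy_vq {
	pthread_mutex_t access_lock; /* plays the role of vq->access_lock */
	pthread_t worker;            /* thread used for deferred registration */
	bool async_registered;
};

void toy_vq_init(struct toy_vq *vq)
{
	pthread_mutex_init(&vq->access_lock, NULL); /* default, non-recursive */
	vq->async_registered = false;
}

/* Stand-in for the registration call: it takes access_lock itself. */
int toy_async_register(struct toy_vq *vq)
{
	assert(vq != NULL);
	pthread_mutex_lock(&vq->access_lock);
	vq->async_registered = true;
	pthread_mutex_unlock(&vq->access_lock);
	return 0;
}

/* Mimics the message handler: the callback runs with access_lock held. */
int toy_msg_handler(struct toy_vq *vq, int (*cb)(struct toy_vq *))
{
	int ret;

	pthread_mutex_lock(&vq->access_lock);
	ret = cb(vq);
	pthread_mutex_unlock(&vq->access_lock);
	return ret;
}

/* Callback that tries to register inline. It probes with trylock so the
 * sketch returns -1 where a blocking lock would hang forever. */
int toy_cb_inline(struct toy_vq *vq)
{
	if (pthread_mutex_trylock(&vq->access_lock) != 0)
		return -1; /* lock already held by the handler */
	pthread_mutex_unlock(&vq->access_lock);
	return toy_async_register(vq);
}

/* Worker body: blocks until the handler has dropped the lock. */
static void *toy_worker(void *arg)
{
	toy_async_register(arg);
	return NULL;
}

/* Callback that only schedules the registration and returns at once. */
int toy_cb_deferred(struct toy_vq *vq)
{
	return pthread_create(&vq->worker, NULL, toy_worker, vq);
}
```

Calling toy_msg_handler() with toy_cb_inline reports the conflict and leaves
the queue unregistered, while toy_cb_deferred returns immediately and the
registration completes once the handler releases the lock (join vq->worker
before checking async_registered). A real application would also have to
cope with the queue being disabled or destroyed before the deferred call
runs, which this sketch ignores.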