On 4/2/2024 4:44 PM, Li Feng wrote:
*External email: Use caution opening links or attachments*



Hi,

I tested it today and there is indeed a problem in this scenario.
It seems that the first version of the patch is the best and can handle all scenarios.
With that patch applied, the previously merged patches are no longer needed.


I will revert the merged patch and submit the first version as a new fix. Do you have any comments?

Revert: https://lore.kernel.org/all/a68c0148e9bf105f9e83ff5e763b8fcb6f7ba9be.1697644299.git....@redhat.com/
New: https://lore.kernel.org/all/20230804052954.2918915-2-fen...@smartx.com/

Looks good to me.

Thanks,

Yajun


Thanks,
Li

On April 1, 2024, at 16:43, Yajun Wu <yaj...@nvidia.com> wrote:


On 4/1/2024 4:34 PM, Li Feng wrote:


Hi Yajun,

I submitted a patch to fix this problem a few months ago, but in the end it was not accepted and a different solution was adopted instead.

[PATCH 1/2] vhost-user: fix lost reconnect - Li Feng
https://lore.kernel.org/all/20230804052954.2918915-2-fen...@smartx.com/

I think this fix is valid.

This is the merged fix:


[PULL 76/83] vhost-user: fix lost reconnect - Michael S. Tsirkin
https://lore.kernel.org/all/a68c0148e9bf105f9e83ff5e763b8fcb6f7ba9be.1697644299.git....@redhat.com/

My tests were run with this fix applied, and they still failed in the two scenarios I mentioned.


Thanks,
Li

On April 1, 2024, at 10:08, Yajun Wu <yaj...@nvidia.com> wrote:


On 3/27/2024 6:47 PM, Stefano Garzarella wrote:


Hi Yajun,

On Mon, Mar 25, 2024 at 10:54:13AM +0000, Yajun Wu wrote:
Hi experts,

With the latest QEMU (8.2.90), we found two vhost-user-blk backend reconnect
failure scenarios:
Do you know whether it has ever worked, which would make this a regression, or have we
always had this problem?

I am afraid this commit caused both failures: "71e076a07d (2022-12-01 02:30:13 -0500) hw/virtio: generalise CHR_EVENT_CLOSED handling". The previous commit is good.

I suspect the "if (vhost->vdev)" check in vhost_user_async_close_bh is the cause. Did the previous code lack this check?


Thanks,
Stefano

1. Disconnect the vhost-user-blk backend before the guest driver probes the vblk device, then reconnect the backend after the guest driver has probed the device. QEMU won't send any vhost messages to restore the backend. This is because vhost->vdev is NULL before the guest driver probes the vblk device, so vhost_user_blk_disconnect is not called and s->connected remains true. The next vhost_user_blk_connect then simply returns without doing anything.

2. Run modprobe -r virtio-blk inside the VM, then disconnect the backend, reconnect the backend, and run modprobe virtio-blk. QEMU won't send messages in vhost_dev_init. This is because rmmod makes QEMU call vhost_user_blk_stop, so vhost->vdev also becomes NULL (in vhost_dev_stop) and vhost_user_blk_disconnect is not called. Again, s->connected remains true even though the chr connection is closed.
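
To make the two sequences easier to follow, here is a minimal toy model in C. It is not QEMU code: every toy_ name is hypothetical, and the struct keeps only the state the scenarios depend on (a connected flag standing in for s->connected, a vdev pointer standing in for vhost->vdev, and a counter for how often the backend would be re-initialised). It assumes the close handler skips the disconnect whenever vdev is NULL, as described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT QEMU code: only the state the two scenarios depend on. */
struct toy_blk {
    bool connected;    /* stands in for s->connected */
    void *vdev;        /* stands in for vhost->vdev; NULL while the guest driver is not running the device */
    int restore_msgs;  /* how many times the backend would be re-initialised */
};

static int toy_guest_dev; /* dummy object so vdev can be non-NULL */

/* Stands in for vhost_user_blk_disconnect(): clears the connected flag. */
static void toy_disconnect(struct toy_blk *s)
{
    s->connected = false;
}

/* Stands in for the close handler with the "if (vhost->vdev)" guard. */
static void toy_chr_closed(struct toy_blk *s)
{
    if (s->vdev) {
        toy_disconnect(s);
    }
}

/* Stands in for vhost_user_blk_connect(): returns early if already connected. */
static void toy_connect(struct toy_blk *s)
{
    if (s->connected) {
        return;
    }
    s->connected = true;
    s->restore_msgs++;
}

/* Scenario 1: backend drops and reconnects before the guest driver probes. */
static int toy_scenario1(void)
{
    struct toy_blk s = { .connected = true, .vdev = NULL, .restore_msgs = 0 };
    toy_chr_closed(&s);       /* vdev is NULL, so the disconnect is skipped */
    toy_connect(&s);          /* connected is still true, so this returns early */
    s.vdev = &toy_guest_dev;  /* guest driver probes afterwards */
    return s.restore_msgs;
}

/* Scenario 2: driver removed, backend drops and reconnects, driver reloaded. */
static int toy_scenario2(void)
{
    struct toy_blk s = { .connected = true, .vdev = &toy_guest_dev, .restore_msgs = 0 };
    s.vdev = NULL;            /* modprobe -r: device stopped, vdev cleared */
    toy_chr_closed(&s);       /* disconnect skipped again */
    toy_connect(&s);          /* returns early */
    s.vdev = &toy_guest_dev;  /* modprobe virtio-blk */
    return s.restore_msgs;
}
```

In this model both toy_scenario1() and toy_scenario2() return 0, which corresponds to the backend never being restored after the reconnect.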

I think vhost_user_blk_disconnect should be called when the chr connection closes, even if vhost->vdev is NULL?
Hope we can have a fix soon.
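
As a rough illustration of the suggestion, the following C sketch compares a close handler that is guarded on vdev with one that always marks the device as disconnected. This is a toy model under stated assumptions, not QEMU code and not the content of any posted patch: all sketch_ names are hypothetical, and the guard_on_vdev flag exists only to switch between the two behaviours.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT QEMU code: names and fields are hypothetical. */
struct sketch_blk {
    bool connected;  /* stands in for s->connected */
    void *vdev;      /* stands in for vhost->vdev */
    int init_msgs;   /* how many times the backend would be re-initialised */
};

/* Close handling; guard_on_vdev selects the guarded or unconditional variant. */
static void sketch_chr_closed(struct sketch_blk *s, bool guard_on_vdev)
{
    if (guard_on_vdev && s->vdev == NULL) {
        return;  /* guarded variant: disconnect is skipped while vdev is NULL */
    }
    s->connected = false;  /* unconditional variant always reaches this line */
}

/* Connect handling: returns early when the device is still marked connected. */
static void sketch_connect(struct sketch_blk *s)
{
    if (s->connected) {
        return;
    }
    s->connected = true;
    s->init_msgs++;
}

/* Backend closes and reconnects while vdev is NULL; returns the init count. */
static int sketch_reconnect_msgs(bool guard_on_vdev)
{
    struct sketch_blk s = { .connected = true, .vdev = NULL, .init_msgs = 0 };
    sketch_chr_closed(&s, guard_on_vdev);
    sketch_connect(&s);
    return s.init_msgs;
}
```

Here sketch_reconnect_msgs(true) returns 0 and sketch_reconnect_msgs(false) returns 1: only the unconditional variant lets the reconnect re-initialise the backend. Whether the real code can drop the guard safely depends on why it was introduced, which this sketch does not model.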


Thanks,
Yajun

