Re: vhost-user-blk reconnect issue

2024-04-10 Thread Yajun Wu


On 4/2/2024 4:44 PM, Li Feng wrote:
> Hi,
>
> I tested it today and there is indeed a problem in this scenario.
> It seems that the first version of the patch is the best and can
> handle all scenarios.
>
> With that patch applied, the previously merged patches are no longer needed.
>
> I will revert the merged patch and submit a new fix. Do you have any comments?
>
> Revert:
> https://lore.kernel.org/all/a68c0148e9bf105f9e83ff5e763b8fcb6f7ba9be.1697644299.git@redhat.com/
>
> New:
> https://lore.kernel.org/all/20230804052954.2918915-2-fen...@smartx.com/

Looks good to me.

Thanks,
Yajun

> Thanks,
> Li






Re: vhost-user-blk reconnect issue

2024-04-02 Thread Li Feng

Hi,

I tested it today and there is indeed a problem in this scenario.
It seems that the first version of the patch is the best and can handle all
scenarios.
With that patch applied, the previously merged patches are no longer needed.

I will revert the merged patch and submit a new fix. Do you have any comments?

Revert: 
https://lore.kernel.org/all/a68c0148e9bf105f9e83ff5e763b8fcb6f7ba9be.1697644299.git@redhat.com/
New: https://lore.kernel.org/all/20230804052954.2918915-2-fen...@smartx.com/

Thanks,
Li




Re: vhost-user-blk reconnect issue

2024-04-01 Thread Yajun Wu


On 4/1/2024 4:34 PM, Li Feng wrote:
> Hi Yajun,
>
> I submitted a patch to fix this problem a few months ago, but in the end
> that solution was not accepted and other solutions were adopted to fix it.
>
> [PATCH 1/2] vhost-user: fix lost reconnect - Li Feng
> https://lore.kernel.org/all/20230804052954.2918915-2-fen...@smartx.com/

I think this fix is valid.

> This is the merged fix:
>
> [PULL 76/83] vhost-user: fix lost reconnect - Michael S. Tsirkin
> https://lore.kernel.org/all/a68c0148e9bf105f9e83ff5e763b8fcb6f7ba9be.1697644299.git@redhat.com/

My tests were run with this fix applied, and they failed in the two
scenarios I mentioned.

> Thanks,
> Li





Re: vhost-user-blk reconnect issue

2024-04-01 Thread Li Feng
Hi Yajun,

I submitted a patch to fix this problem a few months ago, but in the end
that solution was not accepted and other solutions were adopted to fix it.

https://lore.kernel.org/all/20230804052954.2918915-2-fen...@smartx.com/

This is the merged fix:


https://lore.kernel.org/all/a68c0148e9bf105f9e83ff5e763b8fcb6f7ba9be.1697644299.git@redhat.com/

Thanks,
Li




Re: vhost-user-blk reconnect issue

2024-04-01 Thread Michael S. Tsirkin
On Mon, Apr 01, 2024 at 10:08:10AM +0800, Yajun Wu wrote:
> 
> On 3/27/2024 6:47 PM, Stefano Garzarella wrote:
> > Hi Yajun,
> >
> > On Mon, Mar 25, 2024 at 10:54:13AM +, Yajun Wu wrote:
> > > Hi experts,
> > >
> > > With the latest QEMU (8.2.90), we found two vhost-user-blk backend
> > > reconnect failure scenarios:
> > Do you know if it has ever worked, so that this is a regression, or have
> > we always had this problem?
>
> I am afraid this commit caused both failures: "71e076a07d (2022-12-01
> 02:30:13 -0500) hw/virtio: generalise CHR_EVENT_CLOSED handling". The
> previous commit is good.

CC Alex who wrote that commit.

> I suspect the "if (vhost->vdev)" check in vhost_user_async_close_bh is the
> cause. The previous code doesn't seem to have this check.




Re: vhost-user-blk reconnect issue

2024-03-31 Thread Yajun Wu



On 3/27/2024 6:47 PM, Stefano Garzarella wrote:
> Hi Yajun,
>
> On Mon, Mar 25, 2024 at 10:54:13AM +, Yajun Wu wrote:
>> Hi experts,
>>
>> With the latest QEMU (8.2.90), we found two vhost-user-blk backend
>> reconnect failure scenarios:
>
> Do you know if it has ever worked, so that this is a regression, or have we
> always had this problem?


I am afraid this commit caused both failures: "71e076a07d (2022-12-01
02:30:13 -0500) hw/virtio: generalise CHR_EVENT_CLOSED handling". The
previous commit is good.


I suspect the "if (vhost->vdev)" check in vhost_user_async_close_bh is the
cause. The previous code doesn't seem to have this check.
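
The suspected effect of that guard can be illustrated with a minimal, self-contained C model. This is a sketch only and not QEMU source: the struct and function names (`model_blk`, `model_close_guarded`, and so on) are hypothetical stand-ins, and the only behaviour taken from the report is that the disconnect callback runs only when `vdev` is non-NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for illustration; NOT the real QEMU structures. */
struct model_vhost_dev {
    void *vdev;      /* non-NULL only while the guest driver has started the device */
};

struct model_blk {
    struct model_vhost_dev dev;
    bool connected;  /* plays the role of s->connected in the report */
};

/* Stand-in for the device's disconnect callback. */
void model_blk_disconnect(struct model_blk *s)
{
    assert(s != NULL);
    s->connected = false;
}

/* Close handling with the guard under suspicion. */
void model_close_guarded(struct model_blk *s)
{
    assert(s != NULL);
    if (s->dev.vdev) {
        model_blk_disconnect(s);
    }
    /* Otherwise the callback is skipped and s->connected is left untouched. */
}

/* Close handling without the guard. */
void model_close_unguarded(struct model_blk *s)
{
    model_blk_disconnect(s);
}

/* Returns the value of `connected` after a close event arrives while vdev is NULL. */
bool model_connected_after_close(bool guarded)
{
    struct model_blk s = { .dev = { .vdev = NULL }, .connected = true };

    if (guarded) {
        model_close_guarded(&s);
    } else {
        model_close_unguarded(&s);
    }
    return s.connected;
}
```

In this model, `model_connected_after_close(true)` returns true (the stale state described in the report), while `model_connected_after_close(false)` returns false.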









Re: vhost-user-blk reconnect issue

2024-03-27 Thread Stefano Garzarella

Hi Yajun,

On Mon, Mar 25, 2024 at 10:54:13AM +, Yajun Wu wrote:
> Hi experts,
>
> With the latest QEMU (8.2.90), we found two vhost-user-blk backend
> reconnect failure scenarios:

Do you know if it has ever worked, so that this is a regression, or have we
always had this problem?

Thanks,
Stefano

> 1. Disconnect the vhost-user-blk backend before the guest driver probes
> the vblk device, then reconnect the backend after the guest driver has
> probed the device. QEMU won't send any vhost messages to restore the
> backend.
> This is because vhost->vdev is NULL before the guest driver probes the
> vblk device, so vhost_user_blk_disconnect won't be called and
> s->connected stays true. The next vhost_user_blk_connect will simply
> return without doing anything.
>
> 2. Run modprobe -r virtio-blk inside the VM, then disconnect the backend,
> reconnect the backend, and run modprobe virtio-blk. QEMU won't send the
> messages in vhost_dev_init.
> This is because rmmod makes QEMU call vhost_user_blk_stop, which also
> sets vhost->vdev to NULL (in vhost_dev_stop), so
> vhost_user_blk_disconnect won't be called. Again, s->connected stays
> true even though the chr connection is closed.
>
> I think vhost_user_blk_disconnect should be called when the chr
> connection closes, even if vhost->vdev is NULL.
> I hope we can have a fix soon.
>
> Thanks,
> Yajun
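
The two reported sequences can be replayed against a small state model, sketched below in C. It is an illustration under stated assumptions, not QEMU code: `sketch_blk`, `sketch_connect`, `sketch_chr_closed`, and the `init_msgs` counter are hypothetical names, and the model captures only the two behaviours described in the report (connect returns early when `connected` is already true, and the close path is skipped when `vdev` is unset).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the device state; not the real QEMU structures. */
struct sketch_blk {
    bool vdev_set;   /* models vhost->vdev != NULL */
    bool connected;  /* models s->connected */
    int init_msgs;   /* counts how often backend state was (re)sent */
};

/* Models the connect path: returns early if already marked connected. */
void sketch_connect(struct sketch_blk *s)
{
    assert(s != NULL);
    if (s->connected) {
        return;
    }
    s->connected = true;
    s->init_msgs++;
}

/* Models the chardev close event; require_vdev enables the guard. */
void sketch_chr_closed(struct sketch_blk *s, bool require_vdev)
{
    assert(s != NULL);
    if (require_vdev && !s->vdev_set) {
        return;  /* disconnect is skipped, connected stays true */
    }
    s->connected = false;
}

/* Scenario 1: backend drops before driver probe, reconnects after it. */
int sketch_scenario1(bool require_vdev)
{
    struct sketch_blk s = { false, false, 0 };

    sketch_connect(&s);                   /* initial connection */
    sketch_chr_closed(&s, require_vdev);  /* vdev not set yet */
    s.vdev_set = true;                    /* guest driver probes the device */
    sketch_connect(&s);                   /* backend reconnects */
    return s.init_msgs;
}

/* Scenario 2: driver unloaded, backend drops and returns, driver reloaded. */
int sketch_scenario2(bool require_vdev)
{
    struct sketch_blk s = { false, false, 0 };

    sketch_connect(&s);                   /* initial connection */
    s.vdev_set = true;                    /* driver loaded */
    s.vdev_set = false;                   /* modprobe -r: stop path clears vdev */
    sketch_chr_closed(&s, require_vdev);  /* backend disconnects */
    sketch_connect(&s);                   /* backend reconnects */
    s.vdev_set = true;                    /* modprobe virtio-blk */
    return s.init_msgs;
}
```

With the guard enabled, both `sketch_scenario1(true)` and `sketch_scenario2(true)` return 1, meaning the reconnect sent nothing. With the guard disabled they return 2, which matches the behaviour the report asks for.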