RE: [PATCH] Drivers: hv: vmbus: handle various crash scenarios

KY Srinivasan Tue, 22 Mar 2016 07:18:31 -0700


> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:[email protected]]
> Sent: Tuesday, March 22, 2016 7:01 AM
> To: KY Srinivasan <[email protected]>
> Cc: [email protected]; [email protected]; Haiyang
> Zhang <[email protected]>; Alex Ng (LIS) <[email protected]>;
> Radim Krcmar <[email protected]>; Cathy Avery <[email protected]>
> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
> 
> KY Srinivasan <[email protected]> writes:
> 
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:[email protected]]
> >> Sent: Monday, March 21, 2016 12:52 AM
> >> To: KY Srinivasan <[email protected]>
> >> Cc: [email protected]; [email protected]; Haiyang
> >> Zhang <[email protected]>; Alex Ng (LIS) <[email protected]>;
> >> Radim Krcmar <[email protected]>; Cathy Avery <[email protected]>
> >> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
> >>
> >> KY Srinivasan <[email protected]> writes:
> >>
> >> >> -----Original Message-----
> >> >> From: Vitaly Kuznetsov [mailto:[email protected]]
> >> >> Sent: Friday, March 18, 2016 5:33 AM
> >> >> To: [email protected]
> >> >> Cc: [email protected]; KY Srinivasan <[email protected]>;
> >> >> Haiyang Zhang <[email protected]>; Alex Ng (LIS)
> >> >> <[email protected]>; Radim Krcmar <[email protected]>;
> >> >> Cathy Avery <[email protected]>
> >> >> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
> >> >>
> >> >> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
> >> >> delivered to CPU0 regardless of which CPU we're sending
> >> >> CHANNELMSG_UNLOAD from. vmbus_wait_for_unload() doesn't account for
> >> >> this: if we're crashing on some other CPU while CPU0 is still alive
> >> >> and operational, CHANNELMSG_UNLOAD_RESPONSE is delivered there and
> >> >> completes vmbus_connection.unload_event, yet our wait on the
> >> >> current CPU never ends.
> >> >
> >> > What was the host you were testing on?
> >> >
> >>
> >> I was testing on both 2012R2 and 2016TP4. The bug is easily reproducible
> >> by forcing a crash on a secondary CPU, e.g.:
> >
> > Prior to 2012R2, all messages would be delivered on CPU0 and this
> > includes CHANNELMSG_UNLOAD_RESPONSE. For this reason we don't support
> > kexec on pre-2012 R2 hosts. From 2012 R2 on, all vmbus messages
> > (responses) will be delivered on the CPU that we initially set up -
> > look at the code in vmbus_negotiate_version(). So on post 2012 R2
> > hosts, the response to CHANNELMSG_UNLOAD (CHANNELMSG_UNLOAD_RESPONSE)
> > will be delivered on the CPU where we initiate the contact with the
> > host - the CHANNELMSG_INITIATE_CONTACT message.
> 
> Unfortunately there is a discrepancy between WS2012R2 and WS2016TP4. On
> WS2012R2 what you're saying is true and all messages including
> CHANNELMSG_UNLOAD_RESPONSE are delivered to the CPU we used for initial
> contact. On WS2016TP4 CHANNELMSG_UNLOAD_RESPONSE seems to be a special
> case and it is always delivered to CPU0, no matter which CPU we used for
> initial contact. This can be a host bug. You can use the attached patch
> to see the issue.


This looks like a host bug and I will try to get it addressed before WS2016
ships.
> 
> For now I can suggest we check message pages for all CPUs from
> vmbus_wait_for_unload(). We can race with other CPUs again but we don't
> care as we're checking for completion_done() in the loop as well. I'll
> try this approach.
Thank you.

K. Y

> 
> --
>   Vitaly
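
For illustration, the approach proposed above (check the message pages
of all CPUs while also checking completion_done() in the loop) might be
sketched in plain user-space C as follows. This is not the kernel patch:
sim_msg_page, sim_wait_for_unload() and the SIM_MSG_* values are
hypothetical stand-ins, the bool flag only models completion_done() on
vmbus_connection.unload_event, and the pass limit exists only so the
sketch terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the hypervisor message types. */
enum sim_msg_type { SIM_MSG_NONE = 0, SIM_MSG_OTHER, SIM_MSG_UNLOAD_RESPONSE };

/* One simulated message slot per CPU (assumption, not the real layout). */
struct sim_msg_page {
	enum sim_msg_type type;
};

/*
 * Poll every CPU's slot for the unload response.
 * Returns the pass on which the wait ended, or -1 if max_passes ran out.
 * *done models completion_done(): a handler on another CPU may set it at
 * any time, and that ends the wait as well.
 */
static int sim_wait_for_unload(struct sim_msg_page *pages, size_t ncpus,
			       const volatile bool *done, int max_passes)
{
	assert(pages != NULL && done != NULL);

	for (int pass = 0; pass < max_passes; pass++) {
		/* Someone else already saw the response: stop waiting. */
		if (*done)
			return pass;

		for (size_t cpu = 0; cpu < ncpus; cpu++) {
			if (pages[cpu].type == SIM_MSG_NONE)
				continue;

			bool is_response =
				pages[cpu].type == SIM_MSG_UNLOAD_RESPONSE;

			/* Consume the slot so the next message can land. */
			pages[cpu].type = SIM_MSG_NONE;

			if (is_response)
				return pass;
		}
		/* A real loop would delay here before the next pass. */
	}
	return -1;
}
```

Because the loop looks at every slot, it does not matter whether the
response lands on CPU0 or on the CPU used for initial contact. Racing
with a handler on another CPU is harmless in this model, since the done
flag is checked on every pass too.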

_______________________________________________
devel mailing list
[email protected]
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
