> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:[email protected]]
> Sent: Tuesday, March 22, 2016 7:01 AM
> To: KY Srinivasan <[email protected]>
> Cc: [email protected]; [email protected]; Haiyang
> Zhang <[email protected]>; Alex Ng (LIS) <[email protected]>;
> Radim Krcmar <[email protected]>; Cathy Avery <[email protected]>
> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>
> KY Srinivasan <[email protected]> writes:
>
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:[email protected]]
> >> Sent: Monday, March 21, 2016 12:52 AM
> >> To: KY Srinivasan <[email protected]>
> >> Cc: [email protected]; [email protected]; Haiyang
> >> Zhang <[email protected]>; Alex Ng (LIS) <[email protected]>;
> >> Radim Krcmar <[email protected]>; Cathy Avery <[email protected]>
> >> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
> >>
> >> KY Srinivasan <[email protected]> writes:
> >>
> >> >> -----Original Message-----
> >> >> From: Vitaly Kuznetsov [mailto:[email protected]]
> >> >> Sent: Friday, March 18, 2016 5:33 AM
> >> >> To: [email protected]
> >> >> Cc: [email protected]; KY Srinivasan <[email protected]>;
> >> >> Haiyang Zhang <[email protected]>; Alex Ng (LIS)
> >> >> <[email protected]>; Radim Krcmar <[email protected]>; Cathy
> >> >> Avery <[email protected]>
> >> >> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
> >> >>
> >> >> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
> >> >> delivered to CPU0 regardless of what CPU we're sending
> >> >> CHANNELMSG_UNLOAD from. vmbus_wait_for_unload() doesn't account for
> >> >> the fact that in case we're crashing on some other CPU and CPU0 is
> >> >> still alive and operational CHANNELMSG_UNLOAD_RESPONSE will be
> >> >> delivered there completing vmbus_connection.unload_event, our wait
> >> >> on the current CPU will never end.
> >> >
> >> > What was the host you were testing on?
> >> >
> >>
> >> I was testing on both 2012R2 and 2016TP4. The bug is easily reproducible
> >> by forcing a crash on a secondary CPU, e.g.:
> >
> > Prior to 2012R2, all messages would be delivered on CPU0 and this
> > includes CHANNELMSG_UNLOAD_RESPONSE. For this reason we don't support
> > kexec on pre-2012 R2 hosts. From 2012 R2 on, all vmbus messages
> > (responses) will be delivered on the CPU that we initially set up -
> > look at the code in vmbus_negotiate_version(). So on post 2012 R2
> > hosts, the response to CHANNELMSG_UNLOAD will be delivered on the CPU
> > where we initiate the contact with the host - the
> > CHANNELMSG_INITIATE_CONTACT message.
>
> Unfortunately there is a discrepancy between WS2012R2 and WS2016TP4. On
> WS2012R2 what you're saying is true and all messages including
> CHANNELMSG_UNLOAD_RESPONSE are delivered to the CPU we used for initial
> contact. On WS2016TP4 CHANNELMSG_UNLOAD_RESPONSE seems to be a special
> case and it is always delivered to CPU0, no matter which CPU we used for
> initial contact. This can be a host bug. You can use the attached patch
> to see the issue.

This looks like a host bug and I will try to get it addressed before WS2016 ships.

>
> For now I can suggest we check message pages for all CPUs from
> vmbus_wait_for_unload(). We can race with other CPUs again but we don't
> care as we're checking for completion_done() in the loop as well. I'll
> try this approach.

Thank you.

K. Y

>
> --
> Vitaly

_______________________________________________
devel mailing list
[email protected]
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
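The approach suggested in the thread (check the message pages of all CPUs from vmbus_wait_for_unload(), while also checking completion_done() in the loop) can be sketched as a small user-space simulation in C. This is only an illustration under stated assumptions, not the kernel patch: struct unload_state, struct msg_slot, the msg_type values, wait_for_unload() and the max_passes bound are all hypothetical names invented here, and the "done" flag merely stands in for completion_done() on vmbus_connection.unload_event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUS 8

/* Hypothetical message types; the values are illustrative, not the kernel's. */
enum msg_type {
    MSG_NONE = 0,
    MSG_UNLOAD_RESPONSE = 1,
    MSG_OTHER = 2,
};

/* Simplified stand-in for one CPU's message page slot. */
struct msg_slot {
    enum msg_type type;
};

/* Simplified stand-in for the unload completion plus per-CPU message pages. */
struct unload_state {
    bool done;                      /* plays the role of completion_done() */
    struct msg_slot slot[MAX_CPUS];
};

/*
 * Poll every CPU's slot until the unload response shows up somewhere,
 * or until some other CPU's handler has already completed the event.
 * Returns the number of polling passes used, or -1 if max_passes ran out
 * (the bound is an assumption added so the sketch always terminates).
 */
int wait_for_unload(struct unload_state *st, size_t ncpus, int max_passes)
{
    assert(ncpus <= MAX_CPUS);

    for (int pass = 1; pass <= max_passes; pass++) {
        /* Another CPU may have handled the response first; that race is fine. */
        if (st->done)
            return pass;

        for (size_t cpu = 0; cpu < ncpus; cpu++) {
            struct msg_slot *slot = &st->slot[cpu];
            enum msg_type type = slot->type;

            if (type == MSG_NONE)
                continue;

            slot->type = MSG_NONE;  /* consume the message */

            if (type == MSG_UNLOAD_RESPONSE) {
                st->done = true;
                return pass;
            }
        }
    }
    return -1;
}
```

With a response parked on any CPU's slot (for example slot[3] while polling four CPUs), wait_for_unload() finds it on the first pass regardless of which CPU is doing the waiting, which is the property the thread is after. A real implementation would also need a delay between passes and proper message-page end-of-message handling, both of which this sketch leaves out.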
