James, I'm sorry to hear you're having problems with this and I really appreciate the extensive and detailed investigation you've done on the matter.
I'll do some further investigation on my side, and we will review and test your patch. Given that no one here at Intel has seen the problem, it may be that the best we can do is incorporate the patch and then make sure it doesn't break anything else. If that is the case and the patch fixes your issue, I can't see any reason for us not to accept it.

Thanks,
- Greg

> -----Original Message-----
> From: James Bulpin [mailto:[email protected]]
> Sent: Wednesday, November 02, 2011 11:31 AM
> To: [email protected]
> Subject: [E1000-devel] igb/i350 VF/PF mailbox locking race leading to VF
> link staying down
>
> Hello,
>
> Summary:
> 1. An observation/theory of a race condition in the VF/PF mailbox
>    protocol leading to the VF showing link down
> 2. Belief that the race is triggered because the VF does not wait for
>    PF replies to some messages
> 3. A patch for the above
>
> I've been trying to get i350 SR-IOV VFs working in my Xen environment
> (XenServer 6.0 [Xen 4.1], igb 3.2.10, HVM CentOS 5.x guest with igbvf
> 1.1.5, Dell T3500). We've had a lot of experience and success with
> SR-IOV on both the 82599 and the 82576 ET2; this work was to extend our
> support to the i350. The problem I observed was that around 9 times out
> of 10 the VF link showed as down, both after the initial module load and
> after an "ifconfig up". On the occasions the link did come up, the VF
> worked perfectly. Adding a few printks showed the link was being
> declared down because the VF/PF mailbox timeout had been hit (setting
> mbx->timeout to zero). Further instrumentation suggests that the timeout
> occurred while the VF was waiting for an ACK from the PF that never
> came, because the original VF-to-PF message was never received by the
> PF. Based on limited logging, the following is the sequence I believe
> happened:
>
> 1. VF sends a message (e.g. 0x00010003) to the PF (get VFU lock; write
>    buffer; release VFU lock/signal REQ); VF waits for ACK
> 2.
>    PF igb_msg_task handler checks the VFICR and finds REQ set
> 3. PF igb_msg_task handler calls igb_rcv_msg_from_vf, which reads the
>    message and ACKs it (get PFU lock; read buffer; release PFU
>    lock/signal ACK)
> 4. PF deals with the message, e.g. calling igb_set_vf_multicasts
> 5. VF receives ACK
> 6. VF moves on to send the next message (e.g. 0x00000006) to the PF (get
>    VFU lock; write buffer; release VFU lock/signal REQ); VF waits for
>    ACK
> 7. PF igb_rcv_msg_from_vf sends a reply message (orig msg |
>    E1000_VT_MSGTYPE_ACK | E1000_VT_MSGTYPE_CTS) from the PF to the VF
>    (get PFU lock; clear REQ/ACK bits in VFICR ***clearing the REQ flag
>    just set by the VF***; write buffer ***overwriting the VF's pending
>    message***; release PFU lock/signal STS)
> 8. PF igb_msg_task handler runs, but the REQ flag is zero, so the
>    message is not handled
> 9. VF times out waiting for an ACK to the second message
>
> From inspecting the code in both drivers, my understanding is that the
> VFU/PFU locking mechanism is only being used to protect the message
> buffer while it is being read or written; it does not protect against
> the buffer being reused by either driver before the existing message has
> been handled (the lock is released when setting REQ/STS, not on
> receiving the ACK, as the protocol description in the 82576 datasheet
> suggests). Adding a 5000usec sleep after each message send from the VF
> makes the original link-down failure go away, giving some confidence to
> the race condition theory.
>
> I believe the race should not be a problem if the VF and PF are
> exchanging messages in the expected order; however, in the above case
> the VF sent an E1000_VF_SET_MULTICAST message but did not wait for the
> reply message from the PF. Reviewing the driver code shows that, of the
> five messages the PF igb_rcv_msg_from_vf replies to, the VF driver does
> not wait for replies to three (E1000_VF_SET_MULTICAST, E1000_VF_SET_LPE
> and E1000_VF_SET_VLAN).
> Patching the VF driver (see below) to perform dummy reads after sending
> each of these three messages makes the original link-down failure go
> away.
>
> Have I misunderstood the locking strategy for the mailbox? As far as I
> can see, nothing has changed in the newer igb and igbvf drivers that
> would explain why I'm seeing the VF link failure on the i350 but not on
> the other NICs (I don't see this with the older 82576 in the same system
> with the same drivers). I can only assume it's just very bad luck with
> timing in this particular system and configuration.
>
> Regards,
> James Bulpin
>
> Read (and ignore) replies to VF-to-PF messages currently unhandled
>
> Signed-off-by: James Bulpin <[email protected]>
>
> diff -rup igbvf-1.1.5.pristine/src/e1000_vf.c igbvf-1.1.5.mboxreply/src/e1000_vf.c
> --- igbvf-1.1.5.pristine/src/e1000_vf.c	2011-08-16 18:38:11.000000000 +0100
> +++ igbvf-1.1.5.mboxreply/src/e1000_vf.c	2011-11-02 12:54:13.892369000 +0000
> @@ -269,6 +269,7 @@ void e1000_update_mc_addr_list_vf(struct
>  	u16 *hash_list = (u16 *)&msgbuf[1];
>  	u32 hash_value;
>  	u32 i;
> +	s32 ret_val;
>
>  	DEBUGFUNC("e1000_update_mc_addr_list_vf");
>
> @@ -298,7 +299,10 @@ void e1000_update_mc_addr_list_vf(struct
>  		mc_addr_list += ETH_ADDR_LEN;
>  	}
>
> -	mbx->ops.write_posted(hw, msgbuf, E1000_VFMAILBOX_SIZE, 0);
> +	ret_val = mbx->ops.write_posted(hw, msgbuf, E1000_VFMAILBOX_SIZE, 0);
> +	if (!ret_val)
> +		mbx->ops.read_posted(hw, msgbuf, E1000_VFMAILBOX_SIZE, 0);
> +
>  }
>
>  /**
> @@ -311,6 +315,7 @@ void e1000_vfta_set_vf(struct e1000_hw *
>  {
>  	struct e1000_mbx_info *mbx = &hw->mbx;
>  	u32 msgbuf[2];
> +	s32 ret_val;
>
>  	msgbuf[0] = E1000_VF_SET_VLAN;
>  	msgbuf[1] = vid;
>
> @@ -318,7 +323,9 @@
>  	if (set)
>  		msgbuf[0] |= E1000_VF_SET_VLAN_ADD;
>
> -	mbx->ops.write_posted(hw, msgbuf, 2, 0);
> +	ret_val = mbx->ops.write_posted(hw, msgbuf, 2, 0);
> +	if (!ret_val)
> +		mbx->ops.read_posted(hw, msgbuf, 2, 0);
>  }
>
>  /**
>   e1000_rlpml_set_vf - Set the maximum receive packet length
> @@ -329,11 +336,14 @@ void e1000_rlpml_set_vf(struct e1000_hw
>  {
>  	struct e1000_mbx_info *mbx = &hw->mbx;
>  	u32 msgbuf[2];
> +	s32 ret_val;
>
>  	msgbuf[0] = E1000_VF_SET_LPE;
>  	msgbuf[1] = max_size;
>
> -	mbx->ops.write_posted(hw, msgbuf, 2, 0);
> +	ret_val = mbx->ops.write_posted(hw, msgbuf, 2, 0);
> +	if (!ret_val)
> +		mbx->ops.read_posted(hw, msgbuf, 2, 0);
>  }
>
>  /**
>
> ------------------------------------------------------------------------------
> RSA(R) Conference 2012
> Save $700 by Nov 18
> Register now
> http://p.sf.net/sfu/rsa-sfdev2dev1
> _______________________________________________
> E1000-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/e1000-devel
> To learn more about Intel® Ethernet, visit
> http://communities.intel.com/community/wired
