IIRC, it was easy to reproduce by cranking the rebalance frequency up (to once per second or even faster) and introducing a delay of a few milliseconds in bond_alb.c:tlb_clear_slave(), between the point where we drop the lock and the call to tlb_init_slave().
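
For anyone who wants to see the window without building a kernel, here is a rough single-threaded userspace model of the interleaving. It is only a sketch, not the driver code: struct model, choose(), clear_slave(), consistent() and run_scenario() are all names made up for illustration, the lock is just a flag, and the "concurrent" transmit path is replayed by hand at the point where the artificial delay would sit.

```c
#include <assert.h>
#include <stdbool.h>

#define TABLE_SIZE 4
#define NULL_INDEX (-1)

/* Hypothetical, heavily simplified stand-in for the TLB hash table state. */
struct model {
	bool locked;               /* stands in for the tx hash table lock */
	int  head;                 /* head of the slave's list of table entries */
	int  next[TABLE_SIZE];     /* per-entry link to the next entry on that list */
	bool assigned[TABLE_SIZE]; /* entry currently points at the slave */
};

static void model_reset(struct model *m)
{
	m->locked = false;
	m->head = NULL_INDEX;
	for (int i = 0; i < TABLE_SIZE; i++) {
		m->next[i] = NULL_INDEX;
		m->assigned[i] = false;
	}
}

/* Models the transmit path: under the lock, push entry idx onto the
 * slave's list. Returns false if the lock is held (caller must retry). */
static bool choose(struct model *m, int idx)
{
	assert(idx >= 0 && idx < TABLE_SIZE);
	if (m->locked)
		return false;
	m->locked = true;
	if (!m->assigned[idx]) {
		m->assigned[idx] = true;
		m->next[idx] = m->head;
		m->head = idx;
	}
	m->locked = false;
	return true;
}

/* Models the clear routine. fixed == false drops the lock before the
 * final head reset (old order); fixed == true resets head first. */
static void clear_slave(struct model *m, bool fixed, int racer_idx)
{
	m->locked = true;
	for (int i = m->head; i != NULL_INDEX; ) {
		int n = m->next[i];
		m->assigned[i] = false;
		m->next[i] = NULL_INDEX;
		i = n;
	}
	if (!fixed)
		m->locked = false;

	/* The window: a concurrent chooser tries to run here. */
	bool ran = choose(m, racer_idx);

	m->head = NULL_INDEX;      /* the final, formerly unprotected write */

	if (fixed)
		m->locked = false;
	if (!ran)
		choose(m, racer_idx);  /* blocked chooser proceeds after unlock */
}

/* The table is consistent if every assigned entry is reachable from head. */
static bool consistent(const struct model *m)
{
	bool seen[TABLE_SIZE] = { false };

	for (int i = m->head; i != NULL_INDEX; i = m->next[i])
		seen[i] = true;
	for (int i = 0; i < TABLE_SIZE; i++)
		if (m->assigned[i] && !seen[i])
			return false;
	return true;
}

/* Assign two entries, clear the slave while a third assignment races. */
static bool run_scenario(bool fixed)
{
	struct model m;

	model_reset(&m);
	choose(&m, 0);
	choose(&m, 1);
	clear_slave(&m, fixed, 2);
	return consistent(&m);
}
```

In this model run_scenario(false) reports an inconsistent table: the racing assignment sets head, then the late reset wipes it, leaving an entry that still claims the slave but is on no list. run_scenario(true) stays consistent because the racer cannot get in until head has been reset.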

   --Michael O'Donnell -- Stratus Technologies, Maynard, MA USA

> -----Original Message-----
> From: Jay Vosburgh [mailto:[EMAIL PROTECTED]
> Sent: Monday, January 09, 2006 3:14 PM
> To: [EMAIL PROTECTED]; [email protected]
> Cc: ODonnell, Michael
> Subject: [PATCH netdev-2.6] bonding: UPDATED hash-table corruption in bond_alb.c
>
>
> 	I believe I see the race Michael refers to (tlb_choose_channel
> may set head, which tlb_init_slave clears), although I was not able to
> reproduce it.  I have updated his patch for the current netdev-2.6.git
> tree and added a version update.  His original comment follows:
>
> 	Our systems have been crashing during testing of PCI HotPlug
> support in the various networking components.  We've faulted in
> the bonding driver due to a bug in bond_alb.c:tlb_clear_slave()
>
> 	In that routine, the last modification to the TLB hash table is
> made without protection of the lock, allowing a race that can lead
> tlb_choose_channel() to select an invalid table element.
>
> 	-J
>
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
>
>
> Signed-off-by: Michael O'Donnell <Michael.ODonnell at stratus dot com>
> Signed-off-by: Jay Vosburgh <[EMAIL PROTECTED]>
>
> --- netdev-2.6.git-upstream/drivers/net/bonding/bond_alb.c	2006/01/07 00:26:11	1.1
> +++ netdev-2.6.git-upstream/drivers/net/bonding/bond_alb.c	2006/01/09 19:55:12
> @@ -169,9 +169,9 @@
>  		index = next_index;
>  	}
>  
> -	_unlock_tx_hashtbl(bond);
> -
>  	tlb_init_slave(slave);
> +
> +	_unlock_tx_hashtbl(bond);
>  }
>  
>  /* Must be called before starting the monitor timer */
> --- netdev-2.6.git-upstream/drivers/net/bonding/bonding.h	2006/01/07 00:26:11	1.1
> +++ netdev-2.6.git-upstream/drivers/net/bonding/bonding.h	2006/01/09 19:55:42
> @@ -22,8 +22,8 @@
>  #include "bond_3ad.h"
>  #include "bond_alb.h"
>  
> -#define DRV_VERSION	"3.0.0"
> -#define DRV_RELDATE	"November 8, 2005"
> +#define DRV_VERSION	"3.0.1"
> +#define DRV_RELDATE	"January 9, 2006"
>  #define DRV_NAME	"bonding"
>  #define DRV_DESCRIPTION	"Ethernet Channel Bonding Driver"
>  
