We'll need your kernel config for the reproduction.

Todd Fujinaka
Software Application Engineer
Networking Division (ND)
Intel Corporation
todd.fujin...@intel.com
(503) 712-4565
-----Original Message-----
From: Tony Battersby [mailto:to...@cybernetics.com]
Sent: Wednesday, January 07, 2015 3:02 PM
To: Fujinaka, Todd; e1000-devel@lists.sourceforge.net
Subject: Re: [E1000-devel] [PATCH] igb: fix Tx Unit Hang on 82576 with RSS > 1

Actually igb_cache_ring_register *did* change between igb-5.2.5.tar.gz and igb-5.2.9.2.tar.gz. My first patch did nothing more than revert that change, making igb_cache_ring_register the same as it was in igb-5.2.5.tar.gz (plus adding the fall through comment from the upstream kernel). I came up with the patch by manually bisecting the changes between the two driver versions to figure out what broke it. So the fix that you are against was not my idea at all; I was just reverting the change that caused the breakage.

To recap:

igb-5.2.5.tar.gz does work, and does NOT have a "break" in between "case e1000_82576" and the following cases.

igb-5.2.9.2.tar.gz and later do NOT work, and they DO have a "break" in between "case e1000_82576" and the following cases.

In-kernel igb does work, and does NOT have a "break" in between "case e1000_82576" and the following cases.

The in-kernel igb also has the "/* Fall through */" comment in place of the break. That was added by upstream commit b26141d47a4a73f07853986bd6b5a9f4ee6b4fa1 "igb: Cleanups to fix missing break in switch statements". That is why I added the fall through comment in my patch: to match the upstream kernel.

The OS is custom built from scratch rather than being a regular distribution. It is running a very minimalistic userspace with upstream kernel 3.18.1.

To reproduce the problem:
1) boot the machine
2) modprobe igb InterruptThrottleRate=1,1 QueuePairs=1,1 IntMode=2,2 RSS=8,8
3) ifconfig eth0 192.168.136.21 netmask 255.255.255.0 mtu 1500
4) ping 192.168.136.55
5) immediately get a "Tx Unit Hang" message, and no ping response
6) try again with my patch, and everything works fine

I have attached files from the system with my patch, when everything is working fine.
Let me know if you want files from the system without my patch, when it is not working.

Motherboard: Supermicro X8DTH-6F with BIOS 2.1b
CPU: Intel Xeon X5650 @ 2.67GHz

Tony

On 01/07/2015 05:02 PM, Fujinaka, Todd wrote:
> First, please don't remove the older parts of the thread. We're using Outlook
> here and the threading is awful and sometimes it's hard for me to follow the
> thread.
>
> Second, I'm not sure what you're doing. Your first email said:
>
> Known-good versions:
> SourceForge igb-5.2.5.tar.gz and earlier
> upstream in-kernel igb
>
> Known-bad versions:
> SourceForge igb-5.2.9.2.tar.gz and later
>
> igb_cache_ring_register hasn't changed between igb-5.2.5.tar.gz and
> igb-5.2.9.2.tar.gz. While I'm not doubting we changed some behavior, I have
> to repeat what I said in my first reply: your fix is wrong.
>
> We'll try reproducing your issue locally, so we'll need exact repro steps
> starting from how you installed the OS to what you did to set up your system.
>
> Thanks.
>
> Todd Fujinaka
> Software Application Engineer
> Networking Division (ND)
> Intel Corporation
> todd.fujin...@intel.com
> (503) 712-4565
>
> -----Original Message-----
> From: Tony Battersby [mailto:to...@cybernetics.com]
> Sent: Wednesday, January 07, 2015 11:06 AM
> To: Fujinaka, Todd; e1000-devel@lists.sourceforge.net
> Subject: Re: [E1000-devel] [PATCH] igb: fix Tx Unit Hang on 82576 with RSS > 1
>
> I had igb_cache_ring_register() print out the relevant values and this is
> what I get:
>
> rss_queues = 8
> vmdq_pools = 0
> num_rx_queues = 8
> num_tx_queues = 8
>
> With those values, the unpatched igb_cache_ring_register() does nothing
> because of the break. With my patch, it falls through to the default case.
> Maybe the break should be contained within the if ((adapter->rss_queues > 1)
> && adapter->vmdq_pools) { ... } code segment instead, and the 82576 non-VMDq
> case should fall through?
> But I am testing only the non-VMDq case, so I
> can't confirm if the VMDq case is broken or not. What do you think of this
> patch instead:
>
> --- igb-5.2.15/src/igb_main.c.orig	2014-09-18 12:12:17.000000000 -0400
> +++ igb-5.2.15/src/igb_main.c	2015-01-07 14:02:08.000000000 -0500
> @@ -407,8 +407,9 @@ static void igb_cache_ring_register(stru
>  			for (; i < adapter->rss_queues; i++)
>  				adapter->rx_ring[i]->reg_idx = rbase_offset +
>  							       Q_IDX_82576(i);
> +			break;
>  		}
> -		break;
> +		/* Fall through */
>  	case e1000_82575:
>  	case e1000_82580:
>  	case e1000_i350:
>
>
> On 01/07/2015 01:10 PM, Fujinaka, Todd wrote:
>> I think this is a setup issue. We don't use multiple TX queues per pool when
>> VMDq is enabled and that's why there's a special case for the 82576.
>>
>> Your patch would invalidate the whole switch statement and set up all parts
>> the same way.
>>
>> Todd Fujinaka
>> Software Application Engineer
>> Networking Division (ND)
>> Intel Corporation
>> todd.fujin...@intel.com
>> (503) 712-4565
>>
>>
>

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired