Hi Roy, thanks for the report; I am sure we can figure out what is going
on.

Can you run one test for me? Please try without the
InterruptThrottleRate driver parameter, but with LRO enabled.

Since you are on the same campus as we are, I hope I can just get direct
access to your machines.

On Wed, 2009-11-11 at 15:24 -0800, Larsen, Roy K wrote:
> I believe there is a problem with the software LRO in the ixgbe driver. With
> LRO enabled, my cluster application hangs: two processes have data to send to
> each other, as the send queues reported by netstat(8) show, but the connection
> makes no progress even though the receive queues are empty. If I build the
> driver without LRO (make CFLAGS_EXTRA="-DIXGBE_NO_LRO" install), the issue goes
> away. These are compute nodes that do not do routing or IP forwarding. The hang
> is easily reproduced. The particulars follow.
> 
> Roy Larsen
> Intel Corp.
> roy.k.lar...@intel.com
> JF5-3-J4
> 
> ------------------
> 
> Red Hat EL5.3 (2.6.18-128.el5 kernel)
> Dual socket Nehalem 2.9GHz nodes (8 cores) with 12GB of memory, hyper-threading disabled
> Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection (rev 01)
> Fujitsu xg700 switch
> 8 nodes (64 cores)
> Intel MPI 4.0.0.014
> 
> [r...@cstnh-1 library]# ethtool -i eth2
> driver: ixgbe
> version: 2.0.44.14-NAPI
> firmware-version: 1.8-0
> bus-info: 0000:02:00.0
> 
> ixgbe driver loaded with the following options:
> modprobe ixgbe InterruptThrottleRate=0,0
> 
> netstat -t on node "nh1-eth2"
> 
> Proto Recv-Q Send-Q Local Address               Foreign Address             State
> tcp        0   5224 nh1-eth2:55716              nh2-eth2:44115              ESTABLISHED
> 
> netstat -t on node "nh2-eth2"
> 
> Proto Recv-Q Send-Q Local Address               Foreign Address             State
> tcp        0 331648 nh2-eth2:44115              nh1-eth2:55716              ESTABLISHED
> 
> The tcpdump(8) trace shows that the connection is not making progress:
> 
> [r...@cstnh-1 library]# tcpdump -i eth2 -v host nh2-eth2 and host nh1-eth2 and port 55716 and port 44115
> tcpdump-tnic: listening on eth2, link-type EN10MB (Ethernet), capture size 96 bytes
> 17:17:17.092120 IP (tos 0x0, ttl 64, id 95, offset 0, flags [DF], proto TCP (6), length 1500)
>     nh2-eth2.44115 > nh1-eth2.55716: Flags [.], ack 162998588, win 382, length 1460
> 17:17:17.092227 IP (tos 0x0, ttl 64, id 56737, offset 0, flags [DF], proto TCP (6), length 40)
>     nh1-eth2.55716 > nh2-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack 39426, win 382, length 0
> 17:17:17.348305 IP (tos 0x0, ttl 64, id 56738, offset 0, flags [DF], proto TCP (6), length 1500)
>     nh1-eth2.55716 > nh2-eth2.44115: Flags [.], ack 39426, win 382, length 1460
> 17:17:17.348326 IP (tos 0x0, ttl 64, id 96, offset 0, flags [DF], proto TCP (6), length 40)
>     nh2-eth2.44115 > nh1-eth2.55716: Flags [.], cksum 0x99b6 (correct), ack 1, win 382, length 0
> 17:17:17.348331 IP (tos 0x0, ttl 64, id 56739, offset 0, flags [DF], proto TCP (6), length 40)
>     nh1-eth2.55716 > nh2-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack 39426, win 382, length 0
> 17:18:08.548706 IP (tos 0x0, ttl 64, id 97, offset 0, flags [DF], proto TCP (6), length 1500)
>     nh2-eth2.44115 > nh1-eth2.55716: Flags [.], ack 1, win 382, length 1460
> 17:18:08.548711 IP (tos 0x0, ttl 64, id 56740, offset 0, flags [DF], proto TCP (6), length 40)
>     nh1-eth2.55716 > nh2-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack 39426, win 382, length 0
> 17:18:09.060306 IP (tos 0x0, ttl 64, id 56741, offset 0, flags [DF], proto TCP (6), length 1500)
>     nh1-eth2.55716 > nh2-eth2.44115: Flags [.], ack 39426, win 382, length 1460
> 17:18:09.060327 IP (tos 0x0, ttl 64, id 98, offset 0, flags [DF], proto TCP (6), length 40)
>     nh2-eth2.44115 > nh1-eth2.55716: Flags [.], cksum 0x99b6 (correct), ack 1, win 382, length 0
> 17:18:09.060332 IP (tos 0x0, ttl 64, id 56742, offset 0, flags [DF], proto TCP (6), length 40)
>     nh1-eth2.55716 > nh2-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack 39426, win 382, length 0
> 17:19:51.461901 IP (tos 0x0, ttl 64, id 99, offset 0, flags [DF], proto TCP (6), length 1500)
>     nh2-eth2.44115 > nh1-eth2.55716: Flags [.], ack 1, win 382, length 1460
> 17:19:51.461909 IP (tos 0x0, ttl 64, id 56743, offset 0, flags [DF], proto TCP (6), length 40)
>     nh1-eth2.55716 > nh2-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack 39426, win 382, length 0
> 17:19:52.484306 IP (tos 0x0, ttl 64, id 56744, offset 0, flags [DF], proto TCP (6), length 1500)
>     nh1-eth2.55716 > nh2-eth2.44115: Flags [.], ack 39426, win 382, length 1460
> 17:19:52.484328 IP (tos 0x0, ttl 64, id 100, offset 0, flags [DF], proto TCP (6), length 40)
>     nh2-eth2.44115 > nh1-eth2.55716: Flags [.], cksum 0x99b6 (correct), ack 1, win 382, length 0
> 17:19:52.484333 IP (tos 0x0, ttl 64, id 56745, offset 0, flags [DF], proto TCP (6), length 40)
>     nh1-eth2.55716 > nh2-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack 39426, win 382, length 0
> 17:21:51.463283 IP (tos 0x0, ttl 64, id 101, offset 0, flags [DF], proto TCP (6), length 1500)
>     nh2-eth2.44115 > nh1-eth2.55716: Flags [.], ack 1, win 382, length 1460
> 17:21:51.463288 IP (tos 0x0, ttl 64, id 56746, offset 0, flags [DF], proto TCP (6), length 40)
>     nh1-eth2.55716 > nh2-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack 39426, win 382, length 0
> 17:21:52.484305 IP (tos 0x0, ttl 64, id 56747, offset 0, flags [DF], proto TCP (6), length 1500)
>     nh1-eth2.55716 > nh2-eth2.44115: Flags [.], ack 39426, win 382, length 1460
> 17:21:52.484327 IP (tos 0x0, ttl 64, id 102, offset 0, flags [DF], proto TCP (6), length 40)
>     nh2-eth2.44115 > nh1-eth2.55716: Flags [.], cksum 0x99b6 (correct), ack 1, win 382, length 0
> 17:21:52.484332 IP (tos 0x0, ttl 64, id 56748, offset 0, flags [DF], proto TCP (6), length 40)
>     nh1-eth2.55716 > nh2-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack 39426, win 382, length 0
-- 
Jesse Brandeburg
This email sent via Evolution, powered by Linux


_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
