Re: [E1000-devel] TCP sockets stalling - help! (long)

Ilpo Järvinen Wed, 25 Nov 2009 04:36:05 -0800

On Wed, 25 Nov 2009, Asdo wrote:

> Hi netdev and e1000 people,
> 
> I have a weird problem here.
> 
> We have a Linux "Storage" server acting as a Samba, SCP and SFTP fileserver.
> All clients are Windows: XP and Vista, 32 and 64 bits. The building has
> gigabit network. We transfer up to many TB at a time.
> 
> I have tried from many clients to the same Server and there is always the same
> problem: after hours that a transfer is running the TCP socket suddenly
> stalls: 0 bytes/sec. It doesn't slow down, it stalls suddenly. It never
> disconnects though.
> 
> From that moment onward, the TCP socket stays in a stalling condition forever
> (I saw myself at least 10 hours) and both sides see it Established forever.
> This is in case of no other networking activity on the same network card.
> In case of other networking activity someone told me that he did see the
> transfer restart (=continue) by itself (it was an SFTP that time). I think I
> saw this phenomenon myself once during the tcpdump at the bottom of this
> email.
> 
> This problem happens on every protocol (Samba, SCP, SFTP) and apparently any
> client (for sure I tested on at least 1 Vista 64bit and 1 XP 32bit).
> 
> Also happens in the middle of transferring one file, so not necessarily at
> file change. The files are being pushed from the clients to the linux Server.
> 
> The gigabit switch between the machines was changed 2 times (the brands also
> were different) without results.
> 
> I was able to obtain apparently reliable nonstop transfer if I performed 6
> simultaneous transfers with Samba from the same client to the same server.
> OTOH with just 1 - 3 simultaneous transfers only all of them will eventually
> stall. Note that with 2 transfers the speed of each one is already smaller
> than the speed of a single transfer (approx 40+40MB/sec vs 60MB/sec) however
> with 6 transfers it's obviously much smaller (approx 6x12MB/sec). The bug
> looks like some kind of a race condition to me.
> 
> On Samba another strange fact is that if there are multiple transfers (i.e. 2)
> from the same client to the same server (please note that in this case with
> Samba all transfers pass through the same TCP socket) the two transfers stall
> *independently* one to the other. First after a random time you see one
> stalling, and eventually after another random time also the other one stalls.
> This tells that the socket actually kinda works... I can even stop one stalled
> transfer from the client (the stop operation *works* and gets a reply from the
> Samba server: I see messages sniffing the Ethernet), and start it again and it
> would run, and the TCP socket is always the same when I restart the transfer.
> 
> I obtained apparently reliable transfers with Samba if I introduced rate
> limiting on our SMC 8024L2 gigabit switch . Even if the rate limiting was set
> so high (930Mbps) that the transfer speed was actually not influenced, the
> Samba transfer (1 single transfer, not multiple) apparently worked reliably.
> Maybe this helps with the race condition by separating packets better. The
> SFTP protocol hanged anyway though, even on rate limiting, however consider
> that SFTP from windows goes very slow at 15Mbps due to a problem of encryption
> performance of WinSCP, so that one is really way lower than the rate limit and
> its packets might not get separated.
> 
> The server is a linux Ubuntu 9.10 with kernel ubuntu 2.6.31-14-server . The
> ethernets are two integrated Intels (I tested both ethernet ports):
> Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller
> (Copper) (rev 01)
> the driver is kernel's e1000
> 
> This is the stack trace of server processes (Server side) during an SFTP
> stall:
> 
> root       327  [CUT] \_ sshd: myuser [priv]
> 1011       380  [CUT]     \_ sshd: myuser
> 1011       382  [CUT]         \_ /usr/lib/openssh/sftp-server
> 
> r...@mystorage:/proc# cat /proc/327/stack   (this is sshd)
> [<ffffffff814b428a>] unix_stream_data_wait+0xaa/0x110
> [<ffffffff814b50cd>] unix_stream_recvmsg+0x36d/0x570
> [<ffffffff81426399>] sock_aio_read+0x149/0x150
> [<ffffffff8111e232>] do_sync_read+0xf2/0x130
> [<ffffffff8111e8e1>] vfs_read+0x181/0x1a0
> [<ffffffff8111edec>] sys_read+0x4c/0x80
> [<ffffffff81011fc2>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> r...@mystorage:/proc# cat /proc/380/stack   (this is the second sshd)
> [<ffffffff8112e644>] poll_schedule_timeout+0x34/0x50
> [<ffffffff8112ef4f>] do_select+0x58f/0x6b0
> [<ffffffff8112f8b5>] core_sys_select+0x185/0x2b0
> [<ffffffff8112fc32>] sys_select+0x42/0x110
> [<ffffffff81011fc2>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> r...@mystorage:/proc# cat /proc/382/stack (this is sftp-server)
> [<ffffffff8112e644>] poll_schedule_timeout+0x34/0x50
> [<ffffffff8112ef4f>] do_select+0x58f/0x6b0
> [<ffffffff8112f8b5>] core_sys_select+0x185/0x2b0
> [<ffffffff8112fc32>] sys_select+0x42/0x110
> [<ffffffff81011fc2>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> so everything seems normal, it just seems the socket is waiting for data...
> 
> I am really puzzled because... I don't know how the Samba protocol works, but
> for SFTP which I believe is a simple protocol, I believe the client is just
> pushing raw bytes onto a simple TCP socket so it should keep doing that while
> it receives Acks. OTOH if it does not receive Acks the socket would
> disconnect. However in my case during a stall the socket stays Established (I
> checked at BOTH sides!) but the client stops pushing bytes to the server.
> (That's for a stall in the middle of a file on SFTP protocol, which *does*
> happen) How on earth can that be??  (update: see tcpdump below, it might
> explain)
> 
> I am also thinking at a hardware problem. The server is running on a Tyan
> mainboard S5396WA2NRF and we had another problem with this computer (mobo?),
> apparently hardware related, with 3ware disk controllers, same problem as this
> guy also running another Tyan:
> http://www.webhostingtalk.com/archive/index.php/t-449670.html
> which I thought might be reconducible to a lost interrupt. Even on a lost
> interrupt I can't understand how to solve the Ack puzzle... this is not a TOE
> network card!
> 
> This is what I did try up to now (and didn't help):
>   Changed some Samba parameters trying various values (clearly not useful for
> SFTP btw). I ended up with a pretty common configuration, which anyway didn't
> help:
>    socket options = TCP_NODELAY IPTOS_LOWDELAY SO_RCVBUF=8192 SO_SNDBUF=8192
>    read raw = yes
>    write raw = yes
>    kernel oplocks = yes
>    max xmit = 2048
>    getwd cache = yes
> 
> 
> Changed the following parameters in /proc/sys/net:
>    1 -> 0 > ipv6/bindv6only (for Samba but it was bound already and I couldn't
> reboot. SFTP tests were done in IPv4 though)
>    1 -> 0 > tcp_sack
>    1 -> 0 > tcp_dsack
>    0 -> 1 > tcp_low_latency
>    2 -> 0 > tcp_frto
>    cubic -> reno > tcp_congestion_control
>    3 -> 6 > tcp_retries1
>    3 -> 10 > tcp_tso_win_divisor
>    1 -> 0 > tcp_slow_start_after_idle
> 
> 
> Then I disabled the following offloads using ethtool:
>    rx-checksumming on --> off
>    tx-checksumming on --> off
>    scatter-gather on --> off
>    tcp-segmentation-offload: on --> off
>    generic-segmentation-offload: on --> off
> 
> Nothing helped.
> 
> 
> This is the SFTP socket when it's running (Server side):
> # cat /proc/net/tcp
>   2: 0F12A8C0:0016 2612A8C0:C0C6 01 00000000:00000000 02:000AB698 00000000
> 0        0 267816 4 ffff88084b1e6180 24 4 1 11 -1
> 
> Later on, when it has been stuck for a while:
> # cat /proc/net/tcp
>   2: 0F12A8C0:0016 2612A8C0:C0C6 01 00000000:00000000 02:0002AC6B 00000000
> 0        0 267816 2 ffff88084b1e6180 27 4 1 21 -1
> (numbers reference:
> http://www.mjmwired.net/kernel/Documentation/networking/proc_net_tcp.txt )
>
> as you can see from the last values, the timeout and window have enlarged a
> bit, but nothing dramatic. Apparently they are very similar.


This end has no data to send (the numbers after " 01 ", first one is the 
size of the send queue), so it is correct behavior from TCP side.
 
> This is another SFTP stall after I disabled the offloading via ethtool:
> (different transfer, different socket from above)
>  2: 0F12A8C0:0016 2512A8C0:09BC 01 00000000:00000000 02:000602F4 00000000
> 0        0 5301100 2 ffff88084ac02080 28 4 0 13 -1
> so the values are similar again, apart the ack.pingpong which has become 0
> (something to do with offloading?)

Likewise, no data to send here either at the moment you took this snippet.

> This night I decided to actually tcpdump some traffic at the server side. This
> dump has a chance of being incorrect/misleading because the client I could use
> was already transferring other stuff to the server via Samba. I used SFTP
> protocol for this test but due to the simultaneous Samba transfer I could not
> obtain a long SFTP hangup. The longest hangup I obtained is 1 minute. I hope
> this is really a manifestation of the bug in question, and not a simple
> temporary starvation at server side.
> 
> Here is the relevant excerpt:
> 00:54:36.303031 IP 192.168.18.15.22 > 192.168.18.37.3418: Flags [P.], seq
> 12309129:12309229, ack 997119732, win 65535, length 100
> 00:54:36.303216 IP 192.168.18.37.3418 > 192.168.18.15.22: Flags [P.], seq
> 997119732:997120040, ack 12309229, win 64871, length 308
> 00:54:36.303450 IP 192.168.18.15.22 > 192.168.18.37.3418: Flags [P.], seq
> 12309229:12309297, ack 997120040, win 65535, length 68
> 00:54:36.303566 IP 192.168.18.37.3418 > 192.168.18.15.22: Flags [P.], seq
> 997120040:997120228, ack 12309297, win 64803, length 188
> 00:54:36.303645 IP 192.168.18.15.22 > 192.168.18.37.3418: Flags [P.], seq
> 12309297:12309365, ack 997120228, win 65535, length 68
> 00:54:36.304414 IP 192.168.18.37.3418 > 192.168.18.15.22: Flags [P.], seq
> 997120228:997120432, ack 12309365, win 64735, length 204
> 00:54:36.345583 IP 192.168.18.15.22 > 192.168.18.37.3418: Flags [.], ack
> 997120432, win 65535, length 0       *** PROBLEM IS HERE ***
> 00:55:27.171687 IP 192.168.18.15.22 > 192.168.18.37.3418: Flags [P.], seq
> 12309365:12309425, ack 997120432, win 65535, length 60
> 00:55:27.342682 IP 192.168.18.37.3418 > 192.168.18.15.22: Flags [P.], seq
> 997120432:997120944, ack 12309425, win 64675, length 512
> 00:55:27.342726 IP 192.168.18.15.22 > 192.168.18.37.3418: Flags [.], ack
> 997120944, win 65535, length 0
> 00:55:27.342775 IP 192.168.18.37.3418 > 192.168.18.15.22: Flags [.], seq
> 997120944:997122404, ack 12309425, win 64675, length 1460
> 00:55:27.342787 IP 192.168.18.15.22 > 192.168.18.37.3418: Flags [.], ack
> 997122404, win 65535, length 0
> 00:55:27.342792 IP 192.168.18.37.3418 > 192.168.18.15.22: Flags [P.], seq
> 997122404:997122612, ack 12309425, win 64675, length 208
> 00:55:27.342795 IP 192.168.18.15.22 > 192.168.18.37.3418: Flags [.], ack
> 997122612, win 65535, length 0
> 
> It seems to me the TCP stack at the Server returns an ack to the client but
> waits 1 minutes before being able to give the further 60 bytes to the client
> (these are application-generated bytes, right?) which are probably needed at
> client-side for the functioning of the SFTP protocol.
> I suspect the SFTP-server application receives the last client data just at
> the end of the minute of delay (at 00:55:27.1something)
> Probably if the dump was made without concomitant Samba activity, the
> sftp-server application would have never received such last client bytes and
> that's why the hangup lasts forever.
> However the TCP socket remains established because the TCP stack works and
> acks are sent properly
> Could this be a good interpretation of the bug?
> 
> I don't know what are those 60 bytes though, can you confirm is it a part of
> the SFTP or SSH protocol?
> 
> I observed other similar micro-hangups:
> ...
> 00:50:59.210115 IP 192.168.18.15.22 > 192.168.18.37.3418: Flags [.], ack
> 3346661444, win 65535, length 0
> 00:51:14.137978 IP 192.168.18.15.22 > 192.168.18.37.3418: Flags [P.], seq
> 7757313:7757457, ack 3346661444, win 65535, length 144
> ...
> 00:52:02.495785 IP 192.168.18.15.22 > 192.168.18.37.3418: Flags [P.], seq
> 8612969:8613037, ack 3711382248, win 65535, length 68
> 00:52:02.507752 IP 192.168.18.37.3418 > 192.168.18.15.22: Flags [P.], seq
> 3711382248:3711382452, ack 8613037, win 65399, length 204
> 00:52:02.545585 IP 192.168.18.15.22 > 192.168.18.37.3418: Flags [.], ack
> 3711382452, win 65535, length 0
> 00:52:14.568489 IP 192.168.18.15.22 > 192.168.18.37.3418: Flags [P.], seq
> 8613037:8613097, ack 3711382452, win 65535, length 60
> 00:52:14.572162 IP 192.168.18.37.3418 > 192.168.18.15.22: Flags [P.], seq
> 3711382452:3711382964, ack 8613097, win 65339, length 512
> 00:52:14.572184 IP 192.168.18.15.22 > 192.168.18.37.3418: Flags [.], ack
> 3711382964, win 65535, length 0
> 00:52:14.572312 IP 192.168.18.37.3418 > 192.168.18.15.22: Flags [.], seq
> 3711382964:3711384424, ack 8613097, win 65339, length 1460
> ...
> 00:55:30.464957 IP 192.168.18.15.22 > 192.168.18.37.3418: Flags [.], ack
> 1040635920, win 65535, length 0
> 00:55:32.230205 IP 192.168.18.15.22 > 192.168.18.37.3418: Flags [P.], seq
> 12410749:12410809, ack 1040635920, win 65535, length 60
> ...
> 00:55:54.404027 IP 192.168.18.15.22 > 192.168.18.37.3418: Flags [.], ack
> 1362376840, win 65535, length 0
> 00:56:08.349224 IP 192.168.18.15.22 > 192.168.18.37.3418: Flags [P.], seq
> 13157437:13157497, ack 1362376840, win 65535, length 60
> ...
> 00:56:28.494333 IP 192.168.18.15.22 > 192.168.18.37.3418: Flags [.], ack
> 1424904580, win 65535, length 0
> 00:56:31.450352 IP 192.168.18.15.22 > 192.168.18.37.3418: Flags [P.], seq
> 13307333:13307393, ack 1424904580, win 65535, length 60
> 
> Update: actually I just found a disk performance problem at Server side so
> that the disks couldn't go faster than 70MB/sec (stripe_cache_size was too low
> and readahead was too low), and I'm thinking if this could be the reason for
> indeed triggering the bug: if there was a starvation of the TCP/IP stack or
> network card driver at the wrong time (because i.e. the CPUs are doing
> something else like accessing the disks), could that prevent the 60-bytes
> packet from ever being transmitted...? (in case of no other incoming ethernet
> packets which can give the TCPIP or e1000 driver a second chance to run)

...I'd next try strace the sftp server to see what it was doing 
during the stall.

-- 
 i.

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel

Re: [E1000-devel] TCP sockets stalling - help! (long)

Reply via email to