The problem also affects transfers from RDS instances: for example, it prevents creating an export file with pg_dump from an RDS PostgreSQL database.
The problem seems to be caused by a sudden collapse of the downloader's TCP receive window to "1" (usually with window scale 7, so 1*2^7 = 128 bytes) after transferring several tens of GB of data over a single connection. The TCP receive window never recovers.

Analysis of a single stalled transfer from RDS PostgreSQL:

$ tcpdump -nn -vvv -r pcap --dont-verify-checksums | xz -1 > pcap.txt.xz
$ xzgrep wscale pcap.txt.xz
10.16.14.237.33578 > 10.16.10.102.5432: Flags [S], seq 3273208485, win 26883, options [mss 8961,sackOK,TS val 1284903196 ecr 0,nop,wscale 7], length 0
10.16.10.102.5432 > 10.16.14.237.33578: Flags [S.], seq 2908863056, ack 3273208486, win 28960, options [mss 1460,sackOK,TS val 120076048 ecr 1284903196,nop,wscale 10], length 0

The downloader announced window scale 7 for this connection, so each "win N" value it advertises has to be multiplied by 2^7 = 128. Near the beginning of the capture the advertised window is, for example, 852, which after scaling means 852*2^7 = 109056 bytes:

$ xzgrep '10.16.14.237.33578' pcap.txt.xz | head -100000 | tail -1
10.16.14.237.33578 > 10.16.10.102.5432: Flags [.], seq 2256009, ack 201198741, win 852, options [nop,nop,TS val 1284911021 ecr 120078004], length 0

But near the end of the capture it is 1, which after scaling means 1*2^7 = 128 bytes:

$ xzcat pcap.txt.xz | tail -1000 | grep '10.16.14.237.33578' | head -1
10.16.14.237.33578 > 10.16.10.102.5432: Flags [.], seq 2266022, ack 3742538664, win 1, options [nop,nop,TS val 1286238401 ecr 120409852], length 0

And indeed the RDS server is sending 128 bytes of data and waiting for the ACK before sending the next 128:

13:47:27.170534 IP (tos 0x0, ttl 255, id 11479, offset 0, flags [DF], proto TCP (6), length 180)
    10.16.10.102.5432 > 10.16.14.237.33578: Flags [P.], seq 3742538664:3742538792, ack 2266022, win 174, options [nop,nop,TS val 120409903 ecr 1286238401], length 128
13:47:27.170542 IP (tos 0x0, ttl 64, id 28256, offset 0, flags [DF], proto TCP (6), length 52)
    10.16.14.237.33578 > 10.16.10.102.5432: Flags [.], seq 2266022, ack 3742538792, win 1, options [nop,nop,TS val 1286238605 ecr 120409903], length 0
13:47:27.374539 IP (tos 0x0, ttl 255, id 11480, offset 0, flags [DF], proto TCP (6), length 180)
    10.16.10.102.5432 > 10.16.14.237.33578: Flags [P.], seq 3742538792:3742538920, ack 2266022, win 174, options [nop,nop,TS val 120409954 ecr 1286238605], length 128
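For reference, the conversion used throughout this analysis is plain RFC 7323 window scaling: the 16-bit "win" field is shifted left by the wscale value the advertising side announced in its SYN. A minimal Python sketch of the arithmetic, using the downloader-side win values that appear in this capture (the helper name is mine):

def effective_window(win, wscale=7):
    # RFC 7323: the advertised 16-bit window is left-shifted by the
    # scale factor the advertising side announced in its SYN.
    return win << wscale

for win in (852, 600, 1):
    print(f"win {win:>3} -> {effective_window(win):>6} bytes")

# win 852 -> 109056 bytes
# win 600 ->  76800 bytes
# win   1 ->    128 bytes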
The switch from "win 600" (76800 bytes) to "win 1" (128 bytes) is sudden, and I have no idea what could have caused it:

13:33:26.230782 IP (tos 0x0, ttl 64, id 24124, offset 0, flags [DF], proto TCP (6), length 52)
    10.16.14.237.33578 > 10.16.10.102.5432: Flags [.], seq 2266022, ack 3741933728, win 600, options [nop,nop,TS val 1285397677 ecr 120199669], length 0
13:33:26.230868 IP (tos 0x0, ttl 255, id 7295, offset 0, flags [DF], proto TCP (6), length 4396)
    10.16.10.102.5432 > 10.16.14.237.33578: Flags [.], seq 3741933728:3741938072, ack 2266022, win 174, options [nop,nop,TS val 120199669 ecr 1285397677], length 4344
13:33:26.230918 IP (tos 0x0, ttl 255, id 7298, offset 0, flags [DF], proto TCP (6), length 43492)
    10.16.10.102.5432 > 10.16.14.237.33578: Flags [.], seq 3741938072:3741981512, ack 2266022, win 174, options [nop,nop,TS val 120199669 ecr 1285397677], length 43440
13:33:26.230932 IP (tos 0x0, ttl 255, id 7328, offset 0, flags [DF], proto TCP (6), length 13708)
    10.16.10.102.5432 > 10.16.14.237.33578: Flags [P.], seq 3741981512:3741995168, ack 2266022, win 174, options [nop,nop,TS val 120199669 ecr 1285397677], length 13656
13:33:26.230948 IP (tos 0x0, ttl 255, id 7338, offset 0, flags [DF], proto TCP (6), length 2948)
    10.16.10.102.5432 > 10.16.14.237.33578: Flags [.], seq 3741995168:3741998064, ack 2266022, win 174, options [nop,nop,TS val 120199669 ecr 1285397677], length 2896
13:33:26.230969 IP (tos 0x0, ttl 255, id 7340, offset 0, flags [DF], proto TCP (6), length 5348)
    10.16.10.102.5432 > 10.16.14.237.33578: Flags [P.], seq 3741998064:3742003360, ack 2266022, win 174, options [nop,nop,TS val 120199669 ecr 1285397677], length 5296
13:33:26.231759 IP (tos 0x0, ttl 255, id 7344, offset 0, flags [DF], proto TCP (6), length 4396)
    10.16.10.102.5432 > 10.16.14.237.33578: Flags [.], seq 3742003360:3742007704, ack 2266022, win 174, options [nop,nop,TS val 120199669 ecr 1285397677], length 4344
13:33:26.231775 IP (tos 0x0, ttl 255, id 7347, offset 0, flags [DF], proto TCP (6), length 2876)
    10.16.10.102.5432 > 10.16.14.237.33578: Flags [.], seq 3742007704:3742010528, ack 2266022, win 174, options [nop,nop,TS val 120199669 ecr 1285397677], length 2824
13:33:26.233238 IP (tos 0x0, ttl 64, id 24125, offset 0, flags [DF], proto TCP (6), length 52)
    10.16.14.237.33578 > 10.16.10.102.5432: Flags [.], seq 2266022, ack 3742010528, win 1, options [nop,nop,TS val 1285397679 ecr 120199669], length 0

----

There is one changelog entry that seems to touch the TCP receive window calculations: "tcp: avoid integer overflows in tcp_rcv_space_adjust()". I don't know whether this is the change that caused the regression; a sketch of how that class of bug could produce the observed collapse follows at the end of this message.

https://bugs.launchpad.net/bugs/1796469

Title:
  aws s3 cp --recursive hangs on the last file on a large file transfer to instance

Status in linux-aws package in Ubuntu:
  Confirmed

Bug description:
  aws s3 cp --recursive hangs on the last file on a large transfer to an instance.

  I have confirmed that this works on kernel Linux/4.15.0-1021-aws.

  aws cli version:
    aws-cli/1.16.23 Python/2.7.15rc1 Linux/4.15.0-1023-aws botocore/1.12.13

  Ubuntu version:
    Description: Ubuntu 18.04.1 LTS
    Release:     18.04

  AMI: eu-west-1 - ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20180912 - ami-00035f41c82244dab

  Package version:
    linux-aws:
      Installed: 4.15.0.1023.23
      Candidate: 4.15.0.1023.23
      Version table:
     *** 4.15.0.1023.23 500
            500 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
            500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
            100 /var/lib/dpkg/status
         4.15.0.1007.7 500
            500 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu bionic/main amd64 Packages
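Following up on the changelog entry mentioned above, here is a purely illustrative Python sketch of how a 32-bit integer overflow in receive-buffer autotuning could pin the advertised window at a tiny value. This is NOT the kernel's tcp_rcv_space_adjust() code; the formula, the names, and the 128-byte floor (chosen only to mirror the observed "win 1" with wscale 7) are my assumptions:

def to_i32(x):
    # Emulate C signed 32-bit wrap-around.
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

def hypothetical_rcvwin(copied):
    # Hypothetical autotuning step: offer twice what the application
    # consumed recently.  Done in 32-bit arithmetic, the doubling wraps
    # negative once "copied" crosses 2^30 bytes, and the clamp then
    # leaves the window stuck at the floor instead of growing it.
    return max(to_i32(copied * 2), 128)

for copied in (100_000_000, 1_000_000_000, 1_200_000_000):
    print(f"copied {copied:>13,} bytes -> window {hypothetical_rcvwin(copied):,}")

# copied   100,000,000 bytes -> window 200,000,000
# copied 1,000,000,000 bytes -> window 2,000,000,000
# copied 1,200,000,000 bytes -> window 128

If something of this shape happened in the 4.15.0-1023 kernel's window computation, it would match the symptom: the transfer runs fine until enough data has passed, then the window abruptly collapses and never recovers because the broken computation keeps producing the clamped minimum.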