It could be interesting to find out whether the issue does not occur on an
m1.small (although that could still result from differences in the setup other
than the MTU). I am also not sure how AWS manages to make the instance come up
with a different MTU. In my experiments I had a normal bridge on the host set
to 9000 and the guest still had 1500. Then again, I do not know in detail how
the network is set up in EC2 (it could be openvswitch).
Generally, the issue is that something seems to produce packets with a large
data buffer. One slot in the xen-netfront driver is one 4K page, and the limit
is 18 slots. Anything above that triggers the observed message and the packet
is dropped. The host side has another limit of (usually) 20 slots, above which
it assumes a malicious guest and disrupts the connection. But since the guest
already drops anything above 18 slots, the host should never see that number.
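To make the relation between the two limits concrete, here is a minimal
sketch. The names are made up for illustration, and the host limit of 20 is
only the usual value mentioned above, not something I verified for EC2.

```python
GUEST_SLOT_LIMIT = 18   # frontend drops anything above this (one slot = one 4K page)
HOST_SLOT_LIMIT = 20    # usual backend limit; assumed, may differ per host

def packet_fate(slots):
    """Classify a TX packet by the number of slots it would need."""
    if slots > GUEST_SLOT_LIMIT:
        # This is the case that logs "skb rides the rocket: N slots".
        return "dropped by guest"
    if slots > HOST_SLOT_LIMIT:
        # Unreachable while the guest limit is the lower of the two.
        return "host disconnects guest"
    return "sent"

fates = {slots: packet_fate(slots) for slots in range(1, 33)}
print(fates[18], "|", fates[19])
```

With these numbers no slot count ever reaches the host-side check, which is
why the host should never see an oversized packet from a well-behaved guest.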
Unfortunately I do not understand the network code that deeply, so I will
have to ask upstream. As far as I understand, a socket buffer can consist of
multiple fragments (a kind of scatter-gather list). There is a definition in
the code that limits the number of fragments based on a maximum frame size of
64K. This results in 17 frags (for 4K pages that is 16 + 1 to handle data not
starting at a page boundary). The Xen driver counts the length of the memory
area in all frags. If the data in a frag starts at an offset, that offset is
added, and the code does that for every frag. The question is whether in
theory each frag is allowed to have an offset, because that might add up to
more than one page. To the number of pages needed for the frags, the driver
then adds the number of pages (can that be more than one?) needed for the
header. If the total is bigger than 18 (17 for frags + 1 for the header?), the
"rides the rocket" error happens.
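The accounting described above can be sketched roughly as follows. This is a
model of my reading of the driver, not the driver source: the function names
and the (offset, length) tuples are invented for the example, and whether the
stack ever hands down 16 frags that all start at an offset is exactly the open
question.

```python
PAGE_SIZE = 4096
MAX_FRAME = 64 * 1024
MAX_FRAGS = MAX_FRAME // PAGE_SIZE + 1   # 16 + 1 = 17 frags
SLOT_LIMIT = MAX_FRAGS + 1               # 17 for frags + 1 for the header = 18

def pages_spanned(offset, length):
    """Pages touched by `length` bytes that start `offset` bytes into a page."""
    offset %= PAGE_SIZE                  # only the offset within the first page counts
    return (offset + length + PAGE_SIZE - 1) // PAGE_SIZE

def count_slots(header, frags):
    """`header` and every entry of `frags` are (offset, length) tuples."""
    return pages_spanned(*header) + sum(pages_spanned(o, n) for o, n in frags)

header = (0, 66)                         # small header at the start of a page
aligned = [(0, PAGE_SIZE)] * 16          # 64K of data, every frag page-aligned
shifted = [(1, PAGE_SIZE)] * 16          # same data, every frag one byte off

print(count_slots(header, aligned), count_slots(header, shifted))
```

In this model the aligned case stays under the limit, while a per-frag offset
makes every frag straddle two pages, so the total goes far beyond 18.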
This leaves a few question marks for me. The memory associated with a frag can
be a compound page, so I would think that its length might be greater than 4K.
I have no clue yet how exactly compound pages come into play. Is the 64K
limit still enforced by a limit on the number of frags? Can the data of each
frag begin at some offset (and so end up with more than one page of overall
overhead)? Apparently the header can start at some offset, too. So in the
worst case (assuming the header length is less than 4K), a large enough offset
could make the header require 2 pages. If the frag data then happens to use up
its limit of 17 pages, we would end up with exactly the 19 slots seen in the
failure.
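As a worked example of that worst case, the arithmetic could look like this.
The concrete offsets and lengths are made up to illustrate the point. They are
not taken from a real packet, and the single large frag assumes a compound
page, which is one of the things I am unsure about.

```python
PAGE_SIZE = 4096

def pages_spanned(offset, length):
    """Pages touched by `length` bytes that start `offset` bytes into a page."""
    offset %= PAGE_SIZE
    return (offset + length + PAGE_SIZE - 1) // PAGE_SIZE

# Hypothetical header: 66 bytes starting 4060 bytes into its page, so it
# crosses a page boundary although it is far smaller than 4K.
header_slots = pages_spanned(4060, 66)

# Hypothetical single frag backed by a compound page: 65000 bytes of data
# starting 3000 bytes in. Header plus data still stay below 64K in total.
frag_slots = pages_spanned(3000, 65000)

total_slots = header_slots + frag_slots
print(header_slots, frag_slots, total_slots)
```

So a packet of less than 64K can need 2 + 17 slots in this model, one more
than the guest limit of 18.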

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1317811

Title:
  Dropped packets on EC2, "xen_netfront: xennet: skb rides the rocket: x
  slots"

Status in “linux” package in Ubuntu:
  Confirmed

Bug description:
  Running Ubuntu 14.04 LTS on EC2, we see a lot of the following in the
  kernel log:

      xen_netfront: xennet: skb rides the rocket: 19 slots

  Each of these messages corresponds to a dropped TX packet, and
  eventually causes our application's connections to break and timeout.

  The problem appears when network load increases. We have Node.js
  processes doing pubsub with a Redis server, and these are most visibly
  affected, showing frequent connection loss. The processes talk to each
  other using the private addresses EC2 allocates to the machines.

  Notably, the default MTU on the network interface seems to have gone
  up from 1500 on 13.10, to 9000 in 14.04 LTS. Reducing the MTU back to
  1500 seems to drastically reduce dropped packets. (Can't say for
  certain if it completely eliminates the problem.)

  The machines we run are started from ami-896c96fe.

  ProblemType: Bug
  DistroRelease: Ubuntu 14.04
  Package: linux-image-3.13.0-24-generic 3.13.0-24.46
  ProcVersionSignature: User Name 3.13.0-24.46-generic 3.13.9
  Uname: Linux 3.13.0-24-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 May  9 09:01 seq
   crw-rw---- 1 root audio 116, 33 May  9 09:01 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.14.1-0ubuntu3
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: Error: [Errno 2] No such file or directory: 'iw'
  Date: Fri May  9 09:11:18 2014
  Ec2AMI: ami-896c96fe
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: eu-west-1c
  Ec2InstanceType: c3.large
  Ec2Kernel: aki-52a34525
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lspci:
   
  Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
  PciMultimedia:
   
  ProcFB:
   
  ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-24-generic N/A
   linux-backports-modules-3.13.0-24-generic  N/A
   linux-firmware                             N/A
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  --- 
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 May  9 09:54 seq
   crw-rw---- 1 root audio 116, 33 May  9 09:54 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14.1-0ubuntu3
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: Error: [Errno 2] No such file or directory
  CurrentDmesg: [   24.724129] init: plymouth-upstart-bridge main process ended, respawning
  DistroRelease: Ubuntu 14.04
  Ec2AMI: ami-896c96fe
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: eu-west-1c
  Ec2InstanceType: c3.large
  Ec2Kernel: aki-52a34525
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory
  Lspci:
   
  Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
  Package: linux (not installed)
  PciMultimedia:
   
  ProcFB:
   
  ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0
  ProcVersionSignature: User Name 3.13.0-24.46-generic 3.13.9
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-24-generic N/A
   linux-backports-modules-3.13.0-24-generic  N/A
   linux-firmware                             N/A
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  trusty ec2-images
  Uname: Linux 3.13.0-24-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm audio cdrom dialout dip floppy netdev plugdev sudo video
  _MarkForUpload: True

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1317811/+subscriptions