ACK

3.13.0-35-generic #62~lp1349768v5v201408250842 appears to resolve the
issue completely on my actual test setup as well: no TFTP stalls, no
dnsmasq EPERM errors, no dmesg errors, and ftrace shows no calls to
ipv6_find_hdr.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1349768

Title:
  kernel 3.13.0-32 ipvs "IPv6 header not found" related to UDP socket
  sendto() EPERM errors

Status in “linux” package in Ubuntu:
  In Progress
Status in “linux” source package in Trusty:
  In Progress
Status in “linux” source package in Utopic:
  In Progress

Bug description:
  I have an Ubuntu 14.04 host that I am using as both a keepalived/ipvs
  loadbalancer and dnsmasq server for pxebooting servers.

  After updating linux-image from 3.13.0-29.53 to 3.13.0-32.57, I noticed
  that dnsmasq-tftp stopped working. pxeboot clients would hang on the
  "Loading ..../linux" TFTP transfer, with the transfer stalling roughly
  1000 blocks in:

  10:30:51.011728 IP 10.1.1.2.43540 > 10.1.12.1.49165: UDP, length 1412
  10:30:51.011924 IP 10.1.12.1.49165 > 10.1.1.2.43540: UDP, length 4
  10:30:51.012012 IP 10.1.1.2.43540 > 10.1.12.1.49165: UDP, length 1412
  10:30:51.012183 IP 10.1.12.1.49165 > 10.1.1.2.43540: UDP, length 4

  Running strace on dnsmasq, I noticed something very odd: sendto() on
  the socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) socket would suddenly start
  returning EPERM mid-transfer, and kept returning it even as dnsmasq
  periodically retried:

  select(18, [4 5 6 7 8 9 10 11 12 15 17], [], [], {0, 250000}) = 1 (in [17], left {0, 249834})
  recvfrom(17, "\0\4\3\352", 4096, 0, NULL, NULL) = 4
  lseek(16, 1410816, SEEK_SET) = 1410816
  read(16, "\25\306\345f\2{\r\4)W\276\32\336q\252_\230q\213\341U\354\25\374k7\243\32\221X+\v"..., 1408) = 1408
  sendto(17, "\0\3\3\353\25\306\345f\2{\r\4)W\276\32\336q\252_\230q\213\341U\354\25\374k7\243\32"..., 1412, 0, {sa_family=AF_INET, sin_port=htons(49165), sin_addr=inet_addr("10.1.11.3")}, 16) = 1412
  select(18, [4 5 6 7 8 9 10 11 12 15 17], [], [], {0, 250000}) = 1 (in [17], left {0, 249839})
  recvfrom(17, "\0\4\3\353", 4096, 0, NULL, NULL) = 4
  lseek(16, 1412224, SEEK_SET) = 1412224
  read(16, "*\360 <C\363l\320:\256~\307\236\26P\323\274%\260\362\341&\232\r\243\370\224\277\221\\\307\372"..., 1408) = 1408
  sendto(17, "\0\3\3\354*\360 <C\363l\320:\256~\307\236\26P\323\274%\260\362\341&\232\r\243\370\224\277"..., 1412, 0, {sa_family=AF_INET, sin_port=htons(49165), sin_addr=inet_addr("10.1.11.3")}, 16) = -1 EPERM (Operation not permitted)
  select(18, [4 5 6 7 8 9 10 11 12 15 17], [], [], {0, 250000}) = 0 (Timeout)
  select(18, [4 5 6 7 8 9 10 11 12 15 17], [], [], {0, 250000}) = 0 (Timeout)
  select(18, [4 5 6 7 8 9 10 11 12 15 17], [], [], {0, 250000}) = 0 (Timeout)
  select(18, [4 5 6 7 8 9 10 11 12 15 17], [], [], {0, 250000}) = 0 (Timeout)
  select(18, [4 5 6 7 8 9 10 11 12 15 17], [], [], {0, 250000}) = 0 (Timeout)
  select(18, [4 5 6 7 8 9 10 11 12 15 17], [], [], {0, 250000}) = 0 (Timeout)
  select(18, [4 5 6 7 8 9 10 11 12 15 17], [], [], {0, 250000}) = 0 (Timeout)
  select(18, [4 5 6 7 8 9 10 11 12 15 17], [], [], {0, 250000}) = 0 (Timeout)
  lseek(16, 1412224, SEEK_SET) = 1412224
  read(16, "*\360 <C\363l\320:\256~\307\236\26P\323\274%\260\362\341&\232\r\243\370\224\277\221\\\307\372"..., 1408) = 1408
  sendto(17, "\0\3\3\354*\360 <C\363l\320:\256~\307\236\26P\323\274%\260\362\341&\232\r\243\370\224\277"..., 1412, 0, {sa_family=AF_INET, sin_port=htons(49165), sin_addr=inet_addr("10.1.11.3")}, 16) = -1 EPERM (Operation not permitted)
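
  The first four bytes of each sendto()/recvfrom() payload in the trace
  above are a TFTP header (RFC 1350): a big-endian 16-bit opcode (3 = DATA,
  4 = ACK) followed by a 16-bit block number. The short Python sketch below
  decodes those bytes and shows that the sendto() that first hit EPERM
  carried DATA block 1004, sent right after ACK 1003, which matches the
  ~1000-block stall. The 1408-byte block size is inferred from the read()
  calls (presumably negotiated via the blksize option, RFC 2348), and the
  helper names are hypothetical:

```python
import struct
from typing import Tuple

# TFTP opcodes as defined in RFC 1350
OPCODES = {1: "RRQ", 2: "WRQ", 3: "DATA", 4: "ACK", 5: "ERROR"}

# Assumption: block size inferred from the read(16, ..., 1408) calls in the trace
BLKSIZE = 1408


def decode_header(packet: bytes) -> Tuple[str, int]:
    """Return (opcode name, block number) from a DATA/ACK packet header."""
    opcode, block = struct.unpack("!HH", packet[:4])  # two big-endian uint16s
    return OPCODES.get(opcode, "UNKNOWN(%d)" % opcode), block


def file_offset(block: int, blksize: int = BLKSIZE) -> int:
    """Byte offset of a DATA block in the file; TFTP blocks start at 1."""
    return (block - 1) * blksize


# strace octal escapes rewritten as hex: \353 == 0xeb, \354 == 0xec
last_ack = b"\x00\x04\x03\xeb"   # recvfrom(17, "\0\4\3\353", ...)
failed = b"\x00\x03\x03\xec"     # first bytes of the sendto() that got EPERM

print(decode_header(last_ack))   # ('ACK', 1003)
print(decode_header(failed))     # ('DATA', 1004)
print(file_offset(1004))         # 1412224, matching lseek(16, 1412224, ...)
```

  The same arithmetic checks the previous step: after ACK 1002
  ("\0\4\3\352"), dnsmasq seeks to 1410816, which is (1003 - 1) * 1408.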

  This was with all iptables rules unloaded (so no OUTPUT -j DROP/REJECT
  rule could be responsible) and all apparmor profiles torn down.

  I also noticed the following dmesg message appearing at roughly the
  same times as the TFTP transfers got stuck (although not coinciding
  exactly with the stall):

  [70325.516724] IPv6 header not found

  The error pointed to ipvs (which I am using on the same host as an
  IPv4 NAT load balancer):
  http://archive.linuxvirtualserver.org/html/lvs-devel/2012-08/msg00018.html
  http://comments.gmane.org/gmane.comp.linux.lvs.devel/3614

  I then tore down the ipvs rules (service keepalived stop) and unloaded
  the modules (rmmod ip_vs_rr ip_vs), and the issue went away: the
  stalled dnsmasq-tftp transfer resumed!

  This appears to be reproducible: modprobing ip_vs and starting
  keepalived causes dnsmasq-tftp to stall again, and stopping keepalived
  and unloading the modules lets the transfer resume.
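
  To exercise the same reproduction without dnsmasq, a hypothetical probe
  like the one below could be run before and after loading ip_vs and
  starting keepalived. It sends TFTP-sized (1412-byte) datagrams from a
  single AF_INET/SOCK_DGRAM socket and reports the index of the first
  sendto() that fails with EPERM. This is an untested sketch: whether a
  plain UDP sender hits the same ipvs code path as dnsmasq's TFTP traffic
  is an assumption, so a None result does not rule the bug out.

```python
import errno
import socket
import time
from typing import Optional, Tuple


def first_eperm(dest: Tuple[str, int], count: int = 2000, size: int = 1412,
                delay: float = 0.001) -> Optional[int]:
    """Send `count` datagrams of `size` bytes to `dest` from one UDP socket.

    Returns the index of the first sendto() that fails with EPERM, or None
    if every send succeeds. Any other socket error is re-raised.
    """
    payload = b"\x00" * size
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for i in range(count):
            try:
                sock.sendto(payload, dest)
            except OSError as exc:
                if exc.errno == errno.EPERM:
                    return i
                raise
            time.sleep(delay)  # pace the sends instead of flooding
    return None


if __name__ == "__main__":
    # Hypothetical local target standing in for a PXE client
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx:
        rx.bind(("127.0.0.1", 0))
        print(first_eperm(rx.getsockname(), count=100))
```

  On the affected host, dest would more usefully be an address on the
  PXE network (e.g. a client like 10.1.11.3) rather than loopback, since
  loopback traffic may not take the same path through ipvs.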

  This happens reproducibly on boot with -30 and -32. It does NOT seem
  to happen with 3.13.0-29, which I was using until now.
  --- 
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Jul 29 13:43 seq
   crw-rw---- 1 root audio 116, 33 Jul 29 13:43 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14.1-0ubuntu3.2
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: Error: [Errno 2] No such file or directory
  DistroRelease: Ubuntu 14.04
  HibernationDevice: RESUME=/dev/mapper/catcp2-swap
  InstallationDate: Installed on 2014-06-03 (56 days ago)
  InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Release amd64 (20140416.2)
  MachineType: Dell Inc. PowerEdge R410
  Package: linux-image-3.13.0-32-generic 3.13.0-32.57
  PackageArchitecture: amd64
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.13.0-32-generic root=/dev/mapper/hostname-root ro console=ttyS1,115200n8 console=tty0 nomdmonddf nomdmonisw
  ProcVersionSignature: Ubuntu 3.13.0-32.57-generic 3.13.11.4
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-32-generic N/A
   linux-backports-modules-3.13.0-32-generic  N/A
   linux-firmware                             1.127.5
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  trusty
  Uname: Linux 3.13.0-32-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  _MarkForUpload: True
  dmi.bios.date: 07/30/2013
  dmi.bios.vendor: Dell Inc.
  dmi.bios.version: 1.12.0
  dmi.board.name: 01V648
  dmi.board.vendor: Dell Inc.
  dmi.board.version: A03
  dmi.chassis.type: 23
  dmi.chassis.vendor: Dell Inc.
  dmi.modalias: dmi:bvnDellInc.:bvr1.12.0:bd07/30/2013:svnDellInc.:pnPowerEdgeR410:pvr:rvnDellInc.:rn01V648:rvrA03:cvnDellInc.:ct23:cvr:
  dmi.product.name: PowerEdge R410
  dmi.sys.vendor: Dell Inc.


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
