Public bug reported:
About once a week the NFS mounts on our eda servers (the nfs clients)
slow to a crawl for a large number of workloads. strace reveals that
openat system calls for files in the mounts take between 0.1 and 0.3
seconds this can be seen directly by touching any file in the homedir
(this normally takes 0.001-0.002 seconds). This causes obvious problems,
an IPython instance in a conda environment installed in NFS takes around
2-3 minutes to start for example.
I originally thought that this was a manifestation of #2009325, but we
are on 5.15.0-89 and have seen this problem across several kernel
versions.
Additionally we don't see access rpc calls but we do see loads of test_stateid
calls:
- bursts of 300 or so periodically every couple seconds while idle
- 380,452 test_stateid calls when an ipython instance is starting (zero
access calls in the same period, when at idle we see 10,267 test_stateid calls
over the same ~2:30 time period and again no access calls)
We're not using kerberos, just ip based authentication, our mount
options are:
exoserver.example.com:/volume1/homes/cudaa on /home/cudaa type nfs4
(rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=xxx.xxx.xxx.219,local_lock=none,addr=xxx.xxx.xxx.146)
We see this on servers with both linux-image-5.15.0-89-generic and
linux-headers-5.15.0-83-generic, and have seen it across a few versions (we
have the hwe stack installed if that matters). I could upgrade them to 91 but
the issue clears after a reboot and we don't have a reliable way to reproduce
it so I'm going to keep one online for debugging.
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu27.27
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/controlC0: gdm 3334 F.... pulseaudio
CasperMD5CheckResult: skip
DistroRelease: Ubuntu 20.04
InstallationDate: Installed on 2022-04-05 (618 days ago)
InstallationMedia: Ubuntu 20.04.4 LTS "Focal Fossa" - Release amd64 (20220223)
IwConfig:
lo no wireless extensions.
enp69s0 no wireless extensions.
enp68s0 no wireless extensions.
MachineType: ASUS System Product Name
NonfreeKernelModules: nvidia_modeset nvidia
Package: linux (not installed)
ProcEnviron:
TERM=screen
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/zsh
ProcFB: 0 EFI VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-89-generic
root=UUID=ec5ca229-1bd1-461a-98fd-d0f4eba0dd26 ro fsck.mode=force
fsck.repair=yes
ProcVersionSignature: Ubuntu 5.15.0-89.99~20.04.1-generic 5.15.126
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No
PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
linux-restricted-modules-5.15.0-89-generic N/A
linux-backports-modules-5.15.0-89-generic N/A
linux-firmware 1.187.39
RfKill:
Tags: focal
Uname: Linux 5.15.0-89-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: True
dmi.bios.date: 03/04/2022
dmi.bios.release: 16.3
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1603
dmi.board.asset.tag: Default string
dmi.board.name: ROG ZENITH II EXTREME
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev 1.xx
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias:
dmi:bvnAmericanMegatrendsInc.:bvr1603:bd03/04/2022:br16.3:svnASUS:pnSystemProductName:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnROGZENITHIIEXTREME:rvrRev1.xx:cvnDefaultstring:ct3:cvrDefaultstring:skuSKU:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: System Product Name
dmi.product.sku: SKU
dmi.product.version: System Version
dmi.sys.vendor: ASUS
** Affects: linux (Ubuntu)
Importance: Undecided
Status: New
** Tags: apport-collected focal
** Tags added: apport-collected focal
** Description changed:
About once a week the NFS mounts on our eda servers (the nfs clients)
slow to a crawl for a large number of workloads. strace reveals that
openat system calls for files in the mounts take between 0.1 and 0.3
seconds this can be seen directly by touching any file in the homedir
(this normally takes 0.001-0.002 seconds). This causes obvious problems,
an IPython instance in a conda environment installed in NFS takes around
2-3 minutes to start for example.
I originally thought that this was a manifestation of #2009325, but we
are on 5.15.0-89 and have seen this problem across several kernel
versions.
Additionally we don't see access rpc calls but we do see loads of
test_stateid calls:
- bursts of 300 or so periodically every couple seconds while idle
- 380,452 test_stateid calls when an ipython instance is starting (zero
access calls in the same period, when at idle we see 10,267 test_stateid calls
over the same ~2:30 time period and again no access calls)
We're not using kerberos, just ip based authentication, our mount
options are:
exoserver.example.com:/volume1/homes/cudaa on /home/cudaa type nfs4
(rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=xxx.xxx.xxx.219,local_lock=none,addr=xxx.xxx.xxx.146)
- We see this on servers with both linux-image-5.15.0-89-generic and
- linux-headers-5.15.0-83-generic, and have seen it across a few versions
- (we have the hwe stack installed if that matters). I could upgrade them
- to 91 but the issue clears after a reboot and we don't have a reliable
- way to reproduce it so I'm going to keep one online for debugging.
+ We see this on servers with both linux-image-5.15.0-89-generic and
linux-headers-5.15.0-83-generic, and have seen it across a few versions (we
have the hwe stack installed if that matters). I could upgrade them to 91 but
the issue clears after a reboot and we don't have a reliable way to reproduce
it so I'm going to keep one online for debugging.
+ ---
+ ProblemType: Bug
+ ApportVersion: 2.20.11-0ubuntu27.27
+ Architecture: amd64
+ AudioDevicesInUse:
+ USER PID ACCESS COMMAND
+ /dev/snd/controlC0: gdm 3334 F.... pulseaudio
+ CasperMD5CheckResult: skip
+ DistroRelease: Ubuntu 20.04
+ InstallationDate: Installed on 2022-04-05 (618 days ago)
+ InstallationMedia: Ubuntu 20.04.4 LTS "Focal Fossa" - Release amd64 (20220223)
+ IwConfig:
+ lo no wireless extensions.
+
+ enp69s0 no wireless extensions.
+
+ enp68s0 no wireless extensions.
+ MachineType: ASUS System Product Name
+ NonfreeKernelModules: nvidia_modeset nvidia
+ Package: linux (not installed)
+ ProcEnviron:
+ TERM=screen
+ PATH=(custom, no user)
+ LANG=en_US.UTF-8
+ SHELL=/bin/zsh
+ ProcFB: 0 EFI VGA
+ ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-89-generic
root=UUID=ec5ca229-1bd1-461a-98fd-d0f4eba0dd26 ro fsck.mode=force
fsck.repair=yes
+ ProcVersionSignature: Ubuntu 5.15.0-89.99~20.04.1-generic 5.15.126
+ PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No
PulseAudio daemon running, or not running as session daemon.
+ RelatedPackageVersions:
+ linux-restricted-modules-5.15.0-89-generic N/A
+ linux-backports-modules-5.15.0-89-generic N/A
+ linux-firmware 1.187.39
+ RfKill:
+
+ Tags: focal
+ Uname: Linux 5.15.0-89-generic x86_64
+ UpgradeStatus: No upgrade log present (probably fresh install)
+ UserGroups: N/A
+ _MarkForUpload: True
+ dmi.bios.date: 03/04/2022
+ dmi.bios.release: 16.3
+ dmi.bios.vendor: American Megatrends Inc.
+ dmi.bios.version: 1603
+ dmi.board.asset.tag: Default string
+ dmi.board.name: ROG ZENITH II EXTREME
+ dmi.board.vendor: ASUSTeK COMPUTER INC.
+ dmi.board.version: Rev 1.xx
+ dmi.chassis.asset.tag: Default string
+ dmi.chassis.type: 3
+ dmi.chassis.vendor: Default string
+ dmi.chassis.version: Default string
+ dmi.modalias:
dmi:bvnAmericanMegatrendsInc.:bvr1603:bd03/04/2022:br16.3:svnASUS:pnSystemProductName:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnROGZENITHIIEXTREME:rvrRev1.xx:cvnDefaultstring:ct3:cvrDefaultstring:skuSKU:
+ dmi.product.family: To be filled by O.E.M.
+ dmi.product.name: System Product Name
+ dmi.product.sku: SKU
+ dmi.product.version: System Version
+ dmi.sys.vendor: ASUS
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2046572
Title:
NFS latency spikes during openat syscalls associated with test_stateid
RPC command
Status in linux package in Ubuntu:
New
Bug description:
About once a week the NFS mounts on our eda servers (the nfs clients)
slow to a crawl for a large number of workloads. strace reveals that
openat system calls for files in the mounts take between 0.1 and 0.3
seconds this can be seen directly by touching any file in the homedir
(this normally takes 0.001-0.002 seconds). This causes obvious
problems, an IPython instance in a conda environment installed in NFS
takes around 2-3 minutes to start for example.
I originally thought that this was a manifestation of #2009325, but we
are on 5.15.0-89 and have seen this problem across several kernel
versions.
Additionally we don't see access rpc calls but we do see loads of
test_stateid calls:
- bursts of 300 or so periodically every couple seconds while idle
- 380,452 test_stateid calls when an ipython instance is starting (zero
access calls in the same period, when at idle we see 10,267 test_stateid calls
over the same ~2:30 time period and again no access calls)
We're not using kerberos, just ip based authentication, our mount
options are:
exoserver.example.com:/volume1/homes/cudaa on /home/cudaa type nfs4
(rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=xxx.xxx.xxx.219,local_lock=none,addr=xxx.xxx.xxx.146)
We see this on servers with both linux-image-5.15.0-89-generic and
linux-headers-5.15.0-83-generic, and have seen it across a few versions (we
have the hwe stack installed if that matters). I could upgrade them to 91 but
the issue clears after a reboot and we don't have a reliable way to reproduce
it so I'm going to keep one online for debugging.
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu27.27
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/controlC0: gdm 3334 F.... pulseaudio
CasperMD5CheckResult: skip
DistroRelease: Ubuntu 20.04
InstallationDate: Installed on 2022-04-05 (618 days ago)
InstallationMedia: Ubuntu 20.04.4 LTS "Focal Fossa" - Release amd64 (20220223)
IwConfig:
lo no wireless extensions.
enp69s0 no wireless extensions.
enp68s0 no wireless extensions.
MachineType: ASUS System Product Name
NonfreeKernelModules: nvidia_modeset nvidia
Package: linux (not installed)
ProcEnviron:
TERM=screen
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/zsh
ProcFB: 0 EFI VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-89-generic
root=UUID=ec5ca229-1bd1-461a-98fd-d0f4eba0dd26 ro fsck.mode=force
fsck.repair=yes
ProcVersionSignature: Ubuntu 5.15.0-89.99~20.04.1-generic 5.15.126
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No
PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
linux-restricted-modules-5.15.0-89-generic N/A
linux-backports-modules-5.15.0-89-generic N/A
linux-firmware 1.187.39
RfKill:
Tags: focal
Uname: Linux 5.15.0-89-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: True
dmi.bios.date: 03/04/2022
dmi.bios.release: 16.3
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1603
dmi.board.asset.tag: Default string
dmi.board.name: ROG ZENITH II EXTREME
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev 1.xx
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias:
dmi:bvnAmericanMegatrendsInc.:bvr1603:bd03/04/2022:br16.3:svnASUS:pnSystemProductName:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnROGZENITHIIEXTREME:rvrRev1.xx:cvnDefaultstring:ct3:cvrDefaultstring:skuSKU:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: System Product Name
dmi.product.sku: SKU
dmi.product.version: System Version
dmi.sys.vendor: ASUS
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2046572/+subscriptions
--
Mailing list: https://launchpad.net/~kernel-packages
Post to : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp