Hi,
Last week, I upgraded three machines from Bookworm to Trixie. All of
them mount the same export. Since the upgrade, the number of NFS RPCs
per client has increased from 4-9k to 17-28k per 30 seconds. The user
workload has not changed.
The kernel was upgraded from 6.12.22-1~bpo12+1 (Bookworm backports) to
6.18.9-1~bpo13+1 (Trixie backports). Using the previous kernel from
Bookworm (on Trixie) does not help. I also gave 6.12.73+deb13-amd64
(stable, not backports) a shot; no change.
The clients now issue many more DelegReturn / Close / OpenNoattr (Linux
shorthand[1] for SEQUENCE + PUTFH + OPEN) / TestStateID RPCs, even
though the user workload has not changed. The number of StatFs and
Getattr calls has decreased by a factor of about 8.75 and 1.3,
respectively.
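For anyone who wants to compare per-operation numbers on their own
clients: below is a minimal sketch of how such deltas can be taken from
/proc/self/mountstats. It is an illustration only, not necessarily how
the graphs were produced. The function names are made up, and it
assumes that the first field after each operation name in the "per-op
statistics" section is the cumulative operation count. Operation names
in mountstats may be spelled differently from the ones used above.

```python
import time
from collections import Counter


def parse_per_op(mountstats_text):
    """Sum the first per-op field (cumulative op count) over all mounts."""
    counts = Counter()
    in_per_op = False
    for line in mountstats_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("device "):
            # A new mount entry starts; its per-op section comes later.
            in_per_op = False
        elif stripped.startswith("per-op statistics"):
            in_per_op = True
        elif in_per_op and ":" in stripped:
            name, _, fields = stripped.partition(":")
            values = fields.split()
            if values and values[0].isdigit():
                counts[name.strip()] += int(values[0])
    return counts


def rpc_delta(before, after):
    """Return {operation: increase} for operations whose count changed."""
    ops = set(before) | set(after)
    return {
        op: after.get(op, 0) - before.get(op, 0)
        for op in ops
        if after.get(op, 0) != before.get(op, 0)
    }


def sample_delta(path="/proc/self/mountstats", interval=30):
    """Read the stats file twice, `interval` seconds apart, and diff."""
    with open(path) as f:
        before = parse_per_op(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_per_op(f.read())
    return rpc_delta(before, after)
```

Calling sample_delta() on a client returns the per-operation increase
over a 30-second window, summed over all NFS mounts on that client.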
Here are two graphs showing requests:
https://nextcloud.cyberfusion.nl/s/8De2coZyjaZgZk2 (password
'Kz55T8imXi'). The massive increase immediately follows the Trixie
upgrade. (The mailing list does not seem to pick up my message when I
attach the image.)
The NFS server is on Bookworm (6.1.0-41-amd64). There is NO increase on
clients that are still on Bookworm and use the same NFS server.
I considered that memory allocation might have changed, i.e. that less
cache is available, naturally causing more requests. That is not the
case; the amount of available RAM is about the same as before the
upgrade.
Have others seen this behaviour? Anyone have an idea as to where this
increase comes from? I haven't yet thought of what in userspace might
affect this.
With kind regards,
William David Edwards
[1]:
https://www.fsl.cs.stonybrook.edu/docs/nfs4perf/nfs4perf-microscope.pdf