On Sat, 2026-03-14 at 11:38 -0400, Trond Myklebust wrote:
> Hi Salvatore,
>
> On Sat, 2026-03-14 at 13:23 +0100, Salvatore Bonaccorso wrote:
> > Control: forwarded -1 https://lore.kernel.org/regressions/[email protected]
> > Control: tags -1 + upstream
> >
> > Hi Trond, hi Anna,
> >
> > In Debian we got reports of an NFS client regression where large
> > rsize/wsize (1MB) causes EIO after commit 2b092175f5e3 ("NFS: Fix
> > inheritance of the block sizes when automounting") and its backports
> > to the stable series. The report in full is at:
> > https://bugs.debian.org/1128834
> >
> > Maik reported:
> > > After upgrading from Linux 6.1.158 to 6.1.162, NFS client writes
> > > fail with input/output errors (EIO).
> > >
> > > Environment:
> > > - Debian Bookworm
> > > - Kernel: 6.1.0-43-amd64 (6.1.162-1)
> > > - NFSv4.2 (also reproducible with 4.1)
> > > - Default mount options include rsize=1048576,wsize=1048576
> > >
> > > Reproducer:
> > > dd if=/dev/zero of=~/testfile bs=1M count=500
> > > or
> > > dd if=/dev/zero of=~/testfile bs=4k count=100000
> > >
> > > On different computers and VMs!
> > >
> > > Result:
> > > dd: closing output file: Input/output error
> > >
> > > Workaround:
> > > Mount with:
> > > rsize=65536,wsize=65536
> > >
> > > With the reduced I/O size, the issue disappears completely.
> > >
> > > Impact:
> > > - File writes fail (file >1M)
> > > - KDE Plasma crashes due to corrupted cache/config writes
> > >
> > > The issue does NOT occur on kernel 6.1.0-42 (6.1.158).
> >
> > I was not able to reproduce the problem, and it turned out that it
> > seems to be triggerable only when a Dell EMC (Isilon) system is used
> > on the NFS server side. So the issue was initially not really
> > considered to be "our" issue.
> >
> > Valentin SAMIR, a second affected user, also reported the issue to
> > Dell, and Dell seems to point at a client issue instead. Valentin
> > writes:
> >
> > > We are facing the same issue.
> > > Dell seems to point to a client issue:
> > > The kernel treats the max size as the NFS payload max size,
> > > whereas OneFS treats the max size as the overall compound packet
> > > max size (everything related to NFS in the call). Hence when OneFS
> > > receives a call with a payload of 1M, the overall NFS packet is
> > > slightly bigger and it returns NFS4ERR_REQ_TOO_BIG.
> > >
> > > So the question is: should max req size/max resp size be treated
> > > as the NFS payload max size or the whole NFS packet max size?
> >
> > His reply in https://bugs.debian.org/1128834#55 contains a quote
> > from the response Valentin got from Dell; I'm quoting it in full
> > here for easier follow-up in case it is needed:
> >
> > > I have been looking at the action plan output we captured,
> > > specifically around when you first mounted and then repro'ed the
> > > error.
> > >
> > > Looking at the pcap we gathered, firstly, let's concentrate on the
> > > "create session" calls between Client / Node.
> > > Here we can see these max sizes advertised - per screenshot.
> > >
> > > Frame 17: 306 bytes on wire (2448 bits), 306 bytes captured (2448 bits)
> > > Ethernet II, Src: SuperMicroCo_1d:7d:b2 (ac:1f:6b:1d:7d:b2), Dst: MellanoxTech_bd:8c:7a (c4:70:bd:bd:8c:7a)
> > > Internet Protocol Version 4, Src: 172.22.1.132, Dst: 172.22.16.29
> > > Transmission Control Protocol, Src Port: 810, Dst Port: 2049, Seq: 613, Ack: 277, Len: 240
> > > Remote Procedure Call, Type:Call XID:0x945b7e1d
> > > Network File System, Ops(1): CREATE_SESSION
> > > [Program Version: 4]
> > > [V4 Procedure: COMPOUND (1)]
> > > Tag: <EMPTY>
> > > minorversion: 2
> > > Operations (count: 1): CREATE_SESSION
> > > Opcode: CREATE_SESSION (43)
> > > clientid: 0x36adef626e919bf4
> > > seqid: 0x00000001
> > > csa_flags: 0x00000003, CREATE_SESSION4_FLAG_PERSIST, CREATE_SESSION4_FLAG_CONN_BACK_CHAN
> > > csa_fore_chan_attrs
> > > hdr pad size: 0
> > > max req size: 1049620
> > > max resp size: 1049480
> > > max resp size cached: 7584
> > > max ops: 8
> > > max reqs: 64
> > > csa_back_chan_attrs
> > > hdr pad size: 0
> > > max req size: 4096
> > > max resp size: 4096
> > > max resp size cached: 0
> > > max ops: 2
> > > max reqs: 16
> > > cb_program: 0x40000000
> > > flavor: 1
> > > stamp: 2087796144
> > > machine name: srv-transfert.ad.phedre.fr
> > > uid: 0
> > > gid: 0
> > > [Main Opcode: CREATE_SESSION (43)]
> > >
> > > And the Node responds, as expected, confirming the max size of
> > > 1048576.
> > >
> > > Frame 19: 194 bytes on wire (1552 bits), 194 bytes captured (1552 bits)
> > > Ethernet II, Src: MellanoxTech_bd:8c:7a (c4:70:bd:bd:8c:7a), Dst: IETF-VRRP-VRID_3f (00:00:5e:00:01:3f)
> > > Internet Protocol Version 4, Src: 172.22.16.29, Dst: 172.22.1.132
> > > Transmission Control Protocol, Src Port: 2049, Dst Port: 810, Seq: 321, Ack: 853, Len: 128
> > > Remote Procedure Call, Type:Reply XID:0x945b7e1d
> > > Network File System, Ops(1): CREATE_SESSION
> > > [Program Version: 4]
> > > [V4 Procedure: COMPOUND (1)]
> > > Status: NFS4_OK (0)
> > > Tag: <EMPTY>
> > > Operations (count: 1)
> > > Opcode: CREATE_SESSION (43)
> > > Status: NFS4_OK (0)
> > > sessionid: f49b916e62efad36f200000006000000
> > > seqid: 0x00000001
> > > csr_flags: 0x00000002, CREATE_SESSION4_FLAG_CONN_BACK_CHAN
> > > csr_fore_chan_attrs
> > > hdr pad size: 0
> > > max req size: 1048576
> > > max resp size: 1048576
> > > max resp size cached: 7584
> > > max ops: 8
> > > max reqs: 64
> > > csr_back_chan_attrs
> > > hdr pad size: 0
> > > max req size: 4096
> > > max resp size: 4096
> > > max resp size cached: 0
> > > max ops: 2
> > > max reqs: 16
> > > [Main Opcode: CREATE_SESSION (43)]
> > >
> > > Now if we look later on in the sequence, when the Client sends the
> > > write request to the Node, we see in the frame that the max size
> > > is, as expected, 1048576:
> > >
> > > Frame 747: 1998 bytes on wire (15984 bits), 1998 bytes captured (15984 bits)
> > > Ethernet II, Src: SuperMicroCo_1d:7d:b2 (ac:1f:6b:1d:7d:b2), Dst: MellanoxTech_bd:8c:7a (c4:70:bd:bd:8c:7a)
> > > Internet Protocol Version 4, Src: 172.22.1.132, Dst: 172.22.16.29
> > > Transmission Control Protocol, Src Port: 810, Dst Port: 2049, Seq: 1054149, Ack: 6009, Len: 1932
> > > [345 Reassembled TCP Segments (1048836 bytes): #84(1448), #85(5792), #87(5792), #89(1448), #90(1448), #92(4344), #94(4344), #96(2896), #98(1448), #99(2896), #101(4344), #103(4344),
> > > #105(1448), #106(1448), #108(2896), #110(1448), #111(2896)]
> > > Remote Procedure Call, Type:Call XID:0xb45b7e1d
> > > Network File System, Ops(4): SEQUENCE, PUTFH, WRITE, GETATTR
> > > [Program Version: 4]
> > > [V4 Procedure: COMPOUND (1)]
> > > Tag: <EMPTY>
> > > minorversion: 2
> > > Operations (count: 4): SEQUENCE, PUTFH, WRITE, GETATTR
> > > Opcode: SEQUENCE (53)
> > > Opcode: PUTFH (22)
> > > Opcode: WRITE (38)
> > > StateID
> > > offset: 0
> > > stable: FILE_SYNC4 (2)
> > > Write length: 1048576
> > > Data: <DATA>
> > > Opcode: GETATTR (9)
> > > [Main Opcode: WRITE (38)]
> > >
> > > However, we then see the Node reply a short time later with (as
> > > below) REQ_TOO_BIG - meaning the max size has been exceeded.
> > >
> > > Frame 749: 114 bytes on wire (912 bits), 114 bytes captured (912 bits)
> > > Ethernet II, Src: MellanoxTech_bd:8c:7a (c4:70:bd:bd:8c:7a), Dst: IETF-VRRP-VRID_3f (00:00:5e:00:01:3f)
> > > Internet Protocol Version 4, Src: 172.22.16.29, Dst: 172.22.1.132
> > > Transmission Control Protocol, Src Port: 2049, Dst Port: 810, Seq: 6009, Ack: 1056081, Len: 48
> > > Remote Procedure Call, Type:Reply XID:0xb45b7e1d
> > > Network File System, Ops(1): SEQUENCE(NFS4ERR_REQ_TOO_BIG)
> > > [Program Version: 4]
> > > [V4 Procedure: COMPOUND (1)]
> > > Status: NFS4ERR_REQ_TOO_BIG (10065)
> > > Tag: <EMPTY>
> > > Operations (count: 1)
> > > Opcode: SEQUENCE (53)
> > > Status: NFS4ERR_REQ_TOO_BIG (10065)
> > > [Main Opcode: SEQUENCE (53)]
> > >
> > > Why is this?
> > >
> > > The reason for this seems to be related to the Client.
> > >
> > > From the Cluster side, the max rsize/wsize is the overall compound
> > > packet max size (everything related to NFS in the call).
> > >
> > > So, for example, with a compound call in NFSv4.2 this might
> > > include the below type of detail, which does not exceed the
> > > overall size of 1048576:
> > >
> > > [
> > > COMPOUND header
> > > SEQUENCE ....
> > > PUTFH ...
> > > WRITE header
> > > WRITE payload
> > > ] (overall) < 1mb
> > >
> > > However, the Client instead uses the r/wsize from the mount
> > > options as a limit for the write payload.
> > >
> > > So, with the same example:
> > >
> > > COMPOUND header
> > > SEQUENCE ....
> > > PUTFH ...
> > > WRITE header
> > > [
> > > WRITE payload
> > > ] (write) < 1mb
> > >
> > > But overall this ends up being 1mb plus all the overhead of the
> > > write header, compound header, PUTFH, etc., which puts it over the
> > > channel limit of 1048576 and hence the error is returned.
> > >
> > > So it seems here the Client ignores that value and insists on the
> > > WRITE with a payload == wsize, which in total with the WRITE
> > > overhead and all other requests in the COMPOUND (PUTFH, etc.)
> > > exceeds maxrequestsize, which prompts NFS4ERR_REQ_TOO_BIG.
> > >
> > > And as you can see, once you reduce the size within the mount
> > > options on the Client side, it no longer exceeds its limits,
> > > meaning you don't get the I/O error.
> >
> > So the question is: are we behaving correctly here, is it our
> > problem, or is the issue still considered to be on Dell's side?
> >
> > #regzbot introduced: 2b092175f5e301cdaa935093edfef2be9defb6df
> > #regzbot monitor: https://bugs.debian.org/1128834
> >
> > How to proceed from here?
>
> The Linux NFS client uses the 'maxread' and 'maxwrite' attributes
> (see RFC 8881, Sections 5.8.2.20 and 5.8.2.21) to decide how big a
> payload to request/send to the server in a READ/WRITE COMPOUND.
>
> If Dell's implementation is returning a size of 1MB, then the Linux
> client will use that value. It won't cross-check with the max request
> size, because it assumes that since both values derive from the
> server, there will be no conflict between them.
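[Editorial note: the size accounting Dell describes can be sanity-checked against the numbers in the capture quoted above. A minimal sketch, using only figures from the pcap; the per-compound overhead is derived from frame 747, not computed from exact XDR encoding:]

```python
# Numbers taken from the capture quoted above.
WSIZE = 1048576            # wsize= mount option, i.e. the WRITE payload limit
SESSION_MAX_REQ = 1048576  # csr_fore_chan_attrs max req size granted by OneFS

# Frame 747: the whole COMPOUND (SEQUENCE + PUTFH + WRITE + GETATTR)
# reassembled to 1048836 bytes of RPC data.
COMPOUND_SIZE = 1048836

# RPC/COMPOUND overhead on top of the 1 MB payload: 260 bytes here.
overhead = COMPOUND_SIZE - WSIZE

# Client's view: the payload respects wsize, so the WRITE looks legal.
assert WSIZE <= SESSION_MAX_REQ

# OneFS's view: the whole COMPOUND must fit within max req size --
# and it doesn't, hence NFS4ERR_REQ_TOO_BIG.
assert COMPOUND_SIZE > SESSION_MAX_REQ

print(overhead)  # 260
```

[Note also that the client asked for 1049620 in csa_fore_chan_attrs, i.e. wsize plus headroom, and the server clamped that to exactly 1048576, which is what removes the headroom.]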
This seems like a wrong interpretation to me. Servers use the
max_request_size to properly size their receive buffers, and the
client is responsible for adhering to that value. I don't think you
can stick a bunch of operations in a request compound and then put a
huge WRITE at the end that blows out max_request_size, and expect the
server to be OK with that.

ISTM the client should clamp the length down to something shorter that
allows the request to fit. Maybe drop the last folio and force another
request? Performance would suck, but it would work.

All that said, the server in this case isn't sizing max_request_size
with enough overhead for the client to actually achieve a full 1M
write, which is just dumb. Dell should fix that.
-- 
Jeff Layton <[email protected]>
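[Editorial note: a hypothetical sketch of the clamping approach suggested above. The function name and the fixed overhead figure are illustrative only, not the kernel's actual code; a real implementation would compute the XDR overhead of every op in the compound rather than assume a constant:]

```python
def clamp_write_len(wsize: int, max_req_size: int, overhead: int) -> int:
    """Largest WRITE payload that keeps the whole COMPOUND
    (RPC/COMPOUND headers + SEQUENCE + PUTFH + WRITE + payload)
    within the session's negotiated max request size."""
    return min(wsize, max_req_size - overhead)

# With the 1 MB session limit from the capture and the 260 bytes of
# overhead observed in frame 747, the payload would be trimmed to fit
# instead of overshooting the channel limit:
print(clamp_write_len(1048576, 1048576, 260))  # 1048316

# A server that grants headroom above wsize leaves the payload intact:
print(clamp_write_len(1048576, 1049620, 260))  # 1048576
```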

