To understand the behaviour and reproduce the case: do you see the same issue if you use Linux NFS v4.0 clients?
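For comparison, a Linux client could be pinned to NFSv4.0 with a mount along these lines (the export is the one from your mount listing below; /mnt/work24 is only a placeholder mount point):

  # mount -t nfs -o vers=4.0,proto=tcp,hard imksunth11:/Work_Pool_11/Work_24 /mnt/work24

Seeing whether the BAD_SEQID sequence also shows up there would help separate a client-side regression from a server-side one.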
— Vitaliy Gusev

> On 7 Mar 2025, at 14:05, Udo Grabowski (IMK) <udo.grabow...@kit.edu> wrote:
>
> oi151a9 # uname -a
> SunOS imksuns11 5.11 oi_151a9 i86pc i386 i86pc
>
> On 07/03/2025 11:28, Vitaliy Gusev wrote:
>> Hi Udo!
>> Could you give more information on what the kernel codebase for your oi_151a9 is,
>> probably uname -a?
>> And could you say more about the setup where you didn't see the problem?
>> Vitaliy Gusev
>
> oi151a9 # uname -a
> SunOS imksuns11 5.11 oi_151a9 i86pc i386 i86pc
>
> The problem also occurs with recent clients (OI newer than end of July 2024).
> The server was OI Dec 2024 and is now Jan 2025. We have so far only seen this
> on the flash array, but we don't torture our other machines that hard.
>
> Old oi_151a7/a9 servers and OI Hipster 2018-09 NFS servers had no
> problems with these loads. We haven't had a newer NFS server OS version
> in between that had to cope with this mass production, unfortunately,
> so there's no narrower version bracket.
>
>>> On 6 Mar 2025, at 16:17, Udo Grabowski (IMKASF) <udo.grabow...@kit.edu> wrote:
>>>
>>> On 06-03-2025 13:57, Toomas Soome via illumos-discuss wrote:
>>>>> On 6. Mar 2025, at 13:41, Udo Grabowski (IMKASF) <udo.grabow...@kit.edu> wrote:
>>>>>
>>>>> On 05-03-2025 12:47, Udo Grabowski (IMKASF) wrote:
>>>>>> Hi,
>>>>>> for a while now we have quite often been seeing NFS problems we've not seen
>>>>>> before, here between an oi_151a9 client and a recent 2024:12:12
>>>>>> illumos-b7fe974ee3 server; it starts with a file <null string>, then
>>>>>> mixed-up seqids, losing track of several file openings:
>>>>>> Mar 5 10:08:34 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4]
>>>>>> [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for
>>>>>> file <null string> (rnode_pt 0x0), pid 0 using seqid 1 got
>>>>>> NFS4ERR_BAD_SEQID. Last good seqid was 0 for operation .
>>>>>> Mar 5 10:08:38 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4]
>>>>>> [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for
>>>>>> file ./CEDRO_2_ES/SF6:X_262.0/bout/create_coarse_inp_30300.log (rnode_pt
>>>>>> 0xfffffeed90af63f8), pid 0 using seqid 2 got NFS4ERR_BAD_SEQID. Last
>>>>>> good seqid was 1 for operation open.
>>>>>> Mar 5 10:08:40 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4]
>>>>>> [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for
>>>>>> file ./CEDRO_2_ES/SF6:X_262.0/bout/create_coarse_inp_30300.log (rnode_pt
>>>>>> 0xfffffeed90af63f8), pid 0 using seqid 1 got NFS4ERR_BAD_SEQID. Last
>>>>>> good seqid was 1 for operation open.
>>>>>> ...
>>>>>> This has probably also happened with new clients, but I'm not yet sure
>>>>>> about that. It's happening often enough that it significantly
>>>>>> disturbs operations here.
>>>>>
>>>>> We now know that this definitely happens with recent clients, too.
>>>> What kind of operations are done there? Maybe 'snoop rpc nfs'? A plain
>>>> mount plus edit file and save shows no problem…
>>>
>>> No, it's not that simple. We are running about 70 client processes
>>> on different cluster machines hammering on two NFS servers
>>> (here, a very new dual-host all-flash array from ZStor with 100 Gb/s
>>> Mellanox ConnectX-6 mlxcx NICs), and once in a while (between 1 in 5000 and
>>> 3 in 11...) that sequence suddenly happens and gives I/O errors for
>>> a couple of files. This started at the end of last year when we inaugurated
>>> these servers; before that we ran oi151a7 to a9 and Hipster 2016 servers
>>> and had no such problems in years.
>>> The messages are only seen on the clients; there is no indication of the errors
>>> on the servers. Snooping would be a hard thing to do, as we don't
>>> know on which client they will occur and at what time ...
>>>
>>> Important server parameters (these are v4 TCP connections on port
>>> 2049, doubled TCP_MAX_BUF, to no avail):
>>> nfs-props/servers integer 1024
>>> nfs-props/server_delegation astring off
>>> nfs-props/listen_backlog integer 512
>>> nfs-props/mountd_listen_backlog integer 64
>>> nfs-props/mountd_max_threads integer 16
>>> nfs-props/server_versmax astring 4
>>>
>>> set ncsize=0x100000
>>> set nfs:nfs3_nra=16
>>> set nfs:nfs3_bsize=0x100000
>>> set nfs:nfs3_max_transfer_size=0x100000
>>> set nfs:nfs4_max_transfer_size=0x100000
>>> set nfs:nfs4_bsize=0x100000
>>> set rpcmod:clnt_max_conns=8
>>> * not reachable via /etc/system, see /etc/default/inetinit and the net-init method
>>> *set ndd:tcp_recv_hiwat=0x100000
>>> *set ndd:tcp_xmit_hiwat=0x100000
>>> *set ndd:tcp_max_buf=0x400000
>>> *set ndd:tcp_cwnd_max=0x200000
>>> *set ndd:tcp_conn_req_max_q=1024
>>> *set ndd:tcp_conn_req_max_q0=4096
>>> set nfs:nfs4_bsize=0x100000
>>> TCP_RECV_HIWAT=1048576
>>> TCP_XMIT_HIWAT=1048576
>>> TCP_MAX_BUF=8388608
>>> TCP_STRONG_ISS=2
>>> TCP_CWND_MAX=4194304
>>> TCP_CONN_REQ_MAX_Q=2048
>>> TCP_CONN_REQ_MAX_Q0=4096
>>>
>>> Client mount parameters:
>>> /home/Processor/Work_24 from imksunth11:/Work_Pool_11/Work_24
>>> Flags: vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=1048576,wsize=1048576,retrans=5,timeo=600
>>> Attr cache: acregmin=3,acregmax=60,acdirmin=30,acdirmax=60
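Since the two servers are known even if the failing client is not, one way around the snooping problem could be to watch for the bad OPEN replies on the server side. A minimal sketch, assuming the illumos nfsv4 DTrace provider with its documented argument layout (conninfo_t, nfsv4opinfo_t, OPEN4res) is available on the server; 10026 is the wire value of NFS4ERR_BAD_SEQID:

  # dtrace -qn 'nfsv4:::op-open-done
      /args[2]->status == 10026/
      {
          /* log the remote client address and the path being opened */
          printf("%Y  BAD_SEQID -> client %s, file %s\n",
              walltimestamp, args[0]->ci_remote, args[1]->noi_curpath);
      }'

Left running on imksunth11, something along these lines should at least narrow down which client and which file trigger the error, without having to capture traffic on all the cluster machines.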