Hi Udo! Could you give more information about which kernel codebase your oi_151a9 is running, probably the output of uname -a?
And could you say more about the setup where you didn't see the problem?

Vitaliy Gusev

> On 6 Mar 2025, at 16:17, Udo Grabowski (IMKASF) <udo.grabow...@kit.edu> wrote:
>
> On 06-03-2025 13:57, Toomas Soome via illumos-discuss wrote:
>>> On 6. Mar 2025, at 13:41, Udo Grabowski (IMKASF) <udo.grabow...@kit.edu> wrote:
>>>
>>> On 05-03-2025 12:47, Udo Grabowski (IMKASF) wrote:
>>>> Hi,
>>>> for a while now, we see NFS problems quite often we've not seen
>>>> before, here between an oi_151a9 client and a recent 2024:12:12
>>>> illumos-b7fe974ee3 server; starts with a file <null string>, then
>>>> mixed up seqids, losing track for several file openings:
>>>> Mar 5 10:08:34 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4]
>>>> [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for
>>>> file <null string> (rnode_pt 0x0), pid 0 using seqid 1 got
>>>> NFS4ERR_BAD_SEQID. Last good seqid was 0 for operation .
>>>> Mar 5 10:08:38 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4]
>>>> [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for
>>>> file ./CEDRO_2_ES/SF6:X_262.0/bout/create_coarse_inp_30300.log (rnode_pt
>>>> 0xfffffeed90af63f8), pid 0 using seqid 2 got NFS4ERR_BAD_SEQID. Last good
>>>> seqid was 1 for operation open.
>>>> Mar 5 10:08:40 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4]
>>>> [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for
>>>> file ./CEDRO_2_ES/SF6:X_262.0/bout/create_coarse_inp_30300.log (rnode_pt
>>>> 0xfffffeed90af63f8), pid 0 using seqid 1 got NFS4ERR_BAD_SEQID. Last good
>>>> seqid was 1 for operation open.
>>>> ...
>>> > This probably also has happened with new clients, but I'm not yet sure
>>> > about that. It's happening often enough that it is significantly
>>> > disturbing operations here.
>>>
>>> We now know that this definitely happens with recent clients, too.
>> what kind of operations are done there? maybe 'snoop rpc nfs' ? Just plain
>> mount and edit file and save has no problem…
>
> No, it's not that simple. We are running about 70 client processes
> on different cluster machines hammering on two NFS servers
> (here, a very new dual host all flash array from ZStor with 100 Gb/s
> mellanox ConnectX-6 mlxcx nics), and once in a while (between 1 in 5000 and 3
> in 11...) that sequence suddenly happens and gives I/O errors for
> a couple of files. This started end of last year when we inaugurated
> these servers, before we ran oi151a7 to a9 and Hipster 2016 servers
> and had no such problems in years.
> Messages are only seen on clients, no indications for the errors
> on the servers. Snooping would be a hard thing to do, as we don't
> know on which client they will occur and at what time ...
>
> Important server parameters (these are v4 tcp connections on port
> 2049, doubled TCP_MAX_BUF, to no help):
> nfs-props/servers integer 1024
> nfs-props/server_delegation astring off
> nfs-props/listen_backlog integer 512
> nfs-props/mountd_listen_backlog integer 64
> nfs-props/mountd_max_threads integer 16
> nfs-props/server_versmax astring 4
>
> set ncsize=0x100000
> set nfs:nfs3_nra=16
> set nfs:nfs3_bsize=0x100000
> set nfs:nfs3_max_transfer_size=0x100000
> set nfs:nfs4_max_transfer_size=0x100000
> set nfs:nfs4_bsize=0x100000
> set rpcmod:clnt_max_conns=8
> * not reachable via system, see default/inetinit and method net-init
> *set ndd:tcp_recv_hiwat=0x100000
> *set ndd:tcp_xmit_hiwat=0x100000
> *set ndd:tcp_max_buf=0x400000
> *set ndd:tcp_cwnd_max=0x200000
> *set ndd:tcp_conn_req_max_q=1024
> *set ndd:tcp_conn_req_max_q0=4096
> set nfs:nfs4_bsize=0x100000
> TCP_RECV_HIWAT=1048576
> TCP_XMIT_HIWAT=1048576
> TCP_MAX_BUF=8388608
> TCP_STRONG_ISS=2
> TCP_CWND_MAX=4194304
> TCP_CONN_REQ_MAX_Q=2048
> TCP_CONN_REQ_MAX_Q0=4096
>
> Client mount parameters:
> /home/Processor/Work_24 from imksunth11:/Work_Pool_11/Work_24
> Flags:
> vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=1048576,wsize=1048576,retrans=5,timeo=600
> Attr cache: acregmin=3,acregmax=60,acdirmin=30,acdirmax=60
>
> --
> Dr.Udo Grabowski Inst.of Meteorology & Climate Research IMKASF-SAT
> https://www.imk-asf.kit.edu/english/sat.php
> KIT - Karlsruhe Institute of Technology https://www.kit.edu
> Postfach 3640,76021 Karlsruhe,Germany T:(+49)721 608-26026 F:-926026

------------------------------------------
illumos: illumos-discuss
Permalink: https://illumos.topicbox.com/groups/discuss/T939fcf899d5526b0-Me99c8ddf1e4e39269f4f516e
Delivery options: https://illumos.topicbox.com/groups/discuss/subscription
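
Since the errors show up only in the client logs, and it is not known in advance which client will hit them, it might help to tally the NFS4ERR_BAD_SEQID notices across all clients before attempting a targeted snoop. The following is only a rough sketch in Python, not something from the thread: the regular expression is an assumption based on the three log excerpts quoted above, and the names parse_bad_seqid and count_by_mount are made up for illustration. Other releases may word the NOTICE lines differently.

```python
import re
from collections import Counter

# Assumed shape of the client-side NOTICE lines, derived only from the
# excerpts quoted in this thread; adjust if your logs differ.
BAD_SEQID = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<client>\S+)\s+nfs:.*?"
    r"\[Server:\s*(?P<server>[^\]]+)\]\s*\[Mntpt:\s*(?P<mntpt>[^\]]+)\]\s*"
    r"Operation\s+(?P<op>\S+)\s+for\s+file\s+(?P<file>.+?)\s+\(rnode_pt\s+\S+\),"
    r"\s+pid\s+\d+\s+using\s+seqid\s+(?P<seqid>\d+)\s+got\s+NFS4ERR_BAD_SEQID\."
    r"\s+Last\s+good\s+seqid\s+was\s+(?P<last_good>\d+)"
)


def parse_bad_seqid(lines):
    """Return one dict per log line that reports NFS4ERR_BAD_SEQID."""
    events = []
    for line in lines:
        match = BAD_SEQID.search(line)
        if match is None:
            continue  # not a BAD_SEQID notice, skip it
        event = match.groupdict()
        event["seqid"] = int(event["seqid"])
        event["last_good"] = int(event["last_good"])
        events.append(event)
    return events


def count_by_mount(events):
    """Count events per (client, server, mount point) triple."""
    return Counter((e["client"], e["server"], e["mntpt"]) for e in events)


# The three notices quoted in the thread, re-joined onto single lines.
SAMPLE = [
    "Mar 5 10:08:34 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4] "
    "[Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for "
    "file <null string> (rnode_pt 0x0), pid 0 using seqid 1 got "
    "NFS4ERR_BAD_SEQID. Last good seqid was 0 for operation .",
    "Mar 5 10:08:38 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4] "
    "[Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for "
    "file ./CEDRO_2_ES/SF6:X_262.0/bout/create_coarse_inp_30300.log (rnode_pt "
    "0xfffffeed90af63f8), pid 0 using seqid 2 got NFS4ERR_BAD_SEQID. Last good "
    "seqid was 1 for operation open.",
    "Mar 5 10:08:40 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4] "
    "[Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for "
    "file ./CEDRO_2_ES/SF6:X_262.0/bout/create_coarse_inp_30300.log (rnode_pt "
    "0xfffffeed90af63f8), pid 0 using seqid 1 got NFS4ERR_BAD_SEQID. Last good "
    "seqid was 1 for operation open.",
]

if __name__ == "__main__":
    for key, count in count_by_mount(parse_bad_seqid(SAMPLE)).items():
        print(key, count)
```

To use it on real data, the SAMPLE list could be replaced by the lines of each client's system log (assuming the notices land in /var/adm/messages). The timestamps of the matches would then show which clients and mount points are affected most, which may make it easier to decide where a snoop is worth leaving running.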