On 06-03-2025 13:57, Toomas Soome via illumos-discuss wrote:
> On 6. Mar 2025, at 13:41, Udo Grabowski (IMKASF) <udo.grabow...@kit.edu> wrote:
>
>> On 05-03-2025 12:47, Udo Grabowski (IMKASF) wrote:
>>> Hi,
>>> for a while now we have been seeing NFS problems quite often that we
>>> never saw before, here between an oi_151a9 client and a recent
>>> 2024-12-12 illumos-b7fe974ee3 server; it starts with a file
>>> <null string>, then mixed-up seqids, losing track of several file
>>> openings:
>>>
>>> Mar 5 10:08:34 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4]
>>>   [Server: imksunth11][Mntpt: /home/Processor/Work_22]
>>>   Operation open for file <null string> (rnode_pt 0x0), pid 0 using
>>>   seqid 1 got NFS4ERR_BAD_SEQID. Last good seqid was 0 for operation .
>>> Mar 5 10:08:38 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4]
>>>   [Server: imksunth11][Mntpt: /home/Processor/Work_22]
>>>   Operation open for file ./CEDRO_2_ES/SF6:X_262.0/bout/create_coarse_inp_30300.log
>>>   (rnode_pt 0xfffffeed90af63f8), pid 0 using seqid 2 got
>>>   NFS4ERR_BAD_SEQID. Last good seqid was 1 for operation open.
>>> Mar 5 10:08:40 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4]
>>>   [Server: imksunth11][Mntpt: /home/Processor/Work_22]
>>>   Operation open for file ./CEDRO_2_ES/SF6:X_262.0/bout/create_coarse_inp_30300.log
>>>   (rnode_pt 0xfffffeed90af63f8), pid 0 using seqid 1 got
>>>   NFS4ERR_BAD_SEQID. Last good seqid was 1 for operation open.
>>> ...
>>> This probably also has happened with new clients, but I'm not yet
>>> sure about that. It's happening often enough that it significantly
>>> disturbs operations here.
>>
>> We now know that this definitely happens with recent clients, too.
>
> What kind of operations are done there? Maybe 'snoop rpc nfs'?
> Just plain mount, edit a file, and save has no problem ...
No, it's not that simple. We are running about 70 client processes on
different cluster machines hammering on two NFS servers (here, a very
new dual-host all-flash array from ZStor with 100 Gb/s Mellanox
ConnectX-6 mlxcx NICs), and once in a while (between 1 in 5000 and
3 in 11...) that sequence suddenly happens and gives I/O errors for a
couple of files. This started at the end of last year when we put these
servers into service; before that we ran oi151a7 to a9 and Hipster 2016
servers and had no such problems in years. The messages are only seen
on the clients; there is no indication of the errors on the servers.
Snooping would be a hard thing to do, as we don't know on which client
they will occur, or at what time ...
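Since the server is the side that generates NFS4ERR_BAD_SEQID, one
thing we could try instead of blanket snooping is a server-side DTrace
watch. A minimal sketch, assuming the kernel's nfsv4 DTrace provider is
available on that build (10026 is NFS4ERR_BAD_SEQID):

    # dtrace -qn '
    nfsv4:::op-open-done
    /args[2]->status == 10026/
    {
            /* log time, client address and current file handle path */
            printf("%Y client=%s path=%s\n", walltimestamp,
                args[0]->ci_remote, args[1]->noi_curpath);
    }'

That would at least tell us which client and which file are affected,
so a targeted 'snoop rpc nfs' on just that one client becomes feasible.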
Important server parameters (these are v4 TCP connections on port 2049;
we also doubled TCP_MAX_BUF, to no avail):

    nfs-props/servers                  integer  1024
    nfs-props/server_delegation       astring  off
    nfs-props/listen_backlog           integer  512
    nfs-props/mountd_listen_backlog    integer  64
    nfs-props/mountd_max_threads      integer  16
    nfs-props/server_versmax           astring  4

    set ncsize=0x100000
    set nfs:nfs3_nra=16
    set nfs:nfs3_bsize=0x100000
    set nfs:nfs3_max_transfer_size=0x100000
    set nfs:nfs4_max_transfer_size=0x100000
    set nfs:nfs4_bsize=0x100000
    set rpcmod:clnt_max_conns=8

    * not reachable via /etc/system, see default/inetinit and method net-init:
    *set ndd:tcp_recv_hiwat=0x100000
    *set ndd:tcp_xmit_hiwat=0x100000
    *set ndd:tcp_max_buf=0x400000
    *set ndd:tcp_cwnd_max=0x200000
    *set ndd:tcp_conn_req_max_q=1024
    *set ndd:tcp_conn_req_max_q0=4096

    TCP_RECV_HIWAT=1048576
    TCP_XMIT_HIWAT=1048576
    TCP_MAX_BUF=8388608
    TCP_STRONG_ISS=2
    TCP_CWND_MAX=4194304
    TCP_CONN_REQ_MAX_Q=2048
    TCP_CONN_REQ_MAX_Q0=4096
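For reference, the nfs-props values above are SMF properties on
svc:/network/nfs/server; setting them looks roughly like this (a
sketch with example values; the TCP_* assignments are picked up from
/etc/default/inetinit by our local net-init method, not by stock
illumos):

    # svccfg -s svc:/network/nfs/server \
          setprop nfs-props/servers = integer: 1024
    # svccfg -s svc:/network/nfs/server \
          setprop nfs-props/server_delegation = astring: off
    # svcadm refresh svc:/network/nfs/server
    # svcadm restart svc:/network/nfs/server

    # sharectl get nfs    # cross-check the effective NFS server settings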
Client mount parameters:

    /home/Processor/Work_24 from imksunth11:/Work_Pool_11/Work_24
    Flags: vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=1048576,wsize=1048576,retrans=5,timeo=600
    Attr cache: acregmin=3,acregmax=60,acdirmin=30,acdirmax=60

--
Dr. Udo Grabowski
Inst. of Meteorology & Climate Research IMKASF-SAT
https://www.imk-asf.kit.edu/english/sat.php
KIT - Karlsruhe Institute of Technology
https://www.kit.edu
Postfach 3640, 76021 Karlsruhe, Germany
T: (+49) 721 608-26026  F: -926026