On 07/03/2025 11:28, Vitaliy Gusev wrote:
Hi Udo!

Could you give more information about the kernel codebase of your oi_151a9, probably the output of uname -a? And could you say more about the setups where you didn't see the problem?

Vitaliy Gusev
oi151a9 # uname -a
SunOS imksuns11 5.11 oi_151a9 i86pc i386 i86pc

The problem also occurs with recent clients (OI newer than end of July 2024). The server was OI Dec 2024 and is now Jan 2025. So far we have only seen this on the flash array, but we don't torture our other machines that hard. The old oi_151a7/a9 servers and the OI Hipster 2018-09 NFS servers had no problems with these loads. Unfortunately, we haven't had a newer NFS server OS version in between that had to cope with this mass production, so there's no narrower version bracket.
On 6 Mar 2025, at 16:17, Udo Grabowski (IMKASF) <udo.grabow...@kit.edu> wrote:

On 06-03-2025 13:57, Toomas Soome via illumos-discuss wrote:

On 6. Mar 2025, at 13:41, Udo Grabowski (IMKASF) <udo.grabow...@kit.edu> wrote:

What kind of operations are done there? Maybe 'snoop rpc nfs'? Just a plain mount, edit file and save has no problem…

On 05-03-2025 12:47, Udo Grabowski (IMKASF) wrote:

Hi, for a while now we have quite often been seeing NFS problems we had not seen before, here between an oi_151a9 client and a recent 2024:12:12 illumos-b7fe974ee3 server. It starts with a file <null string>, then mixed-up seqids, losing track for several file openings:

Mar 5 10:08:34 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4] [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for file <null string> (rnode_pt 0x0), pid 0 using seqid 1 got NFS4ERR_BAD_SEQID. Last good seqid was 0 for operation .
Mar 5 10:08:38 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4] [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for file ./CEDRO_2_ES/SF6:X_262.0/bout/create_coarse_inp_30300.log (rnode_pt 0xfffffeed90af63f8), pid 0 using seqid 2 got NFS4ERR_BAD_SEQID. Last good seqid was 1 for operation open.
Mar 5 10:08:40 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4] [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for file ./CEDRO_2_ES/SF6:X_262.0/bout/create_coarse_inp_30300.log (rnode_pt 0xfffffeed90af63f8), pid 0 using seqid 1 got NFS4ERR_BAD_SEQID. Last good seqid was 1 for operation open.
...

> This probably also has happened with new clients, but I'm yet not sure
> about that. It's happening often enough that it is significantly
> disturbing operations here.

We now know that this definitely happens with recent clients, too.

No, it's not that simple.
We are running about 70 client processes on different cluster machines hammering on two NFS servers (here, a very new dual-host all-flash array from ZStor with 100 Gb/s Mellanox ConnectX-6 mlxcx NICs), and once in a while (between 1 in 5000 and 3 in 11...) that sequence suddenly happens and gives I/O errors for a couple of files. This started at the end of last year when we inaugurated these servers; before that we ran oi151a7 to a9 and Hipster 2016 servers and had no such problems in years. The messages are only seen on the clients; there are no indications of the errors on the servers. Snooping would be hard to do, as we don't know on which client they will occur and at what time ...

Important server parameters (these are v4 TCP connections on port 2049; doubled TCP_MAX_BUF, to no avail):

nfs-props/servers integer 1024
nfs-props/server_delegation astring off
nfs-props/listen_backlog integer 512
nfs-props/mountd_listen_backlog integer 64
nfs-props/mountd_max_threads integer 16
nfs-props/server_versmax astring 4

set ncsize=0x100000
set nfs:nfs3_nra=16
set nfs:nfs3_bsize=0x100000
set nfs:nfs3_max_transfer_size=0x100000
set nfs:nfs4_max_transfer_size=0x100000
set nfs:nfs4_bsize=0x100000
set rpcmod:clnt_max_conns=8
* not reachable via system, see default/inetinit and method net-init
*set ndd:tcp_recv_hiwat=0x100000
*set ndd:tcp_xmit_hiwat=0x100000
*set ndd:tcp_max_buf=0x400000
*set ndd:tcp_cwnd_max=0x200000
*set ndd:tcp_conn_req_max_q=1024
*set ndd:tcp_conn_req_max_q0=4096
set nfs:nfs4_bsize=0x100000

TCP_RECV_HIWAT=1048576
TCP_XMIT_HIWAT=1048576
TCP_MAX_BUF=8388608
TCP_STRONG_ISS=2
TCP_CWND_MAX=4194304
TCP_CONN_REQ_MAX_Q=2048
TCP_CONN_REQ_MAX_Q0=4096

Client mount parameters:

/home/Processor/Work_24 from imksunth11:/Work_Pool_11/Work_24
 Flags: vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=1048576,wsize=1048576,retrans=5,timeo=600
 Attr cache: acregmin=3,acregmax=60,acdirmin=30,acdirmax=60
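
Since the notices only show up on the clients, and it is not known in advance which client will be hit, one option is to collect the clients' syslog files and tally the NFS4ERR_BAD_SEQID notices per server and mount point. The following is only a minimal sketch and not something used in this thread: the regular expression is modelled on the three messages quoted above and may not match notices from other releases, and the names parse_bad_seqid and count_by_mount are made up for illustration.

```python
import re
import sys
from collections import Counter

# Assumption: the notice layout is exactly as in the messages quoted in this
# thread. Other client releases may word or order the fields differently.
BAD_SEQID = re.compile(
    r"\[Server: (?P<server>[^\]]+)\]\s*"
    r"\[Mntpt: (?P<mntpt>[^\]]+)\]\s*"
    r"Operation (?P<op>\S+) for file (?P<file>.+?) "
    r"\(rnode_pt (?P<rnode>0x[0-9a-fA-F]+)\), pid (?P<pid>\d+) "
    r"using seqid (?P<seqid>\d+) got NFS4ERR_BAD_SEQID\. "
    r"Last good seqid was (?P<last_good>\d+)"
)


def parse_bad_seqid(text):
    """Return one dict per BAD_SEQID notice found in the given log text."""
    events = []
    for match in BAD_SEQID.finditer(text):
        event = match.groupdict()
        # Convert the two sequence numbers so they can be compared.
        event["seqid"] = int(event["seqid"])
        event["last_good"] = int(event["last_good"])
        events.append(event)
    return events


def count_by_mount(events):
    """Count notices per (server, mount point) pair."""
    return Counter((e["server"], e["mntpt"]) for e in events)


if __name__ == "__main__":
    # Hypothetical usage: pass copies of the clients' log files as arguments.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            found = parse_bad_seqid(handle.read())
        for (server, mntpt), n in count_by_mount(found).items():
            print(f"{path}: {server} {mntpt}: {n} BAD_SEQID notices")
```

Run against copies of each client's log file (on illumos the default is /var/adm/messages), this would at least show which clients and mount points are affected and how often, which may help decide where a targeted snoop is worth setting up.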
------------------------------------------
illumos: illumos-discuss
Permalink: https://illumos.topicbox.com/groups/discuss/T939fcf899d5526b0-M00a39ef37d2b2c803de8cab5
Delivery options: https://illumos.topicbox.com/groups/discuss/subscription