oi151a9 # uname -a
SunOS imksuns11 5.11 oi_151a9 i86pc i386 i86pc

On 07/03/2025 11:28, Vitaliy Gusev wrote:
Hi Udo!

Could you give more information about the kernel codebase of your oi_151a9, perhaps the output of uname -a?

Could you also give more details about the setup in which you did not see the problem?

Vitaliy Gusev


The problem also occurs with recent clients (OI newer than the end of July 2024).
The server ran OI from Dec 2024 and now runs OI from Jan 2025. So far we have only
seen this with the flash array, but we don't torture our other machines that hard.

Old oi_151a7/a9 servers and OI Hipster 2018-09 NFS servers had no
problems with these loads. Unfortunately, we haven't had a newer NFS server
OS version in between that had to cope with this mass production,
so there's no narrower version bracket.


On 6 Mar 2025, at 16:17, Udo Grabowski (IMKASF) <udo.grabow...@kit.edu> wrote:



On 06-03-2025 13:57, Toomas Soome via illumos-discuss wrote:
On 6. Mar 2025, at 13:41, Udo Grabowski (IMKASF) <udo.grabow...@kit.edu> wrote:

On 05-03-2025 12:47, Udo Grabowski (IMKASF) wrote:
Hi,
for a while now, we have quite often been seeing NFS problems that we had
not seen before, here between an oi_151a9 client and a recent 2024:12:12
illumos-b7fe974ee3 server. It starts with a file <null string>, then the
seqids get mixed up and the client loses track of several file opens:
Mar  5 10:08:34 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4] [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for file <null string> (rnode_pt 0x0), pid 0 using seqid 1 got NFS4ERR_BAD_SEQID.  Last good seqid was 0 for operation .
Mar  5 10:08:38 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4] [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for file ./CEDRO_2_ES/SF6:X_262.0/bout/create_coarse_inp_30300.log (rnode_pt 0xfffffeed90af63f8), pid 0 using seqid 2 got NFS4ERR_BAD_SEQID.  Last good seqid was 1 for operation open.
Mar  5 10:08:40 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4] [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for file ./CEDRO_2_ES/SF6:X_262.0/bout/create_coarse_inp_30300.log (rnode_pt 0xfffffeed90af63f8), pid 0 using seqid 1 got NFS4ERR_BAD_SEQID.  Last good seqid was 1 for operation open.
...
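For readers less familiar with NFSv4.0 open-owner sequencing, here is a minimal sketch of the server-side seqid check, simplified from the rules in RFC 7530. It is an assumption-laden model, not illumos kernel code: it covers only an already-known open-owner (no new-owner handling, no seqid wraparound, no storage of cached replies), and the class and method names are hypothetical. A request carrying last_seqid + 1 is processed, one carrying last_seqid is treated as a replay, and anything else is answered with NFS4ERR_BAD_SEQID.

```python
from enum import Enum


class SeqidResult(Enum):
    NEW = "process as a new request"
    REPLAY = "return the cached reply"
    BAD_SEQID = "NFS4ERR_BAD_SEQID"


class OpenOwner:
    """Hypothetical, simplified server-side record of one NFSv4.0 open-owner."""

    def __init__(self, last_seqid: int):
        self.last_seqid = last_seqid

    def check(self, seqid: int) -> SeqidResult:
        # Next expected value: accept and advance the stored seqid.
        if seqid == self.last_seqid + 1:
            self.last_seqid = seqid
            return SeqidResult.NEW
        # Same value as last time: a retransmission, answered from the cache.
        if seqid == self.last_seqid:
            return SeqidResult.REPLAY
        # Anything else is out of sequence (wraparound ignored in this sketch).
        return SeqidResult.BAD_SEQID


if __name__ == "__main__":
    owner = OpenOwner(last_seqid=1)
    print(owner.check(2))  # next value
    print(owner.check(2))  # retransmission
    print(owner.check(4))  # out of sequence
```

Read against this simplified model, seqid 2 after a last good seqid of 1 (the second message) is the value a server would accept, which suggests the server's record for that open-owner no longer matched the client's.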

> This has probably also happened with new clients, but I'm not yet sure
> about that. It's happening often enough that it is significantly
> disturbing operations here.

We now know that this definitely happens with recent clients, too.
What kind of operations are done there? Maybe 'snoop rpc nfs'? Just a plain mount, editing a file and saving it causes no problem…

No, it's not that simple. We are running about 70 client processes
on different cluster machines hammering on two NFS servers
(here, a very new dual-host all-flash array from ZStor with 100 Gb/s
Mellanox ConnectX-6 mlxcx NICs), and once in a while (between 1 in 5000
and 3 in 11...) that sequence suddenly happens and gives I/O errors for
a couple of files. This started at the end of last year when we inaugurated
these servers; before that we ran oi151a7 to a9 and Hipster 2016 servers
and had no such problems for years.
The messages are only seen on the clients; there are no indications of the
errors on the servers. Snooping would be hard to do, as we don't know on which
client the errors will occur or at what time ...
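Since the messages only appear in the clients' logs, one low-effort complement to snooping is to collect the NFS4ERR_BAD_SEQID notices from every client's log afterwards and see which client, mount point and time they cluster on. A minimal Python sketch, assuming the notices land one per line in the usual syslog file (e.g. /var/adm/messages) in exactly the format quoted above; the regex and function names are hypothetical:

```python
import re
import sys
from collections import Counter

# Assumed format: one syslog line per notice, as in the excerpt above.
BAD_SEQID_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s+(?P<host>\S+)\s+nfs:.*?"
    r"\[Server: (?P<server>[^\]]+)\]\[Mntpt: (?P<mntpt>[^\]]+)\]"
    r"Operation (?P<op>\w+) for file (?P<file>.*?) \(rnode_pt [^)]*\), "
    r"pid (?P<pid>\d+) using seqid (?P<seqid>\d+) got NFS4ERR_BAD_SEQID\.\s+"
    r"Last good seqid was (?P<last>\d+)"
)


def parse_bad_seqid(lines):
    """Return one dict per NFS4ERR_BAD_SEQID notice found in `lines`."""
    events = []
    for line in lines:
        m = BAD_SEQID_RE.search(line)
        if m is None:
            continue
        event = m.groupdict()
        for key in ("pid", "seqid", "last"):
            event[key] = int(event[key])
        events.append(event)
    return events


def summarize(events):
    """Count notices per (client host, mount point)."""
    return Counter((e["host"], e["mntpt"]) for e in events)


if __name__ == "__main__":
    # Log files are given on the command line; nothing is read otherwise.
    events = []
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            events.extend(parse_bad_seqid(fh))
    for (host, mntpt), n in summarize(events).most_common():
        print(f"{n:6d}  {host}  {mntpt}")
    for e in events:
        print(e["ts"], e["host"], e["op"], e["file"], e["seqid"], e["last"])
```

Run it over copies of the clients' logs gathered in one place; the per-timestamp listing can then be lined up against server-side activity at the same moments.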

Important server parameters (these are NFSv4 TCP connections on port
2049; doubling TCP_MAX_BUF did not help):
nfs-props/servers integer 1024
nfs-props/server_delegation astring off
nfs-props/listen_backlog integer 512
nfs-props/mountd_listen_backlog integer 64
nfs-props/mountd_max_threads integer 16
nfs-props/server_versmax astring 4

set ncsize=0x100000
set nfs:nfs3_nra=16
set nfs:nfs3_bsize=0x100000
set nfs:nfs3_max_transfer_size=0x100000
set nfs:nfs4_max_transfer_size=0x100000
set nfs:nfs4_bsize=0x100000
set rpcmod:clnt_max_conns=8
* not reachable via system, see default/inetinit and method net-init
*set ndd:tcp_recv_hiwat=0x100000
*set ndd:tcp_xmit_hiwat=0x100000
*set ndd:tcp_max_buf=0x400000
*set ndd:tcp_cwnd_max=0x200000
*set ndd:tcp_conn_req_max_q=1024
*set ndd:tcp_conn_req_max_q0=4096
set nfs:nfs4_bsize=0x100000
TCP_RECV_HIWAT=1048576
TCP_XMIT_HIWAT=1048576
TCP_MAX_BUF=8388608
TCP_STRONG_ISS=2
TCP_CWND_MAX=4194304
TCP_CONN_REQ_MAX_Q=2048
TCP_CONN_REQ_MAX_Q0=4096

Client mount parameters:
/home/Processor/Work_24 from imksunth11:/Work_Pool_11/Work_24
Flags: vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=1048576,wsize=1048576,retrans=5,timeo=600
Attr cache:    acregmin=3,acregmax=60,acdirmin=30,acdirmax=60
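As a quick cross-check of the numbers above, a small Python sketch that parses the client's mount flags and compares them with the server values quoted in this message. It only does arithmetic on values copied from here; the dictionary keys are descriptive labels, not real tunable names, and the tenths-of-a-second unit for timeo is taken from the illumos mount_nfs manual page.

```python
# Values copied from the server settings and client mount flags above.
# Dictionary keys are descriptive labels only, not real tunable names.
SERVER = {
    "nfs4_max_transfer_size": 0x100000,     # /etc/system entry
    "ndd_tcp_max_buf_commented": 0x400000,  # commented-out ndd line
    "TCP_MAX_BUF": 8388608,                 # inetinit-style value
}
CLIENT_FLAGS = ("vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,"
                "rsize=1048576,wsize=1048576,retrans=5,timeo=600")


def parse_flags(flags: str) -> dict:
    """Split a comma-separated mount option string into a dict.

    Options without '=' (e.g. 'hard') are stored as True.
    """
    opts = {}
    for item in flags.split(","):
        key, sep, value = item.partition("=")
        opts[key] = value if sep else True
    return opts


if __name__ == "__main__":
    opts = parse_flags(CLIENT_FLAGS)
    max_xfer = SERVER["nfs4_max_transfer_size"]  # 0x100000 == 1048576 == 1 MiB
    print("rsize matches server max transfer size:", int(opts["rsize"]) == max_xfer)
    print("wsize matches server max transfer size:", int(opts["wsize"]) == max_xfer)
    # 8388608 is exactly twice the 0x400000 in the commented-out ndd line.
    print("TCP_MAX_BUF / commented ndd value:",
          SERVER["TCP_MAX_BUF"] // SERVER["ndd_tcp_max_buf_commented"])
    # mount_nfs timeo is given in tenths of a second.
    print("timeo in seconds:", int(opts["timeo"]) / 10)
```

So the 1 MiB client rsize/wsize agree with the server's nfs4_max_transfer_size, and the 600 timeo corresponds to a 60-second timeout.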



------------------------------------------
illumos: illumos-discuss
Permalink: 
https://illumos.topicbox.com/groups/discuss/T939fcf899d5526b0-M00a39ef37d2b2c803de8cab5