On 06-03-2025 13:57, Toomas Soome via illumos-discuss wrote:


On 6. Mar 2025, at 13:41, Udo Grabowski (IMKASF) <udo.grabow...@kit.edu> wrote:

On 05-03-2025 12:47, Udo Grabowski (IMKASF) wrote:
Hi,
for a while now we have quite often been seeing NFS problems we had
not seen before, here between an oi_151a9 client and a recent
2024-12-12 illumos-b7fe974ee3 server; it starts with a file
<null string>, then mixed-up seqids, losing track of several file opens:
Mar  5 10:08:34 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4] [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for file <null string> (rnode_pt 0x0), pid 0 using seqid 1 got NFS4ERR_BAD_SEQID.  Last good seqid was 0 for operation .
Mar  5 10:08:38 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4] [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for file ./CEDRO_2_ES/SF6:X_262.0/bout/create_coarse_inp_30300.log (rnode_pt 0xfffffeed90af63f8), pid 0 using seqid 2 got NFS4ERR_BAD_SEQID.  Last good seqid was 1 for operation open.
Mar  5 10:08:40 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4] [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for file ./CEDRO_2_ES/SF6:X_262.0/bout/create_coarse_inp_30300.log (rnode_pt 0xfffffeed90af63f8), pid 0 using seqid 1 got NFS4ERR_BAD_SEQID.  Last good seqid was 1 for operation open.
...
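Since these messages land in the client syslog, a quick way to gauge how often this hits is to count the BAD_SEQID events across the cluster. A minimal sketch, assuming the default /var/adm/messages target and with placeholder client hostnames:

    # count NFS4ERR_BAD_SEQID events per client (hostnames are placeholders)
    for h in client01 client02 client03; do
        printf '%s: ' "$h"
        ssh "$h" grep -c NFS4ERR_BAD_SEQID /var/adm/messages
    done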

> This has probably also happened with new clients, but I'm not yet sure
> about that. It's happening often enough that it significantly
> disturbs operations here.

We now know that this definitely happens with recent clients, too.


What kind of operations are done there? Maybe 'snoop rpc nfs'? Just a plain mount, then editing a file and saving it, shows no problem…


No, it's not that simple. We are running about 70 client processes
on different cluster machines hammering on two NFS servers
(here, a very new dual-host all-flash array from ZStor with 100 Gb/s
Mellanox ConnectX-6 mlxcx NICs), and once in a while (between 1 in
5000 and 3 in 11...) that sequence suddenly happens and produces I/O
errors for a couple of files. This started at the end of last year
when we brought these servers into service; before that we ran
oi151a7 to a9 and Hipster 2016 servers and had no such problems in
years.
The messages are only seen on the clients; there is no indication of
the errors on the servers. Snooping would be a hard thing to do, as
we don't know on which client they will occur, or at what time ...
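If capturing is attempted anyway, one option is a rotating snoop on the
server side, so that the trace only needs to be preserved when a client
reports an error. A rough sketch, assuming the NIC is mlxcx0 and there is
enough scratch space under /var/tmp:

    # keep a ring of 8 NFS capture files, 1 million packets each;
    # stop the loop and save the ring once a client logs NFS4ERR_BAD_SEQID
    i=0
    while :; do
        snoop -q -d mlxcx0 -c 1000000 -o /var/tmp/nfs.cap.$i rpc nfs
        i=$(( (i + 1) % 8 ))
    done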

Important server parameters (these are v4 TCP connections on port
2049; TCP_MAX_BUF has already been doubled, to no avail):
nfs-props/servers integer 1024
nfs-props/server_delegation astring off
nfs-props/listen_backlog integer 512
nfs-props/mountd_listen_backlog integer 64
nfs-props/mountd_max_threads integer 16
nfs-props/server_versmax astring 4
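For reference, these properties live on svc:/network/nfs/server and can be
read or changed with sharectl; a sketch with the values from the list above
(some properties only take effect after a service restart):

    # inspect all NFS properties
    sharectl get nfs

    # set the values shown above
    sharectl set -p servers=1024 nfs
    sharectl set -p server_delegation=off nfs
    sharectl set -p listen_backlog=512 nfs
    sharectl set -p mountd_listen_backlog=64 nfs
    sharectl set -p mountd_max_threads=16 nfs
    sharectl set -p server_versmax=4 nfs

    # or look at the raw SMF property group
    svccfg -s svc:/network/nfs/server listprop nfs-props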

set ncsize=0x100000
set nfs:nfs3_nra=16
set nfs:nfs3_bsize=0x100000
set nfs:nfs3_max_transfer_size=0x100000
set nfs:nfs4_max_transfer_size=0x100000
set nfs:nfs4_bsize=0x100000
set rpcmod:clnt_max_conns=8
* not reachable via /etc/system; see /etc/default/inetinit and the net-init method
*set ndd:tcp_recv_hiwat=0x100000
*set ndd:tcp_xmit_hiwat=0x100000
*set ndd:tcp_max_buf=0x400000
*set ndd:tcp_cwnd_max=0x200000
*set ndd:tcp_conn_req_max_q=1024
*set ndd:tcp_conn_req_max_q0=4096
TCP_RECV_HIWAT=1048576
TCP_XMIT_HIWAT=1048576
TCP_MAX_BUF=8388608
TCP_STRONG_ISS=2
TCP_CWND_MAX=4194304
TCP_CONN_REQ_MAX_Q=2048
TCP_CONN_REQ_MAX_Q0=4096
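As the comment above notes, these TCP tunables are not reachable via
/etc/system; on current illumos they can also be applied at run time with
ipadm. The underscore-prefixed names are private properties that may vary
by release, so treat this as a sketch:

    # public TCP buffer properties
    ipadm set-prop -p recv_buf=1048576 tcp
    ipadm set-prop -p send_buf=1048576 tcp
    ipadm set-prop -p max_buf=8388608 tcp

    # private (formerly ndd) properties; note the leading underscore
    ipadm set-prop -p _cwnd_max=4194304 tcp
    ipadm set-prop -p _conn_req_max_q=2048 tcp
    ipadm set-prop -p _conn_req_max_q0=4096 tcp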

Client mount parameters:
/home/Processor/Work_24 from imksunth11:/Work_Pool_11/Work_24
Flags: vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=1048576,wsize=1048576,retrans=5,timeo=600
 Attr cache:    acregmin=3,acregmax=60,acdirmin=30,acdirmax=60
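The above is nfsstat -m output; for reference, an explicit mount with the
same options would look roughly like this (link/symlink/acl are defaults
reported by nfsstat, not options one would pass):

    # manual NFSv4 mount matching the flags above
    mount -F nfs -o vers=4,proto=tcp,hard,intr,rsize=1048576,wsize=1048576,retrans=5,timeo=600 \
        imksunth11:/Work_Pool_11/Work_24 /home/Processor/Work_24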


--
Dr.Udo Grabowski  Inst.of Meteorology & Climate Research IMKASF-SAT
https://www.imk-asf.kit.edu/english/sat.php
KIT - Karlsruhe Institute of Technology          https://www.kit.edu
Postfach 3640,76021 Karlsruhe,Germany T:(+49)721 608-26026 F:-926026
