Hi Udo!

Could you give more information about the kernel codebase of your oi_151a9 client,
e.g. the output of uname -a?

And could you tell us more about the setup where you did not see the problem?

—
Vitaliy Gusev

> On 6 Mar 2025, at 16:17, Udo Grabowski (IMKASF) <udo.grabow...@kit.edu> wrote:
> 
> 
> 
> On 06-03-2025 13:57, Toomas Soome via illumos-discuss wrote:
>>> On 6. Mar 2025, at 13:41, Udo Grabowski (IMKASF) <udo.grabow...@kit.edu> wrote:
>>> 
>>> On 05-03-2025 12:47, Udo Grabowski (IMKASF) wrote:
>>>> Hi,
>>>> for a while now, we have quite often been seeing NFS problems that we
>>>> had not seen before, here between an oi_151a9 client and a recent
>>>> 2024:12:12 illumos-b7fe974ee3 server. It starts with a file <null string>,
>>>> then the seqids get mixed up and the client loses track of several file opens:
>>>> Mar  5 10:08:34 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4] 
>>>> [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for 
>>>> file <null string> (rnode_pt 0x0), pid 0 using seqid 1 got 
>>>> NFS4ERR_BAD_SEQID.  Last good seqid was 0 for operation .
>>>> Mar  5 10:08:38 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4] 
>>>> [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for 
>>>> file ./CEDRO_2_ES/SF6:X_262.0/bout/create_coarse_inp_30300.log (rnode_pt 
>>>> 0xfffffeed90af63f8), pid 0 using seqid 2 got NFS4ERR_BAD_SEQID.  Last good 
>>>> seqid was 1 for operation open.
>>>> Mar  5 10:08:40 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4] 
>>>> [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for 
>>>> file ./CEDRO_2_ES/SF6:X_262.0/bout/create_coarse_inp_30300.log (rnode_pt 
>>>> 0xfffffeed90af63f8), pid 0 using seqid 1 got NFS4ERR_BAD_SEQID.  Last good 
>>>> seqid was 1 for operation open.
>>>> ...
>>>> This has probably also happened with new clients, but I'm not yet sure
>>>> about that. It happens often enough to significantly disrupt
>>>> operations here.
>>> 
>>> We now know that this definitely happens with recent clients, too.
>> What kind of operations are done there? Maybe capture them with 'snoop rpc nfs'?
>> Just a plain mount, editing a file and saving it shows no problem...
> 
> No, it's not that simple. We are running about 70 client processes
> on different cluster machines, hammering two NFS servers (here a
> very new dual-host all-flash array from ZStor with 100 Gb/s Mellanox
> ConnectX-6 NICs using the mlxcx driver), and once in a while (between
> 1 in 5000 and 3 in 11...) that sequence suddenly occurs and causes
> I/O errors for a couple of files. This started at the end of last year,
> when we put these servers into service; before that we ran oi151a7 to
> a9 and Hipster 2016 servers and had no such problems for years.
> The messages are only seen on the clients; there is no indication of
> the errors on the servers. Snooping would be hard, as we don't know
> on which client, or at what time, the errors will occur...
> 
> Important server parameters (these are NFSv4 TCP connections on port
> 2049; doubling TCP_MAX_BUF did not help):
> nfs-props/servers integer 1024
> nfs-props/server_delegation astring off
> nfs-props/listen_backlog integer 512
> nfs-props/mountd_listen_backlog integer 64
> nfs-props/mountd_max_threads integer 16
> nfs-props/server_versmax astring 4
> 
> set ncsize=0x100000
> set nfs:nfs3_nra=16
> set nfs:nfs3_bsize=0x100000
> set nfs:nfs3_max_transfer_size=0x100000
> set nfs:nfs4_max_transfer_size=0x100000
> set nfs:nfs4_bsize=0x100000
> set rpcmod:clnt_max_conns=8
> * not reachable via system, see default/inetinit and method net-init
> *set ndd:tcp_recv_hiwat=0x100000
> *set ndd:tcp_xmit_hiwat=0x100000
> *set ndd:tcp_max_buf=0x400000
> *set ndd:tcp_cwnd_max=0x200000
> *set ndd:tcp_conn_req_max_q=1024
> *set ndd:tcp_conn_req_max_q0=4096
> set nfs:nfs4_bsize=0x100000
> TCP_RECV_HIWAT=1048576
> TCP_XMIT_HIWAT=1048576
> TCP_MAX_BUF=8388608
> TCP_STRONG_ISS=2
> TCP_CWND_MAX=4194304
> TCP_CONN_REQ_MAX_Q=2048
> TCP_CONN_REQ_MAX_Q0=4096
> 
> Client mount parameters:
> /home/Processor/Work_24 from imksunth11:/Work_Pool_11/Work_24
> Flags: 
> vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=1048576,wsize=1048576,retrans=5,timeo=600
> Attr cache:    acregmin=3,acregmax=60,acdirmin=30,acdirmax=60
> 
> --
> Dr.Udo Grabowski  Inst.of Meteorology & Climate Research IMKASF-SAT
> https://www.imk-asf.kit.edu/english/sat.php
> KIT - Karlsruhe Institute of Technology          https://www.kit.edu 
> Postfach 3640,76021 Karlsruhe,Germany T:(+49)721 608-26026 F:-926026
> 
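For reading the client messages quoted above: in NFSv4.0 (RFC 7530), each open-owner carries a sequence id that the client increments on every seqid-bearing operation such as OPEN and CLOSE, and the server answers NFS4ERR_BAD_SEQID when a request's seqid is neither the next expected value nor a retransmission of the last one. Below is a minimal Python sketch of that rule under simplifying assumptions; it is not the illumos implementation, the class and method names are hypothetical, and seqid wraparound and the reply cache are left out.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    NEW = "new request"
    REPLAY = "replay of last request"
    BAD_SEQID = "NFS4ERR_BAD_SEQID"


class OpenOwnerState:
    """Hypothetical server-side sequencing state for one NFSv4.0 open-owner."""

    def __init__(self) -> None:
        # None means the server has seen no prior request from this owner.
        self.last_seqid: Optional[int] = None

    def classify(self, seqid: int) -> Verdict:
        if self.last_seqid is None:
            # Simplification: the first request from a new owner is accepted
            # with whatever seqid it carries.
            return Verdict.NEW
        if seqid == self.last_seqid:
            # Retransmission: a real server would resend its cached reply.
            return Verdict.REPLAY
        if seqid == self.last_seqid + 1:
            # Next request in sequence (wraparound is ignored in this sketch).
            return Verdict.NEW
        return Verdict.BAD_SEQID

    def process(self, seqid: int) -> Verdict:
        # Only an in-sequence new request advances the recorded seqid.
        verdict = self.classify(seqid)
        if verdict is Verdict.NEW:
            self.last_seqid = seqid
        return verdict


if __name__ == "__main__":
    owner = OpenOwnerState()
    print(owner.process(1))  # first request from this owner
    print(owner.process(2))  # next in sequence
    print(owner.process(2))  # retransmission of seqid 2
    print(owner.process(5))  # out of sequence
```

In this model, a request with seqid 2 after seqid 1 was accepted is classified as NEW, so a BAD_SEQID answer in that situation, as in the second quoted message, would suggest that client and server disagree about that open-owner's sequencing state.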

------------------------------------------
illumos: illumos-discuss
Permalink: 
https://illumos.topicbox.com/groups/discuss/T939fcf899d5526b0-Me99c8ddf1e4e39269f4f516e