To understand the behaviour and reproduce the case: do you see the same issue
when using Linux NFS v4.0 clients?
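
If you can test that, something along these lines should pin a Linux client to
protocol version 4.0 (export path taken from the client mount list quoted
below, adjust as needed):

  mount -t nfs -o vers=4.0,proto=tcp imksunth11:/Work_Pool_11/Work_24 /mnt

Seeing whether a different client implementation triggers the same
NFS4ERR_BAD_SEQID would help narrow this down.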

—
Vitaliy Gusev

> On 7 Mar 2025, at 14:05, Udo Grabowski (IMK) <udo.grabow...@kit.edu> wrote:
> 
> On 07/03/2025 11:28, Vitaliy Gusev wrote:
>> Hi Udo!
>> Could you give more information about the kernel codebase of your oi_151a9, 
>> probably uname -a?
>> And could you say more about the setup where you didn’t see the problem.
>> Vitaliy Gusev
> 
> oi151a9 # uname -a
> SunOS imksuns11 5.11 oi_151a9 i86pc i386 i86pc
> 
> Problem also occurs with recent clients (OI newer than end of July 2024).
> Server was OI Dec 2024 and now Jan 2025. So far we have only seen this
> on the flash array, but we don't torture our other machines that hard.
> 
> Old oi_151a7/a9 servers and OI Hipster 2018-09 NFS servers had no
> problems with these loads. Unfortunately, we haven't had a newer NFS
> server OS version in between that had to cope with this mass production,
> so there's no narrower version bracket.
> 
> 
>>> On 6 Mar 2025, at 16:17, Udo Grabowski (IMKASF) <udo.grabow...@kit.edu> 
>>> wrote:
>>> 
>>> 
>>> 
>>> On 06-03-2025 13:57, Toomas Soome via illumos-discuss wrote:
>>>>> On 6. Mar 2025, at 13:41, Udo Grabowski (IMKASF) <udo.grabow...@kit.edu> 
>>>>> wrote:
>>>>> 
>>>>> On 05-03-2025 12:47, Udo Grabowski (IMKASF) wrote:
>>>>>> Hi,
>>>>>> for a while now we have quite often been seeing NFS problems we had not
>>>>>> seen before, here between an oi_151a9 client and a recent 2024:12:12
>>>>>> illumos-b7fe974ee3 server; it starts with a file <null string>, then
>>>>>> mixed-up seqids, losing track of several file opens:
>>>>>> Mar  5 10:08:34 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4] 
>>>>>> [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for 
>>>>>> file <null string> (rnode_pt 0x0), pid 0 using seqid 1 got 
>>>>>> NFS4ERR_BAD_SEQID.  Last good seqid was 0 for operation .
>>>>>> Mar  5 10:08:38 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4] 
>>>>>> [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for 
>>>>>> file ./CEDRO_2_ES/SF6:X_262.0/bout/create_coarse_inp_30300.log (rnode_pt 
>>>>>> 0xfffffeed90af63f8), pid 0 using seqid 2 got NFS4ERR_BAD_SEQID.  Last 
>>>>>> good seqid was 1 for operation open.
>>>>>> Mar  5 10:08:40 imksuns11 nfs: [ID 435015 kern.info] NOTICE: [NFS4] 
>>>>>> [Server: imksunth11][Mntpt: /home/Processor/Work_22]Operation open for 
>>>>>> file ./CEDRO_2_ES/SF6:X_262.0/bout/create_coarse_inp_30300.log (rnode_pt 
>>>>>> 0xfffffeed90af63f8), pid 0 using seqid 1 got NFS4ERR_BAD_SEQID.  Last 
>>>>>> good seqid was 1 for operation open.
>>>>>> ...
>>>>>> This has probably also happened with new clients, but I'm not yet sure
>>>>>> about that. It's happening often enough that it is significantly
>>>>>> disturbing operations here.
>>>>> 
>>>>> We now know that this definitely happens with recent clients, too.
>>>> what kind of operations are done there? Maybe 'snoop rpc nfs'? Just a plain 
>>>> mount, editing a file and saving it shows no problem…
>>> 
>>> No, it's not that simple. We are running about 70 client processes
>>> on different cluster machines hammering on two NFS servers
>>> (here, a very new dual-host all-flash array from ZStor with 100 Gb/s
>>> Mellanox ConnectX-6 mlxcx NICs), and once in a while (between 1 in 5000 and 
>>> 3 in 11...) that sequence suddenly happens and gives I/O errors for
>>> a couple of files. This started at the end of last year when we inaugurated
>>> these servers; before that we ran oi151a7 to a9 and Hipster 2016 servers
>>> and had no such problems in years.
>>> Messages are only seen on the clients, no indication of the errors
>>> on the servers. Snooping would be a hard thing to do, as we don't
>>> know on which client they will occur and at what time ...
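
Even without knowing in advance which client will hit it, a capture could be
left running on a couple of the busiest clients until the error shows up; a
rough sketch, where the interface name and output path are only placeholders:

  snoop -q -d net0 -o /var/tmp/nfs-badseqid.cap rpc nfs

The resulting file could then be trimmed to the window around a BAD_SEQID
message and inspected with snoop -i or wireshark.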
>>> 
>>> Important server parameters (these are v4 TCP connections on port
>>> 2049, TCP_MAX_BUF doubled, to no avail):
>>> nfs-props/servers integer 1024
>>> nfs-props/server_delegation astring off
>>> nfs-props/listen_backlog integer 512
>>> nfs-props/mountd_listen_backlog integer 64
>>> nfs-props/mountd_max_threads integer 16
>>> nfs-props/server_versmax astring 4
>>> 
>>> set ncsize=0x100000
>>> set nfs:nfs3_nra=16
>>> set nfs:nfs3_bsize=0x100000
>>> set nfs:nfs3_max_transfer_size=0x100000
>>> set nfs:nfs4_max_transfer_size=0x100000
>>> set nfs:nfs4_bsize=0x100000
>>> set rpcmod:clnt_max_conns=8
>>> * not settable via /etc/system, see default/inetinit and method net-init
>>> *set ndd:tcp_recv_hiwat=0x100000
>>> *set ndd:tcp_xmit_hiwat=0x100000
>>> *set ndd:tcp_max_buf=0x400000
>>> *set ndd:tcp_cwnd_max=0x200000
>>> *set ndd:tcp_conn_req_max_q=1024
>>> *set ndd:tcp_conn_req_max_q0=4096
>>> set nfs:nfs4_bsize=0x100000
>>> TCP_RECV_HIWAT=1048576
>>> TCP_XMIT_HIWAT=1048576
>>> TCP_MAX_BUF=8388608
>>> TCP_STRONG_ISS=2
>>> TCP_CWND_MAX=4194304
>>> TCP_CONN_REQ_MAX_Q=2048
>>> TCP_CONN_REQ_MAX_Q0=4096
>>> 
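A side note on the starred entries: on current illumos these TCP tunables can
usually also be set persistently via ipadm instead of the inetinit/ndd method;
a sketch using the values from your list (property names may vary by release):

  ipadm set-prop -p max_buf=8388608 tcp
  ipadm set-prop -p recv_buf=1048576 tcp
  ipadm set-prop -p send_buf=1048576 tcp
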
>>> Client mount parameters:
>>> /home/Processor/Work_24 from imksunth11:/Work_Pool_11/Work_24
>>> Flags: 
>>> vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=1048576,wsize=1048576,retrans=5,timeo=600
>>> Attr cache:    acregmin=3,acregmax=60,acdirmin=30,acdirmax=60
>>> 
>>> 
