Hello,

I believe the NFS server is currently running oi_151a4, so it is in our
interest to upgrade to 151a7 and test again. I will roll out a new server
and test it immediately.
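
For the retest, a small helper along these lines should make the
before/after comparison less error-prone. It is only a sketch:
openowner_count is a name I made up, and it assumes the '::rfs4_db' column
layout on the new build matches the output quoted below, with the entry
count in the fourth field.

```shell
#!/bin/sh
# Print the entry count ("Cnt" column) of the OpenOwner table from
# '::rfs4_db' output read on stdin.
# Assumed column order (taken from the quoted output):
#   Address, Name, Flags, Cnt, ...
# "+ 0" strips zero padding (e.g. 0007 -> 7) so the shell does not
# treat the value as octal in later arithmetic.
openowner_count() {
    awk '$2 == "OpenOwner" { print $4 + 0; exit }'
}

# Hypothetical usage on the NFS server (needs root and mdb):
#   before=$(echo '::rfs4_db' | mdb -k | openowner_count)
#   ... run ./locktest on a client ...
#   after=$(echo '::rfs4_db' | mdb -k | openowner_count)
#   echo "OpenOwner grew by $((after - before))"
```

If the count still climbs after each locktest run and never drops back once
the lease time has passed, the leak is presumably still there.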

Again, thank you.

Lund

Marcel Telka wrote:
> Hi Jorgen,
> 
> This was filed for illumos at https://www.illumos.org/issues/417 and fixed in
> summer 2012. The fix implemented here is different when compared to what I did
> in Solaris. I didn't look at the illumos fix thoroughly, but from a quick
> scan I think it should work.
> 
> The fix should be a part of the OI 151a7. Are you able to reproduce your issue
> with 151a7? If so, please file a new bug.
> 
> Thanks.
> 
> On Wed, Jan 30, 2013 at 04:05:06PM +0900, Jorgen Lundman wrote:
>>
>> Hello,
>>
>> We use ZFS and NFS storage fairly heavily, and back in 'the Sun days' we
>> used to have trouble with OpenOwner locks leaking, requiring periodic
>> reboots of the NFS servers.
>>
>> At the time, thanks to the Sun engineers, in particular Marcel Telka, the
>> problem was eventually tracked down to:
>>
>> ~~~Quote~~~
>> It looks like our NFSv4 server does not follow this (from RFC 3530):
>>
>>    A given client might generate many open_owner4 data structures for a
>>    given clientid.  The client will periodically either dispose of its
>>    open_owner4s or stop using them for indefinite periods of time.  The
>>    latter situation is why the NFS version 4 protocol does not have an
>>    explicit operation to exit an open_owner4: such an operation is of no
>>    use in that situation.  Instead, to avoid unbounded memory use, the
>>    server needs to implement a strategy for disposing of open_owner4s
>>    that have no current lock, open, or delegation state for any files
>>    and have not been used recently.  The time period used to determine
>>    when to dispose of open_owner4s is an implementation choice.  The
>>    time period should certainly be no less than the lease time plus any
>>    grace period the server wishes to implement beyond a lease time.  The
>>    OPEN_CONFIRM operation allows the server to safely dispose of unused
>>    open_owner4 data structures.
>>
>> Apparently, unused OpenOwner entries are not disposed after some period of time
>> in case the client is active somehow. They are disposed only for inactive
>> clients. It is visible in rfs4_openowner_expiry(). This is similar to CR
>> 6906432 but it is a completely different scenario. I believe this is a bug, not
>> yet covered by any filed CR, nor fixed.
>>
>>
>> FYI, I filed this CR:
>>
>> 6976554 Stale OpenOwner entries are not reaped for active clients
>>
>> ~~~Quote~~~
>>
>> Looking to the future, we are exploring changing our NFS storage OS, and
>> have tried illumos (OpenIndiana).
>>
>> Alas, we appear to be hitting this trouble yet again. I suppose the issue was
>> never fixed in OpenSolaris/illumos. What are the chances of this happening?
>>
>> # uname -a
>> SunOS nfs02.dw 5.11 oi_151a4 i86pc i386 i86pc Solaris
>>
>> # echo '::rfs4_db' | mdb -k
>> rfs4_database=ffffffa6167afb50
>>   debug_flags=00000000   shutdown:      count=0 tables=ffffff2646d4fd60
>> ------------------ Table ------------------- Bkt  ------- Indices -------
>> Address          Name          Flags    Cnt  Cnt  Pointer          Cnt  Max
>> ffffff2646d4fd60 DelegStateID  00000000 12057 2047 fffffff7e25f31c0 0002 0002
>> fffffffeb5ea4630 File          00000000 19922 2047 ffffffefcfb7a140 0001 0001
>> ffffffefbc0b23f0 Lockowner     00000000 2035 2047 ffffffd58124cc00 0002 0002
>> ffffffefbcad8088 LockStateID   00000000 1743 2047 fffffffe2347ac40 0002 0002
>> ffffffffb8e56c88 OpenStateID   00000000 9270 2047 ffffffd5827202c0 0003 0003
>> ffffffefbb3641b8 OpenOwner     00000000 705410 2047 ffffffff7d187c40 0001 0001
>> ffffffefc2dd0358 ClntIP        00000000 0000 2047 ffffffd581d2df40 0001 0001
>> ffffffefbd2370c0 Client        00000000 0007 2047 fffffffedb66ba00 0002 0002
>>
>> In particular, note the OpenOwner count.
>>
>> [email protected]:~# echo '::rfs4_db' | mdb -k | grep OpenOwner
>>
>> ffffffefbb3641b8 OpenOwner     00000000 705957 2047 ffffffff7d187c40 0001 0001
>> root@nfs-client# ./locktest
>>
>> [email protected]:~# echo '::rfs4_db' | mdb -k | grep OpenOwner
>> ffffffefbb3641b8 OpenOwner     00000000 706022 2047 ffffffff7d187c40 0001 0001
>>
>>
>> The locktest Perl program can be found here:
>>  http://mail.opensolaris.org/pipermail/nfs-discuss/2010-October/002154.html
>>
>>
>> Jorgen Lundman
>>
>> -- 
>> Jorgen Lundman       | <[email protected]>
>> Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
>> Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
>> Japan                | +81 (0)3 -3375-1767          (home)
>>
>>
>> -------------------------------------------
>> illumos-discuss
>> Archives: https://www.listbox.com/member/archive/182180/=now
>> RSS Feed: https://www.listbox.com/member/archive/rss/182180/23046997-5a38a7d8
>> Modify Your Subscription: https://www.listbox.com/member/?&;
>> Powered by Listbox: http://www.listbox.com
> 

-- 
Jorgen Lundman       | <[email protected]>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)

