Hello, I believe the NFS server is currently "a4" so it is in our interest to upgrade to "a7" and test again. I will roll out a new server and test it immediately.
Again, thank you.

Lund

Marcel Telka wrote:
> Hi Jorgen,
>
> This was filed for illumos at https://www.illumos.org/issues/417 and fixed in
> summer 2012. The fix implemented here is different when compared to what I did
> in Solaris. I didn't look at the illumos fix thoroughly, but from the quick
> scan I think it should work.
>
> The fix should be a part of the OI 151a7. Are you able to reproduce your issue
> with 151a7? If so, please file a new bug.
>
> Thanks.
>
> On Wed, Jan 30, 2013 at 04:05:06PM +0900, Jorgen Lundman wrote:
>>
>> Hello,
>>
>> We use ZFS and NFS storage fairly heavily, and back in 'the Sun days' we
>> used to have trouble with OpenOwner locks leaking, requiring periodic
>> reboots of the NFS servers.
>>
>> At the time, thanks to the Sun engineers, in particular Marcel Telka, the
>> problem was eventually tracked down to:
>>
>> ~~~Quote~~~
>> It looks like our NFSv4 server does not follow this (from RFC 3530):
>>
>>    A given client might generate many open_owner4 data structures for a
>>    given clientid.  The client will periodically either dispose of its
>>    open_owner4s or stop using them for indefinite periods of time.  The
>>    latter situation is why the NFS version 4 protocol does not have an
>>    explicit operation to exit an open_owner4: such an operation is of no
>>    use in that situation.  Instead, to avoid unbounded memory use, the
>>    server needs to implement a strategy for disposing of open_owner4s
>>    that have no current lock, open, or delegation state for any files
>>    and have not been used recently.  The time period used to determine
>>    when to dispose of open_owner4s is an implementation choice.  The
>>    time period should certainly be no less than the lease time plus any
>>    grace period the server wishes to implement beyond a lease time.  The
>>    OPEN_CONFIRM operation allows the server to safely dispose of unused
>>    open_owner4 data structures.
>>
>> Apparently, unused OpenOwner entries are not disposed after some period of
>> time in case the client is active somehow. They are disposed only for
>> inactive clients. It is visible in rfs4_openowner_expiry(). This is similar
>> to CR 6906432 but it is a completely different scenario. I believe this is
>> a bug, not yet covered by any filed CR, nor fixed.
>>
>>
>> FYI, I filed this CR:
>>
>> 6976554 Stale OpenOwner entries are not reaped for active clients
>>
>> ~~~Quote~~~
>>
>> Looking to the future, we are exploring changing our NFS Storage OS, and
>> have tried IllumOS (OpenIndiana)
>>
>> Alas, we appear to get this trouble yet again. I suppose the issue was
>> never fixed in OpenSolaris/IllumOs. What are the chances of this happening?
>>
>> # uname -a
>> SunOS nfs02.dw 5.11 oi_151a4 i86pc i386 i86pc Solaris
>>
>> echo '::rfs4_db' | mdb -k
>> rfs4_database=ffffffa6167afb50
>> debug_flags=00000000 shutdown: count=0 tables=ffffff2646d4fd60
>> ------------------ Table ------------------- Bkt  ------- Indices -------
>> Address          Name         Flags    Cnt    Cnt  Pointer          Cnt  Max
>> ffffff2646d4fd60 DelegStateID 00000000 12057  2047 fffffff7e25f31c0 0002 0002
>> fffffffeb5ea4630 File         00000000 19922  2047 ffffffefcfb7a140 0001 0001
>> ffffffefbc0b23f0 Lockowner    00000000 2035   2047 ffffffd58124cc00 0002 0002
>> ffffffefbcad8088 LockStateID  00000000 1743   2047 fffffffe2347ac40 0002 0002
>> ffffffffb8e56c88 OpenStateID  00000000 9270   2047 ffffffd5827202c0 0003 0003
>> ffffffefbb3641b8 OpenOwner    00000000 705410 2047 ffffffff7d187c40 0001 0001
>> ffffffefc2dd0358 ClntIP       00000000 0000   2047 ffffffd581d2df40 0001 0001
>> ffffffefbd2370c0 Client       00000000 0007   2047 fffffffedb66ba00 0002 0002
>>
>> In particular, the OpenOwner.
>>
>> [email protected]:~# echo '::rfs4_db' | mdb -k | grep OpenOwner
>>
>> ffffffefbb3641b8 OpenOwner    00000000 705957 2047 ffffffff7d187c40 0001 0001
>> root@nfs-client# ./locktest
>>
>> [email protected]:~# echo '::rfs4_db' | mdb -k | grep OpenOwner
>> ffffffefbb3641b8 OpenOwner    00000000 706022 2047 ffffffff7d187c40 0001 0001
>>
>>
>> locktest perl program can be found here;
>> http://mail.opensolaris.org/pipermail/nfs-discuss/2010-October/002154.html
>>
>>
>> Jorgen Lundman
>>
>> --
>> Jorgen Lundman       | <[email protected]>
>> Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
>> Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
>> Japan                | +81 (0)3 -3375-1767          (home)
>>
>>
>> -------------------------------------------
>> illumos-discuss
>> Archives: https://www.listbox.com/member/archive/182180/=now
>> RSS Feed: https://www.listbox.com/member/archive/rss/182180/23046997-5a38a7d8
>> Modify Your Subscription: https://www.listbox.com/member/?&
>> Powered by Listbox: http://www.listbox.com
>

--
Jorgen Lundman       | <[email protected]>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)
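
For the a4 versus a7 comparison, the per-table counts can be pulled out of the '::rfs4_db' output with a small script rather than read by eye. This is only a rough sketch: the function name table_counts is made up for illustration, and it assumes the row layout seen in the output quoted above (address, name, flags, entry count, and so on). It also tolerates leading ">" quote markers and the wrapped trailing column.

```python
import re


def table_counts(rfs4_db_output):
    """Map each table name in '::rfs4_db' output to its entry count.

    Assumes data rows look like:
        <16-hex-digit address> <name> <flags> <count> <buckets> ...
    Header, separator, and wrapped continuation lines are skipped.
    """
    counts = {}
    for line in rfs4_db_output.splitlines():
        # Drop any mail quote markers before splitting into columns.
        fields = line.lstrip("> ").split()
        if (
            len(fields) >= 4
            and re.fullmatch(r"[0-9a-f]{16}", fields[0])
            and fields[3].isdigit()
        ):
            counts[fields[1]] = int(fields[3])
    return counts


# The two OpenOwner rows from the quoted message, before and after locktest.
before = table_counts(
    "ffffffefbb3641b8 OpenOwner 00000000 705957 2047 ffffffff7d187c40 0001\n0001\n"
)
after = table_counts(
    "ffffffefbb3641b8 OpenOwner 00000000 706022 2047 ffffffff7d187c40 0001\n0001\n"
)
print(after["OpenOwner"] - before["OpenOwner"])  # 65
```

A growing count on its own is not conclusive. The figure of interest is whether OpenOwner keeps climbing after the lease time has passed while the client stays active, since that is the case described in the quoted CR.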
