Hi Josh,

Which version of Solaris is this? Can you reproduce the problem and send a snoop capture taken during the problematic behavior?
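If the real application is awkward to run on demand, a small stand-in may be enough to trigger the same traffic while snoop is running. The following is only a sketch under assumptions: it does not use Qt, the function names and the file path are hypothetical, and it assumes the application's wrappers end up taking ordinary fcntl record locks. It takes an exclusive lock, appends a line, and releases the lock.

```python
import fcntl
import os
import time


def append_with_lock(path, message, hold_seconds=0.0):
    """Append one line to path while holding an exclusive fcntl record lock."""
    with open(path, "a") as f:
        # lockf() requests a POSIX record lock and blocks until it is granted.
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            f.write(message + "\n")
            f.flush()
            os.fsync(f.fileno())
            # Optionally keep the lock for a while to force contention.
            time.sleep(hold_seconds)
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)


def lock_is_free(path):
    """Return True if an exclusive lock can be taken right now.

    The lock is released again immediately. Record locks are per process,
    so this only detects locks held by other processes.
    """
    with open(path, "r+") as f:
        try:
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        fcntl.lockf(f, fcntl.LOCK_UN)
        return True
```

Calling append_with_lock() in a loop from six or more workstations against a scratch file on the affected mount, with a nonzero hold_seconds, should show whether plain lock contention alone is enough to produce the NFS4ERR_EXPIRED messages.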
--Pavel

On 05/06/09 04:53, Josh Beavers wrote:
> We have a T5220 configured with 4 cores assigned to the global zone to host
> three share mounts for 40+ workstations. Regular file sharing has not been an
> issue however one of the mounts is heavily used for an application. This
> application is written to be similar to a chat system/message board running
> off a flat file in its simplest terms. The application uses Qt library fcntl
> wrappers to find if a lock is on the file before it reads the file to a local
> buffer file or write the file back if it had the lock.
>
> When we have 3 workstations (with different accounts) accessing the file and
> writing into it has no problems. When over 5 workstations are accessing the
> same file we get the following error messages:
>
> NOTICE: [NFS4][Server: dcmil][Mntpt: /dcmil/shared] NFSMAPID_DOMAIN does not
> match the server: dcmil domain
> Please check configuration
>
> NOTICE: [NFS4][Server: dcmil][Mntpt: /dcmil/shared] NFS op OP_OPEN got error
> NFS4ERR_EXPIRED causing recovery action NR_CLIENTID.
>
> NOTICE: [NFS4][Server: dcmil][Mntpt: /dcmil/shared] NFS Starting recovery
> for mount /dcmil/shared (mi 0x60015733000 mi_recovflags [0x1]) on server
> dcmil, rnode_pt1 ./DMAX_TEST (0x60018fa44a0), rnode_pt2 <null string> (0x0)
>
> NOTICE: [NFS4][Server: dcmil][Mntpt: /dcmil/shared] NFS Recovery done for
> mount /dcmil/shared (mi 0x60015733000) on server dcmil, rnode_pt1 ./DMAX_TEST
> (0x60018fa44a0), rnode_pt2 <null string> (0x0)
>
> NOTICE: [NFS4][Server: dcmil][Mntpt: /dcmil/shared] NFS op OP_LOCK got error
> NFS4ERR_STALE_STATEID causing recovery action NR_CLIENTID. Client also
> suspects that the server rebooted, or experienced a network partition.
>
> Once the error appears none of the workstations can access the shared file
> without closing entirely out of the application and waiting for quite
> sometime. However if they simply cat the file(initiate a new request?) they
> are fine access is restored, just the application never recovers until it is
> restarted. We are uncertain if our system's NFS server or workstation
> configuration needs tweaked to allow for this application to work. (Change
> lease timeouts, or change mount options? Just too many variables that I am
> unfamiliar with to start poking around with and find a definitive solution)
>
> We have the default of 20 LOCKD_SERVERS and 16 NFSD_SERVERS in the
> /etc/default/nfs.
>
> Thanks,
> GigaGeek