We have a T5220 configured with 4 cores assigned to the global zone, hosting three shared mounts for 40+ workstations. Regular file sharing has not been an issue; however, one of the mounts is heavily used by an application. In its simplest terms, the application is similar to a chat system/message board running off a flat file. It uses the Qt library's fcntl wrappers to check whether a lock is held on the file before reading the file into a local buffer file, and it writes the file back if it holds the lock.
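To make the access pattern concrete, here is a minimal sketch of the lock, read, modify, write-back cycle. It is written in Python with the standard fcntl module rather than the Qt wrappers the application actually uses, so it illustrates the pattern and is not the application's code. The function names (append_message, read_messages) are hypothetical, and the sketch assumes blocking locks, whereas the real application may only test for a lock and retry.

```python
import fcntl
import os


def append_message(path, message):
    """Append one message to the shared flat file under an exclusive lock.

    Hypothetical helper: illustrates the pattern, not the real application.
    """
    # An exclusive fcntl lock needs a descriptor opened for writing.
    with open(path, "r+", encoding="utf-8") as f:
        # Assumption: block until no other process holds a conflicting lock.
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            contents = f.read()            # read the file into a local buffer
            contents += message + "\n"     # modify the buffer
            f.seek(0)
            f.write(contents)              # write the whole file back
            f.truncate()
            f.flush()
            os.fsync(f.fileno())           # push data out before unlocking
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)


def read_messages(path):
    """Return the messages in the file, read under a shared lock."""
    with open(path, "r", encoding="utf-8") as f:
        fcntl.lockf(f, fcntl.LOCK_SH)      # shared lock works on a read-only fd
        try:
            return f.read().splitlines()
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

Each call opens the file afresh, so every lock request starts from a new open. The real application may instead keep a single descriptor open for its whole lifetime, which could matter for how it recovers after the server reports expired state.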
When we have 3 workstations (with different accounts) accessing the file and writing into it, there are no problems. When more than 5 workstations are accessing the same file, we get the following error messages:

NOTICE: [NFS4][Server: dcmil][Mntpt: /dcmil/shared] NFSMAPID_DOMAIN does not match the server: dcmil domain Please check configuration
NOTICE: [NFS4][Server: dcmil][Mntpt: /dcmil/shared] NFS op OP_OPEN got error NFS4ERR_EXPIRED causing recovery action NR_CLIENTID.
NOTICE: [NFS4][Server: dcmil][Mntpt: /dcmil/shared] NFS Starting recovery for mount /dcmil/shared (mi 0x60015733000 mi_recovflags [0x1]) on server dcmil, rnode_pt1 ./DMAX_TEST (0x60018fa44a0), rnode_pt2 <null string> (0x0)
NOTICE: [NFS4][Server: dcmil][Mntpt: /dcmil/shared] NFS Recovery done for mount /dcmil/shared (mi 0x60015733000) on server dcmil, rnode_pt1 ./DMAX_TEST (0x60018fa44a0), rnode_pt2 <null string> (0x0)
NOTICE: [NFS4][Server: dcmil][Mntpt: /dcmil/shared] NFS op OP_LOCK got error NFS4ERR_STALE_STATEID causing recovery action NR_CLIENTID. Client also suspects that the server rebooted, or experienced a network partition.

Once the error appears, none of the workstations can access the shared file without closing the application entirely and waiting for quite some time. However, if they simply cat the file (initiating a new request?), access is restored. Only the application never recovers until it is restarted.

We are uncertain whether our system's NFS server or the workstation configuration needs to be tweaked to allow this application to work. (Change lease timeouts, or change mount options? There are just too many variables that I am unfamiliar with to start poking around and find a definitive solution.) We have the defaults of 20 LOCKD_SERVERS and 16 NFSD_SERVERS in /etc/default/nfs.

Thanks,
GigaGeek
--
This message posted from opensolaris.org