We have a T5220 configured with 4 cores assigned to the global zone, hosting 
three NFS share mounts for 40+ workstations. Regular file sharing has not been 
an issue; however, one of the mounts is heavily used by an application. In its 
simplest terms, this application is similar to a chat system/message board 
running off a flat file. It uses the Qt library's fcntl wrappers to check 
whether there is a lock on the file before it reads the file into a local 
buffer file, or writes the file back if it held the lock. 
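
To make the access pattern concrete, here is a rough sketch of what I 
understand the application to be doing, written in Python rather than Qt/C++. 
The function names (read_locked, append_locked) are made up for illustration, 
and the blocking lock calls are an assumption; the real application may test 
for the lock without blocking.

```python
import fcntl
import os


def read_locked(path):
    """Return the file's contents, read under a shared (read) lock."""
    with open(path, "rb") as f:
        # Hypothetical choice: block until the fcntl-style lock is granted.
        fcntl.lockf(f, fcntl.LOCK_SH)
        try:
            return f.read()
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)


def append_locked(path, message):
    """Append one message (bytes) under an exclusive (write) lock."""
    with open(path, "r+b") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            f.seek(0, os.SEEK_END)
            f.write(message)
            f.flush()
            os.fsync(f.fileno())  # push the data out before unlocking
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

Every workstation running something like this against the same file on the 
NFSv4 mount sends lock requests to the server.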

When 3 workstations (with different accounts) are accessing the file and 
writing to it, there are no problems. When more than 5 workstations are 
accessing the same file, we get the following error messages:

NOTICE:  [NFS4][Server: dcmil][Mntpt: /dcmil/shared] NFSMAPID_DOMAIN does not 
match the server: dcmil domain
Please check configuration

NOTICE:  [NFS4][Server: dcmil][Mntpt: /dcmil/shared] NFS op OP_OPEN got error 
NFS4ERR_EXPIRED causing recovery action NR_CLIENTID.  

NOTICE:  [NFS4][Server: dcmil][Mntpt: /dcmil/shared] NFS Starting recovery for 
mount /dcmil/shared (mi 0x60015733000 mi_recovflags [0x1]) on server dcmil, 
rnode_pt1 ./DMAX_TEST (0x60018fa44a0), rnode_pt2 <null string> (0x0)

NOTICE:  [NFS4][Server: dcmil][Mntpt: /dcmil/shared] NFS Recovery done for 
mount /dcmil/shared (mi 0x60015733000) on server dcmil, rnode_pt1 ./DMAX_TEST 
(0x60018fa44a0), rnode_pt2 <null string> (0x0)

NOTICE:  [NFS4][Server: dcmil][Mntpt: /dcmil/shared] NFS op OP_LOCK got error 
NFS4ERR_STALE_STATEID causing recovery action NR_CLIENTID.  Client also 
suspects that the server rebooted, or experienced a network partition. 

Once the error appears, none of the workstations can access the shared file 
without closing out of the application entirely and waiting for quite some 
time. However, if they simply cat the file (initiating a new request?), access 
is restored; the application itself never recovers until it is restarted.  
We are uncertain whether our system's NFS server or the workstation 
configuration needs to be tweaked to allow this application to work.  (Change 
lease timeouts, or change mount options? There are just too many variables that 
I am unfamiliar with to start poking around and find a definitive solution.)
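
Since a fresh cat of the file works while the running application stays stuck, 
one thing I could imagine trying on the application side is reopening the file 
after an I/O error instead of reusing the old handle. The sketch below is only 
that idea in Python; read_with_reopen and its parameters are hypothetical, and 
I have not verified that this helps with the NFS4ERR_EXPIRED recovery above.

```python
import time


def read_with_reopen(path, attempts=3, delay=1.0):
    """Read path, opening the file from scratch on every attempt."""
    last_error = None
    for attempt in range(attempts):
        try:
            # A fresh open() issues a new request, much like running cat.
            with open(path, "rb") as f:
                return f.read()
        except OSError as err:
            last_error = err
            if attempt < attempts - 1:
                time.sleep(delay)  # give client recovery time to finish
    raise last_error
```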

We have the default values of 20 for LOCKD_SERVERS and 16 for NFSD_SERVERS in 
/etc/default/nfs.
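
For comparing settings between the server and the workstations (for example 
NFSMAPID_DOMAIN, given the first NOTICE above), a small sketch like the 
following lists the active KEY=VALUE lines. The SAMPLE text is illustrative 
only, not a copy of our real file, and parse_defaults is a made-up helper name.

```python
def parse_defaults(text):
    """Return the active (uncommented) KEY=VALUE settings as a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Illustrative excerpt in the style of /etc/default/nfs (an assumption).
SAMPLE = """\
NFSD_SERVERS=16
LOCKD_SERVERS=20
#NFSMAPID_DOMAIN=domain
"""

if __name__ == "__main__":
    for key, value in sorted(parse_defaults(SAMPLE).items()):
        print(key, value)
```

Running the same thing against the file from each machine would show whether 
a setting is commented out on one side but set on the other.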

Thanks,
    GigaGeek
-- 
This message posted from opensolaris.org
