Hi Josh,

What version of Solaris is it? Can you reproduce the problem and send a snoop
capture taken during the problematic behavior?

--Pavel

On 05/06/09 04:53, Josh Beavers wrote:
> We have a T5220 configured with 4 cores assigned to the global zone to host 
> three share mounts for 40+ workstations. Regular file sharing has not been an
> issue; however, one of the mounts is heavily used by an application. In its
> simplest terms, the application works like a chat system/message board running
> off a flat file. It uses the Qt library's fcntl wrappers to check whether the
> file is locked before it reads the file into a local buffer file, or writes the
> file back if it holds the lock.
>
> When 3 workstations (with different accounts) access the file and write to it,
> there are no problems. When more than 5 workstations access the same file, we
> get the following error messages:
>
> NOTICE:  [NFS4][Server: dcmil][Mntpt: /dcmil/shared] NFSMAPID_DOMAIN does not 
> match the server: dcmil domain
> Please check configuration
>
> NOTICE:  [NFS4][Server: dcmil][Mntpt: /dcmil/shared] NFS op OP_OPEN got error 
> NFS4ERR_EXPIRED causing recovery action NR_CLIENTID.  
>
> NOTICE:  [NFS4][Server: dcmil][Mntpt: /dcmil/shared] NFS Starting recovery 
> for mount /dcmil/shared (mi 0x60015733000 mi_recovflags [0x1]) on server 
> dcmil, rnode_pt1 ./DMAX_TEST (0x60018fa44a0), rnode_pt2 <null string> (0x0)
>
> NOTICE:  [NFS4][Server: dcmil][Mntpt: /dcmil/shared] NFS Recovery done for 
> mount /dcmil/shared (mi 0x60015733000) on server dcmil, rnode_pt1 ./DMAX_TEST 
> (0x60018fa44a0), rnode_pt2 <null string> (0x0)
>
> NOTICE:  [NFS4][Server: dcmil][Mntpt: /dcmil/shared] NFS op OP_LOCK got error 
> NFS4ERR_STALE_STATEID causing recovery action NR_CLIENTID.  Client also 
> suspects that the server rebooted, or experienced a network partition. 
>
> Once the error appears, none of the workstations can access the shared file
> without closing the application entirely and waiting for quite some time.
> However, if they simply cat the file (initiating a new request?), access is
> restored; only the application never recovers until it is restarted. We are
> uncertain whether our system's NFS server or workstation configuration needs
> to be tweaked for this application to work (change lease timeouts, or change
> mount options?). There are too many variables I am unfamiliar with to start
> poking around and find a definitive solution.
>
> We have the defaults of 20 LOCKD_SERVERS and 16 NFSD_SERVERS in
> /etc/default/nfs.
>
> Thanks,
>     GigaGeek
>   
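For reference, the locking pattern described above (check for a lock, then read into a local buffer, or write the file back while holding the lock) can be approximated with plain POSIX fcntl() record locks. This is only a minimal sketch under assumptions: it does not reproduce Qt's wrappers, and the helper names (file_is_locked, set_whole_file_lock, write_shared_file, read_shared_file) are hypothetical. On an NFSv4 mount, the client carries these calls to the server as LOCK/LOCKT/LOCKU operations, which is the same lock state the OP_LOCK error in the log refers to.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if another process holds a lock that would
 * conflict with a whole-file write lock, 0 if not, -1 on error.
 * F_GETLK only probes; it never takes the lock, and it never reports the
 * calling process's own locks. */
static int file_is_locked(int fd)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;    /* ask: could we take an exclusive lock? */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 means "through end of file" */
    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;
    return fl.l_type != F_UNLCK;
}

/* Hypothetical helper: take (F_WRLCK / F_RDLCK) or release (F_UNLCK) a
 * whole-file lock. F_SETLKW blocks until the lock is granted; retry if a
 * signal interrupts the wait. */
static int set_whole_file_lock(int fd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    int rc;
    do {
        rc = fcntl(fd, F_SETLKW, &fl);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

/* Hypothetical helper: replace the shared file's contents with msg while
 * holding an exclusive lock. Returns 0 on success, -1 on error. */
static int write_shared_file(const char *path, const char *msg)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1)
        return -1;
    int rc = -1;
    if (set_whole_file_lock(fd, F_WRLCK) == 0) {
        size_t len = strlen(msg);
        /* A short write is treated as failure in this sketch. */
        if (ftruncate(fd, 0) == 0 && write(fd, msg, len) == (ssize_t)len)
            rc = 0;
        set_whole_file_lock(fd, F_UNLCK);
    }
    close(fd);
    return rc;
}

/* Hypothetical helper: copy up to cap-1 bytes of the shared file into buf
 * (NUL-terminated) while holding a shared lock. Returns bytes read or -1. */
static ssize_t read_shared_file(const char *path, char *buf, size_t cap)
{
    if (cap == 0)
        return -1;
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    ssize_t n = -1;
    if (set_whole_file_lock(fd, F_RDLCK) == 0) {
        n = pread(fd, buf, cap - 1, 0);
        if (n >= 0)
            buf[n] = '\0';
        set_whole_file_lock(fd, F_UNLCK);
    }
    close(fd);
    return n;
}
```

Because POSIX record locks belong to the process, only another process (for example, another workstation's copy of the application) can make file_is_locked() return 1. Blocking with F_SETLKW keeps the sketch simple; a real client might prefer F_SETLK and retry so that it does not hang indefinitely when the server is in recovery.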

