On Fri, Jan 04, 2008 at 04:18:45PM -0500, Charlie Brady wrote:
> We've reduced the application code to a simple test case. The following
> code run on each node will soon block, and doesn't receive signals until
> the peer node is shutdown:
>
> ...
> fl.l_whence=SEEK_SET;
> fl.l_start=0;
> fl.l_len=1;
>
> while (1)
> {
>     fl.l_type=F_WRLCK;
>     retval=fcntl(filedes,F_SETLKW,&fl);
>     if (retval==-1)
>     {
>         perror("lock");
>         exit(1);
>     }
>     // attempt to unlock the index file
>     fl.l_type=F_UNLCK;
>     retval=fcntl(filedes,F_SETLKW,&fl);
>     if (retval==-1)
>     {
>         perror("unlock");
>         exit(1);
>     }
> }

Yes, this stresses a problematic design limitation in the RHEL4 dlm: the
dlm master node keeps ping-ponging between nodes and becomes so unstable
that everything comes to a halt. One possible work-around is to modify the
program to hold a lock on filedes for as long as it runs, which keeps the
master stable, e.g. hold a zero-length lock at some unused offset like
0xFFFFFF.
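
A minimal sketch of that work-around, untested on RHEL4 and resting on a few
assumptions: hold_anchor_lock() and ANCHOR_OFFSET are hypothetical names,
filedes is assumed to be open for reading (required for F_RDLCK), and a
shared lock is used so that every node can hold it at once. Note that in
fcntl() an l_len of 0 means "through end of file", so the sketch locks a
single byte instead; the point is only that the range is one the
application never locks otherwise.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical offset: any byte the application never locks will do. */
#define ANCHOR_OFFSET 0xFFFFFF

/*
 * Take a shared lock on an otherwise unused byte and never release it,
 * so this process always holds some lock on the file.
 * Returns 0 on success, -1 on error (errno is set by fcntl).
 */
static int hold_anchor_lock(int fd)
{
    struct flock fl = {0};

    fl.l_type = F_RDLCK;        /* assumption: shared, so nodes do not block each other */
    fl.l_whence = SEEK_SET;
    fl.l_start = ANCHOR_OFFSET;
    fl.l_len = 1;               /* one byte; 0 would mean "to end of file" */

    return fcntl(fd, F_SETLKW, &fl);
}
```

Call it once after opening filedes and before entering the lock/unlock
loop. Unlocking byte 0 in the loop does not touch the anchor range, but
closing any descriptor for the file in that process drops all of its
fcntl locks, the anchor included.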
Dave