[Linux-cluster] F_SETLK fails after recovery

Neale Ferguson Tue, 02 Sep 2014 08:07:17 -0700

Hi,
In our two-node system, if one node fails, the other node takes over the 
application and uses the shared gfs2 target successfully. However, after the 
failed node comes back, any attempt to lock a file on the gfs2 resource fails 
with -ENOSYS. The following test program exhibits the problem: in normal 
operation the lock succeeds, but in the fail/recover scenario we get -ENOSYS:


#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int 
main(int argc, char **argv)
{
        int fd;
        struct flock fl;

        fd = open("/mnt/test.file",O_RDONLY);
        if (fd != -1) {
                if (fcntl(fd, F_SETFL, O_RDONLY|O_DSYNC) != -1) {
                        fl.l_type = F_RDLCK;
                        fl.l_whence = SEEK_SET;
                        fl.l_start = 0;
                        fl.l_len = 0;
                        if (fcntl(fd, F_SETLK, &fl) != -1)
                                printf("File locked successfully\n");
                        else
                                perror("fcntl(F_SETLK)");
                } else
                        perror("fcntl(F_SETFL)");
                close (fd);
        } else 
                perror("open");
}
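
When reproducing this, it can help to tell ENOSYS apart from ordinary lock 
contention (EAGAIN or EACCES). Below is a minimal sketch of the same locking 
sequence wrapped in a helper that returns the errno value. The name 
try_read_lock is hypothetical and not part of the test program above, and the 
F_SETFL step is left out for brevity:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if a whole-file read lock was taken,
 * otherwise the errno value from open() or fcntl(). */
int try_read_lock(const char *path)
{
        struct flock fl;
        int fd, err = 0;

        assert(path != NULL);

        fd = open(path, O_RDONLY);
        if (fd == -1)
                return errno;

        memset(&fl, 0, sizeof(fl));     /* no stray values in unused fields */
        fl.l_type = F_RDLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 0;                   /* 0 means "to end of file" */

        if (fcntl(fd, F_SETLK, &fl) == -1)
                err = errno;            /* saved before close() can change it */

        close(fd);                      /* also drops the lock if it was taken */
        return err;
}
```

With this, try_read_lock("/mnt/test.file") would be expected to return 0 in 
normal operation and ENOSYS in the fail/recover scenario described above.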

I've tracked things down to these messages:

1409631951 lockspace lvclusdidiz0360 plock disabled our sig 816fba01 nodeid 2 sig 2f6b
:
1409634840 lockspace lvclusdidiz0360 plock disabled our sig 0 nodeid 2 sig 2f6b

These indicate that the lockspace attribute disable_plock has been set as a 
result of the other node calling send_plocks_stored().
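
To compare the "our sig" and "sig" fields across many such lines, a small 
parser can help. This is only a sketch: the format string is inferred from the 
two log lines quoted above, not taken from the dlm_controld source, and the 
struct and function names are made up for illustration:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for one "plock disabled" log line. */
struct plock_disabled_msg {
        unsigned long timestamp;
        char lockspace[64];
        unsigned int our_sig;   /* logged in hex */
        int nodeid;
        unsigned int sig;       /* logged in hex */
};

/* Returns 1 if the line matched the assumed format, 0 otherwise. */
int parse_plock_disabled(const char *line, struct plock_disabled_msg *out)
{
        int n;

        assert(line != NULL && out != NULL);

        n = sscanf(line,
                   "%lu lockspace %63s plock disabled our sig %x nodeid %d sig %x",
                   &out->timestamp, out->lockspace, &out->our_sig,
                   &out->nodeid, &out->sig);
        return n == 5;
}
```

A line where our_sig and sig differ after parsing is one where the local and 
remote signatures did not agree.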

Looking at cpg.c:

static void prepare_plocks(struct lockspace *ls)
{
        struct change *cg = list_first_entry(&ls->changes, struct change, list);
        struct member *memb;
        uint32_t sig;

        :
        :
        :
        if (nodes_added(ls))
                store_plocks(ls, &sig);
        send_plocks_stored(ls, sig);
}

If nodes_added(ls) returns false then an uninitialized "sig" value will be 
passed to send_plocks_stored(). Do the "our sig" and "sig" values in the above 
log messages make sense?

If this is not the case, what is supposed to happen in order to re-enable 
plocks on the recovered node?

Neale


-- 
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
