This seems to be causing problems with the cluster: I get the following on
one node just before the cluster hangs.
gfs_controld[]: retrieve_plocks: ckpt open error 12 gfs

The only reference I can find when googling this is in plock.c:
        rv = saCkptCheckpointOpen(ckpt_handle, &name, NULL,
                                  SA_CKPT_CHECKPOINT_READ, 0, &h);
        if (rv == SA_AIS_ERR_TRY_AGAIN) {
                log_group(mg, "retrieve_plocks: ckpt open retry");
                sleep(1);
                goto open_retry;
        }
        if (rv != SA_AIS_OK) {
                log_error("retrieve_plocks: ckpt open error %d %s",
                          rv, mg->name);
                return;
        }
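
For what it's worth, the 12 in the log line is the raw SaAisErrorT value
returned by saCkptCheckpointOpen. Assuming the standard SA Forum numbering
in saAis.h (worth checking against the header shipped with your openais
version), 12 would be SA_AIS_ERR_NOT_EXIST, i.e. the checkpoint did not
exist when this node tried to open it for reading. Below is a minimal sketch
of a decoder for a few common codes; ais_err_name and describe_ckpt_error
are hypothetical helpers for illustration, not part of openais or
gfs_controld.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: map a numeric SaAisErrorT value to its name.
 * The values assume the SA Forum AIS numbering found in saAis.h;
 * only a subset of codes is listed here. */
const char *ais_err_name(int rv)
{
    switch (rv) {
    case 1:  return "SA_AIS_OK";
    case 2:  return "SA_AIS_ERR_LIBRARY";
    case 5:  return "SA_AIS_ERR_TIMEOUT";
    case 6:  return "SA_AIS_ERR_TRY_AGAIN";
    case 7:  return "SA_AIS_ERR_INVALID_PARAM";
    case 8:  return "SA_AIS_ERR_NO_MEMORY";
    case 9:  return "SA_AIS_ERR_BAD_HANDLE";
    case 12: return "SA_AIS_ERR_NOT_EXIST";
    case 14: return "SA_AIS_ERR_EXIST";
    case 18: return "SA_AIS_ERR_NO_RESOURCES";
    default: return "unknown (see saAis.h)";
    }
}

/* Hypothetical helper: build a log line like the one in plock.c,
 * with the symbolic error name added after the number. */
void describe_ckpt_error(char *buf, size_t len, int rv, const char *name)
{
    snprintf(buf, len, "retrieve_plocks: ckpt open error %d (%s) %s",
             rv, ais_err_name(rv), name);
}
```

Calling describe_ckpt_error with rv = 12 and name = "gfs" gives the same
message as above with the symbolic name included, which makes it easier to
search for.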

I'm not quite sure what saCkptCheckpoint is, but from reading the openais
code, it seems to be some form of fault tolerance.
I found a post about a possible bug in the saCkptCheckpointOpen function:
https://lists.linux-foundation.org/pipermail/openais/2006-September/008360.html


I have just installed newer versions of cman, gfs-utils, openais and
kmod-gfs, and upgraded the kernel; I'm going to see if I'm still getting
hangs. It has been running for a few hours now with node resets and I/O
bursts, and seems to be behaving a little better.

--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
