Hello All,

I am running into a strange issue while configuring a 3-node cluster.

Background: I have already successfully deployed a 2-node active/passive
cluster for Virtuozzo with shared iSCSI storage.

Problem:
I now have a requirement for a 3-node cluster with 2 active nodes and 1
passive node. I added a Filesystem resource for the /vz partition and also
fixed its location to the 3rd node with a location constraint, but
unfortunately this resource (the Filesystem for the new LUN) shifts to the
active node that already has another LUN mounted on /vz, instead of staying
on the newer 3rd node. My whole cluster then becomes unmanageable, since
that active node already has a different LUN bound to /vz.
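
The location constraint I used to pin the new Filesystem resource to the 3rd
node looks roughly like the snippet below (written from memory, so the ids
are placeholders rather than a copy of my cib.xml):

___________________________________________________________________________________________

<rsc_location id="loc_mount_node3" rsc="resource_mount_node3">
  <rule id="loc_mount_node3_rule" score="INFINITY">
    <expression id="loc_mount_node3_expr" attribute="#uname"
                operation="eq" value="node3"/>
  </rule>
</rsc_location>

___________________________________________________________________________________________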

Details:
iSCSI server: I am using a common iSCSI storage server on which I have
created 2 LUNs, one for each of my 2 active nodes.

Resources created:
Locations: I have created 2 location constraints, one for the first active
server and one for the newer active server.
Groups: I have created one group per active node (2 groups in total), and in
each group I have added 3 resources (virtual IP, Filesystem, vz-cluster
script); a rough sketch of one group follows below.
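
Each group is laid out more or less like this, in the Heartbeat v2 CIB style
(the ids, the IP address, the filesystem type and the name of the vz init
script are only illustrative placeholders, not my exact configuration):

___________________________________________________________________________________________

<group id="group_node3">
  <!-- virtual IP for this node's services -->
  <primitive id="resource_ip_node3" class="ocf" provider="heartbeat" type="IPaddr">
    <instance_attributes id="resource_ip_node3_ia">
      <attributes>
        <nvpair id="resource_ip_node3_ip" name="ip" value="192.168.0.53"/>
      </attributes>
    </instance_attributes>
  </primitive>
  <!-- iSCSI LUN mounted on /vz -->
  <primitive id="resource_mount_node3" class="ocf" provider="heartbeat" type="Filesystem">
    <instance_attributes id="resource_mount_node3_ia">
      <attributes>
        <nvpair id="resource_mount_node3_dev" name="device"
                value="/dev/disk/by-uuid/81c3845e-c2f6-4cb0-a0cd-e00c074942fb"/>
        <nvpair id="resource_mount_node3_dir" name="directory" value="/vz"/>
        <nvpair id="resource_mount_node3_fstype" name="fstype" value="ext3"/>
      </attributes>
    </instance_attributes>
  </primitive>
  <!-- vz-cluster init script (placeholder name) -->
  <primitive id="resource_vz_node3" class="lsb" type="vz"/>
</group>

___________________________________________________________________________________________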

Everything was working fine; I was even able to add my 2nd virtual IP onto
node 3. But whenever I try to add the Filesystem resource for /vz on the
newer node, the filesystem shifts to the other active, running node and my
whole cluster becomes unmanageable.

For testing I mounted this partition on a different directory, say /mnt, and
everything was fine; this time the resource stayed on my 3rd node.
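
On each node I can also check which device actually ends up mounted where;
nothing cluster-specific, just standard commands, for example:

___________________________________________________________________________________________

# show which device is currently mounted on /vz (and on /mnt for the test case)
mount | grep -E '/vz|/mnt'
df -h /vz

___________________________________________________________________________________________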

I think the problem is with the /vz directory, but it should not behave like that.

Please help me figure out whether the above scenario is possible at all,
i.e. binding different LUNs to the same /vz mount point, but on different
machines.

The details of my setup are as follows:

1) I am running CentOS 5.3
2) Kernel version is 2.6.18-028stab059.6 (Virtuozzo kernel)
3) Heartbeat version: 2.1.3-3
4) Installed from RPMs
5) Using the Heartbeat version 2 (CRM) configuration style
6) cat /etc/ha.d/ha.cf
___________________________________________________________________________________________

deadtime  10
bcast eth1
crm yes
node node_master
node node_slave
node node3
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility     local0

_____________________________________________________________________________________________

This file is the same on all 3 nodes.

7) tail -f /var/log/ha-log

________________________________________________________________________________________________________

crmd[8325]: 2009/12/18_13:39:55 info: do_lrm_rsc_op: Performing
op=resource_mount_node3_monitor_0
key=6:166:9acb7de2-c2b3-42ab-9ee0-89a3c3ad1b88)
lrmd[8322]: 2009/12/18_13:39:55 info: rsc:resource_mount_node3: monitor
Filesystem[11186]:      2009/12/18_13:39:55 WARNING: Couldn't find device
[/dev/disk/by-uuid/81c3845e-c2f6-4cb0-a0cd-e00c074942fb]. Expected /dev/???
to exist
cib[11213]: 2009/12/18_13:39:55 info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
cib[11213]: 2009/12/18_13:39:55 info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
cib[11213]: 2009/12/18_13:39:55 info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest:
/var/lib/heartbeat/crm/cib.xml.sig.last)
cib[11213]: 2009/12/18_13:39:55 info: write_cib_contents: Wrote version
0.344.3 of the CIB to disk (digest: 221e874690c5176734064408319181b0)
crmd[8325]: 2009/12/18_13:39:55 info: process_lrm_event: LRM operation
resource_mount_node3_monitor_0 (call=73, rc=0) complete
cib[11213]: 2009/12/18_13:39:55 info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
cib[11213]: 2009/12/18_13:39:55 info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest:
/var/lib/heartbeat/crm/cib.xml.sig.last)
crmd[8325]: 2009/12/18_13:39:57 info: do_lrm_rsc_op: Performing
op=resource_mount_node3_stop_0
key=21:167:9acb7de2-c2b3-42ab-9ee0-89a3c3ad1b88)
lrmd[8322]: 2009/12/18_13:39:57 info: rsc:resource_mount_node3: stop
Filesystem[11225]:      2009/12/18_13:39:57 WARNING: Couldn't find device
[/dev/disk/by-uuid/81c3845e-c2f6-4cb0-a0cd-e00c074942fb]. Expected /dev/???
to exist
Filesystem[11225]:      2009/12/18_13:39:57 INFO: Running stop for
/dev/disk/by-uuid/81c3845e-c2f6-4cb0-a0cd-e00c074942fb on /vz
Filesystem[11225]:      2009/12/18_13:39:57 INFO: Trying to unmount /vz
lrmd[8322]: 2009/12/18_13:39:57 info: RA output:
(resource_mount_node3:stop:stderr) umount: /vz: device is busy
umount: /vz: device is busy

Filesystem[11225]:      2009/12/18_13:39:57 ERROR: Couldn't unmount /vz;
trying cleanup with SIGTERM
lrmd[8322]: 2009/12/18_13:39:57 info: RA output:
(resource_mount_node3:stop:stderr) /vz:
lrmd[8322]: 2009/12/18_13:39:57 info: RA output:
(resource_mount_node3:stop:stdout)   6778  8857  8858  8862  8869  8880
8883  8884  8898  9002
lrmd[8322]: 2009/12/18_13:39:57 info: RA output:
(resource_mount_node3:stop:stderr) mmmmmmmmmm

lrmd[8322]: 2009/12/18_13:39:57 info: RA output:
(resource_mount_node3:stop:stderr) Could not kill process 8857: No such
process

lrmd[8322]: 2009/12/18_13:39:57 info: RA output:
(resource_mount_node3:stop:stderr) Could not kill process 8858: No such
process

lrmd[8322]: 2009/12/18_13:39:57 info: RA output:
(resource_mount_node3:stop:stderr) Could not kill process 8862: No such
process

lrmd[8322]: 2009/12/18_13:39:57 info: RA output:
(resource_mount_node3:stop:stderr) Could not kill process 8869: No such
process

lrmd[8322]: 2009/12/18_13:39:57 info: RA output:
(resource_mount_node3:stop:stderr) Could not kill process 8880: No such
process

lrmd[8322]: 2009/12/18_13:39:57 info: RA output:
(resource_mount_node3:stop:stderr) Could not kill process 8883: No such
process

lrmd[8322]: 2009/12/18_13:39:57 info: RA output:
(resource_mount_node3:stop:stderr) Could not kill process 8884: No such
process

lrmd[8322]: 2009/12/18_13:39:57 info: RA output:
(resource_mount_node3:stop:stderr) Could not kill process 9002: No such
process

Filesystem[11225]:      2009/12/18_13:39:57 INFO: Some processes on /vz were
signalled
lrmd[8322]: 2009/12/18_13:39:58 info: RA output:
(resource_mount_node3:stop:stderr) umount: /vz: device is busy

lrmd[8322]: 2009/12/18_13:39:58 info: RA output:
(resource_mount_node3:stop:stderr) umount: /vz: device is busy

Filesystem[11225]:      2009/12/18_13:39:58 ERROR: Couldn't unmount /vz;
trying cleanup with SIGTERM
Filesystem[11225]:      2009/12/18_13:39:58 INFO: No processes on /vz were
signalled
lrmd[8322]: 2009/12/18_13:39:59 info: RA output:
(resource_mount_node3:stop:stderr) umount: /vz: device is busy
umount: /vz: device is busy

Filesystem[11225]:      2009/12/18_13:39:59 ERROR: Couldn't unmount /vz;
trying cleanup with SIGTERM
Filesystem[11225]:      2009/12/18_13:39:59 INFO: No processes on /vz were
signalled
lrmd[8322]: 2009/12/18_13:40:00 info: RA output:
(resource_mount_node3:stop:stderr) umount: /vz: device is busy

lrmd[8322]: 2009/12/18_13:40:00 info: RA output:
(resource_mount_node3:stop:stderr) umount: /vz: device is busy

Filesystem[11225]:      2009/12/18_13:40:00 ERROR: Couldn't unmount /vz;
trying cleanup with SIGKILL
Filesystem[11225]:      2009/12/18_13:40:00 INFO: No processes on /vz were
signalled
lrmd[8322]: 2009/12/18_13:40:01 info: RA output:
(resource_mount_node3:stop:stderr) umount: /vz: device is busy

lrmd[8322]: 2009/12/18_13:40:01 info: RA output:
(resource_mount_node3:stop:stderr) umount: /vz: device is busy

Filesystem[11225]:      2009/12/18_13:40:01 ERROR: Couldn't unmount /vz;
trying cleanup with SIGKILL
Filesystem[11225]:      2009/12/18_13:40:01 INFO: No processes on /vz were
signalled
lrmd[8322]: 2009/12/18_13:40:02 info: RA output:
(resource_mount_node3:stop:stderr) umount: /vz: device is busy
umount: /vz: device is busy

Filesystem[11225]:      2009/12/18_13:40:02 ERROR: Couldn't unmount /vz;
trying cleanup with SIGKILL
Filesystem[11225]:      2009/12/18_13:40:02 INFO: No processes on /vz were
signalled
Filesystem[11225]:      2009/12/18_13:40:03 ERROR: Couldn't unmount /vz,
giving up!

_______________________________________________________________________________________________________
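
From the log, two things stand out: the Filesystem agent first warns that
/dev/disk/by-uuid/81c3845e-c2f6-4cb0-a0cd-e00c074942fb does not exist on the
node, and the stop action then fails because /vz cannot be unmounted while it
is busy. In case it is useful, these are the kinds of checks I can run on the
nodes (plain standard commands, listed here only as a sketch):

___________________________________________________________________________________________

# is the by-uuid symlink for the new LUN present on this node?
ls -l /dev/disk/by-uuid/ | grep 81c3845e

# is the iSCSI session to the storage server logged in?
iscsiadm -m session

# which processes are keeping /vz busy, so that umount fails?
fuser -vm /vz

___________________________________________________________________________________________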

Looking forward to a positive response.

Cheers,
Jaspal