You have to use /etc/fstab with the _netdev option; Red Hat Cluster doesn't support active/active services, so an rgmanager-managed mount will only ever be active on one node at a time.
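On both nodes, something along these lines (the device path and mount point below are only placeholders for your volume):

    # /etc/fstab on saperpprod01 and saperpprod02:
    /dev/vg_sap/lv_profile  /sapmnt/profile  gfs2  defaults,_netdev  0 0

    # on RHEL 6 the gfs2 init script mounts/umounts fstab GFS2 entries
    # at the right point in the boot/shutdown sequence:
    chkconfig gfs2 on

Then drop the profile-gfs service from rgmanager, since a service can only run on one node.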
2012/11/20 james cyriac <jamescyria...@gmail.com>:
> Hi all,
>
> I am installing a Red Hat Cluster 6 two-node cluster. The issue is that I
> am not able to mount my GFS file system on both nodes at the same time.
>
> Please find my clustat output:
>
> [root@saperpprod01 ~]# clustat
> Cluster Status for sap-cluster1 @ Tue Nov 20 14:51:28 2012
> Member Status: Quorate
>
>  Member Name              ID    Status
>  ------ ----              ----  ------
>  saperpprod01             1     Online, Local, rgmanager
>  saperpprod02             2     Online, rgmanager
>
>  Service Name             Owner (Last)    State
>  ------- ----             ----- ------    -----
>  service:oracle           saperpprod01    started
>  service:profile-gfs      saperpprod01    started
>  service:sap              saperpprod01    started
> [root@saperpprod01 ~]#
>
> Oracle and SAP are fine and fail over between both nodes. But profile-gfs
> is a GFS file system, and I want to present that GFS mount point on both
> nodes at the same time. Please help me with this.
>
> Thanks in advance,
> james
>
> On Mon, Nov 19, 2012 at 1:16 PM, Jacek Konieczny <jaj...@jajcus.net> wrote:
>
>> Hi,
>>
>> I am setting up a cluster using:
>>
>> Linux kernel 3.6.6
>> Corosync 2.1.0
>> DLM 4.0.0
>> CLVMD 2.02.98
>> Pacemaker 1.1.8
>> DRBD 8.3.13
>>
>> Now I have got stuck on the 'clean shutdown of a node' scenario.
>>
>> It goes like this:
>> - resources using the shared storage are properly stopped by Pacemaker
>> - DRBD is cleanly demoted and unconfigured by Pacemaker
>> - Pacemaker cleanly exits
>> - CLVMD is stopped
>> - dlm_controld is stopped
>> - corosync is being stopped
>>
>> and at this point the node is fenced (rebooted) by the dlm_controld on
>> the other node. I would expect it to continue with a clean shutdown.
>>
>> Any idea how to debug/fix it?
>> Is this '541 cpg_dispatch error 9' the problem?
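Quite possibly. If I'm reading corosync's cs_error_t right, error 9 is CS_ERR_BAD_HANDLE, which would mean dlm_controld's CPG connection was already gone, i.e. corosync went down before dlm_controld had finished leaving its groups, so the peer saw a nodedown instead of a clean leave. A rough way to check the ordering on the node being shut down (just a sketch; dlm_tool ships with dlm 4.0):

    # after clvmd has exited, but while dlm_controld is still running:
    dlm_tool ls            # should not list any lockspaces any more
    dlm_tool dump | tail   # recent dlm_controld events, including CPG ones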
>> Logs from the node being shut down (log file system mounted with the
>> 'sync' option, syslog shutdown delayed as much as possible):
>>
>> Kernel:
>> Nov 19 09:49:40 dev1n2 kernel: : [ 542.049407] block drbd0: worker terminated
>> Nov 19 09:49:40 dev1n2 kernel: : [ 542.049412] block drbd0: Terminating drbd0_worker
>> Nov 19 09:49:43 dev1n2 kernel: : [ 544.934390] dlm: clvmd: leaving the lockspace group...
>> Nov 19 09:49:43 dev1n2 kernel: : [ 544.937584] dlm: clvmd: group event done 0 0
>> Nov 19 09:49:43 dev1n2 kernel: : [ 544.937897] dlm: clvmd: release_lockspace final free
>> Nov 19 09:49:43 dev1n2 kernel: : [ 544.961407] dlm: closing connection to node 2
>> Nov 19 09:49:43 dev1n2 kernel: : [ 544.961431] dlm: closing connection to node 1
>>
>> User space:
>> Nov 19 09:49:41 dev1n2 pacemakerd[1267]: notice: stop_child: Stopping cib: Sent -15 to process 1279
>> Nov 19 09:49:41 dev1n2 notifyd[1139]: [notice] dev1n2[2] stonithd:1281:0x7fc423dfd5e0 is now disconnected from corosync
>> Nov 19 09:49:41 dev1n2 cib[1279]: notice: terminate_cs_connection: Disconnecting from Corosync
>> Nov 19 09:49:41 dev1n2 corosync[1130]: [TOTEM ] Retransmit List: 1db
>> Nov 19 09:49:41 dev1n2 notifyd[1139]: [notice] dev1n2[2] cib:1279:0x7fc4240008d0 is now disconnected from corosync
>> Nov 19 09:49:41 dev1n2 cib[1279]: notice: terminate_cs_connection: Disconnecting from Corosync
>> Nov 19 09:49:41 dev1n2 corosync[1130]: [TOTEM ] Retransmit List: 1dd
>> Nov 19 09:49:41 dev1n2 pacemakerd[1267]: notice: pcmk_shutdown_worker: Shutdown complete
>> Nov 19 09:49:41 dev1n2 notifyd[1139]: [notice] dev1n2[2] pacemakerd:1267:0x7fc423bf8ed0 is now disconnected from corosync
>> Nov 19 09:49:41 dev1n2 notifyd[1139]: [notice] dev1n2[2] pacemakerd:1267:0x7fc423bf7660 is now disconnected from corosync
>> Nov 19 09:49:41 dev1n2 corosync[1130]: [TOTEM ] Retransmit List: 1de
>> Nov 19 09:49:41 dev1n2 corosync[1130]: [TOTEM ] Retransmit List: 1de
>> Nov 19 09:49:41 dev1n2 corosync[1130]: [TOTEM ] Retransmit List: 1e1
>> Nov 19 09:49:41 dev1n2 corosync[1130]: [TOTEM ] Retransmit List: 1e1
>> Nov 19 09:49:43 dev1n2 dlm_controld[1142]: 541 cpg_dispatch error 9
>> Nov 19 09:49:43 dev1n2 corosync[1130]: [TOTEM ] Retransmit List: 1e7
>> Nov 19 09:49:43 dev1n2 corosync[1130]: [TOTEM ] Retransmit List: 1e7
>> Nov 19 09:49:43 dev1n2 corosync[1130]: [SERV ] Unloading all Corosync service engines.
>> Nov 19 09:49:43 dev1n2 corosync[1130]: [QB ] withdrawing server sockets
>> Nov 19 09:49:43 dev1n2 corosync[1130]: [SERV ] Service engine unloaded: corosync vote quorum service v1.0
>> Nov 19 09:49:43 dev1n2 corosync[1130]: [QB ] withdrawing server sockets
>> Nov 19 09:49:43 dev1n2 corosync[1130]: [SERV ] Service engine unloaded: corosync configuration map access
>> Nov 19 09:49:43 dev1n2 corosync[1130]: [QB ] withdrawing server sockets
>> Nov 19 09:49:43 dev1n2 corosync[1130]: [SERV ] Service engine unloaded: corosync configuration service
>> Nov 19 09:49:43 dev1n2 corosync[1130]: [QB ] withdrawing server sockets
>> Nov 19 09:49:43 dev1n2 corosync[1130]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
>> Nov 19 09:49:43 dev1n2 corosync[1130]: [QB ] withdrawing server sockets
>> Nov 19 09:49:43 dev1n2 corosync[1130]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
>> Nov 19 09:49:43 dev1n2 corosync[1130]: [SERV ] Service engine unloaded: corosync profile loading service
>> Nov 19 09:49:43 dev1n2 corosync[1130]: [WD ] magically closing the watchdog.
>> Nov 19 09:49:43 dev1n2 corosync[1130]: [SERV ] Service engine unloaded: corosync watchdog service
>> Nov 19 09:49:43 dev1n2 corosync[1130]: [MAIN ] Corosync Cluster Engine exiting normally
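Note the ordering above: dlm_controld reports the cpg_dispatch error in the same second corosync starts unloading its service engines. You can also check on the surviving node how the peer's exit was perceived (again only a sketch):

    # on the surviving node, right after the peer goes down:
    dlm_tool dump | grep -Ei 'left|nodedown|fence'   # clean leave or nodedown?
    corosync-quorumtool -s                           # membership as corosync sees it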
>> Logs from the surviving node:
>>
>> Kernel:
>> Nov 19 09:49:39 dev1n1 kernel: : [80664.615988] block drbd0: conn( Unconnected -> WFConnection )
>> Nov 19 09:49:42 dev1n1 kernel: : [80667.497187] dlm: clvmd: dlm_recover 11
>> Nov 19 09:49:42 dev1n1 kernel: : [80667.497241] dlm: clvmd: dlm_clear_toss 1 done
>> Nov 19 09:49:42 dev1n1 kernel: : [80667.497291] dlm: clvmd: remove member 2
>> Nov 19 09:49:42 dev1n1 kernel: : [80667.497295] dlm: clvmd: dlm_recover_members 1 nodes
>> Nov 19 09:49:42 dev1n1 kernel: : [80667.497298] dlm: clvmd: generation 15 slots 1 1:1
>> Nov 19 09:49:42 dev1n1 kernel: : [80667.497300] dlm: clvmd: dlm_recover_directory
>> Nov 19 09:49:42 dev1n1 kernel: : [80667.497302] dlm: clvmd: dlm_recover_directory 0 in 0 new
>> Nov 19 09:49:42 dev1n1 kernel: : [80667.497304] dlm: clvmd: dlm_recover_directory 0 out 0 messages
>> Nov 19 09:49:42 dev1n1 kernel: : [80667.497307] dlm: clvmd: dlm_recover_masters
>> Nov 19 09:49:42 dev1n1 kernel: : [80667.497309] dlm: clvmd: dlm_recover_masters 0 of 1
>> Nov 19 09:49:42 dev1n1 kernel: : [80667.497311] dlm: clvmd: dlm_recover_locks 0 out
>> Nov 19 09:49:42 dev1n1 kernel: : [80667.497313] dlm: clvmd: dlm_recover_locks 0 in
>> Nov 19 09:49:42 dev1n1 kernel: : [80667.497314] dlm: clvmd: dlm_recover_rsbs 1 done
>> Nov 19 09:49:42 dev1n1 kernel: : [80667.497366] dlm: clvmd: dlm_recover 11 generation 15 done: 0 ms
>> Nov 19 09:49:43 dev1n1 kernel: : [80668.211818] dlm: closing connection to node 2
>> Nov 19 09:49:46 dev1n1 kernel: : [80670.779015] igb: p1p2 NIC Link is Down
>>
>> User space:
>> Nov 19 09:49:40 dev1n1 pengine[1078]: notice: stage6: Scheduling Node dev1n2 for shutdown
>> Nov 19 09:49:40 dev1n1 pengine[1078]: error: rsc_expand_action: Couldn't expand vpbx_vg_cl_demote_0
>> Nov 19 09:49:40 dev1n1 pengine[1078]: error: rsc_expand_action: Couldn't expand vpbx_vg_cl_demote_0
>> Nov 19 09:49:40 dev1n1 pengine[1078]: notice: LogActions: Stop stonith-dev1n1 (dev1n2)
>> Nov 19 09:49:40 dev1n1 pengine[1078]: notice: process_pe_message: Calculated Transition 17: /var/lib/pacemaker/pengine/pe-input-1035.bz2
>> Nov 19 09:49:40 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1d1
>> Nov 19 09:49:40 dev1n1 crmd[1080]: notice: run_graph: Transition 17 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1035.bz2): Complete
>> Nov 19 09:49:40 dev1n1 crmd[1080]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>> Nov 19 09:49:40 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1d4
>> Nov 19 09:49:40 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1d6 1d8
>> Nov 19 09:49:40 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1d6
>> Nov 19 09:49:40 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1d6
>> Nov 19 09:49:40 dev1n1 crmd[1080]: notice: peer_update_callback: do_shutdown of dev1n2 (op 63) is complete
>> Nov 19 09:49:40 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1df
>> Nov 19 09:49:40 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1df
>> Nov 19 09:49:40 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1e3
>> Nov 19 09:49:40 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1e3
>> Nov 19 09:49:42 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1e6
>> Nov 19 09:49:42 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1e9
>> Nov 19 09:49:43 dev1n1 corosync[1004]: [QUORUM] Members[1]: 1
>> Nov 19 09:49:43 dev1n1 crmd[1080]: notice: corosync_mark_unseen_peer_dead: Node 2/dev1n2 was not seen in the previous transition
>> Nov 19 09:49:43 dev1n1 crmd[1080]: notice: crm_update_peer_state: corosync_mark_unseen_peer_dead: Node dev1n2[2] - state is now lost
>> Nov 19 09:49:43 dev1n1 corosync[1004]: [TOTEM ] A processor joined or left the membership and a new membership (10.28.45.27:30736) was formed.
>> Nov 19 09:49:43 dev1n1 corosync[1004]: [MAIN ] Completed service synchronization, ready to provide service.
>> Nov 19 09:49:43 dev1n1 dlm_controld[1014]: 80664 fence request 2 pid 27225 nodedown time 1353314983 fence_all dlm_stonith
>> Nov 19 09:49:43 dev1n1 notifyd[1010]: [notice] 192.168.1.2[2] ip:192.168.1.2 left
>> Nov 19 09:49:43 dev1n1 stonith-ng[1075]: notice: stonith_command: Client stonith-api.27225.b5ff8f4d wants to fence (reboot) '2' with device '(any)'
>> Nov 19 09:49:43 dev1n1 stonith-ng[1075]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for dev1n2: 71447261-0e53-4b20-b628-d3f026a4ae24 (0)
>> Nov 19 09:49:44 dev1n1 external/ipmi[27242]: [27254]: debug: ipmitool output: Chassis Power Control: Reset
>> Nov 19 09:49:45 dev1n1 stonith-ng[1075]: notice: log_operation: Operation 'reboot' [27234] (call 0 from stonith-api.27225) for host 'dev1n2' with device 'stonith-dev1n2' returned: 0 (OK)
>> Nov 19 09:49:45 dev1n1 stonith-ng[1075]: notice: remote_op_done: Operation reboot of dev1n2 by dev1n1 for stonith-api.27225@dev1n1.71447261: OK
>> Nov 19 09:49:45 dev1n1 crmd[1080]: notice: tengine_stonith_notify: Peer dev1n2 was terminated (st_notify_fence) by dev1n1 for dev1n1: OK (ref=71447261-0e53-4b20-b628-d3f026a4ae24) by client stonith-api.27225
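For what it's worth, the fencing path itself looks healthy here: dlm_controld sees a nodedown and issues a fence request, dlm_stonith hands it to Pacemaker's stonith-ng, and the node is rebooted via IPMI. If you ever want to exercise that same path by hand (a sketch; stonith_admin is part of Pacemaker):

    # ask stonith-ng to reboot a node, much as dlm_stonith does:
    stonith_admin --reboot dev1n2

So the question is only why the clean leave was treated as a failure in the first place.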
>> Greets,
>> Jacek

--
this is my life and I live it for as long as God wills
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster