Re: [Linux-HA] SAPInstance does not start and asking for START_PROFILE
Hi Muhammad,

please ask the SAP guys what they changed. Did they install sapstartsrv or saphostagent? Threads should not end with "it's working now, but I could not explain what we have changed" :) That stops others from learning from this situation.

Regards
Fabian

On 01/15/2015 04:12 PM, Muhammad Sharfuddin wrote:

Thanks for your excellent help, appreciated. I don't know what happened exactly; it seems the SAP guys have fixed the issue, as the cluster now starts the SAPInstance without any problem. Please also find the sapcontrol output below:

    thltlp2:tlpadm 48 /usr/sap/TLP/DVEBMGS00/exe/sapcontrol -nr 00 -function GetProcessList

    15.01.2015 19:40:04
    GetProcessList
    OK
    name, description, dispstatus, textstatus, starttime, elapsedtime, pid
    disp+work, Dispatcher, GREEN, Running, 2015 01 15 19:27:24, 0:12:40, 10920
    igswd_mt, IGS Watchdog, GREEN, Running, 2015 01 15 19:27:24, 0:12:40, 10921
    gwrd, Gateway, GREEN, Running, 2015 01 15 19:27:25, 0:12:39, 10938
    icman, ICM, GREEN, Running, 2015 01 15 19:27:25, 0:12:39, 10939
    thltlp2:tlpadm 49

Thanks once again.

Regards,
Muhammad Sharfuddin
Cell: +92-3332144823 | UAN: +92(21) 111-111-142 ext: 113 | NDS.COM.PK http://www.nds.com.pk

On 01/15/2015 03:57 PM, Fabian Herschel wrote:

Hi Muhammad,

please retry the command as user sidadm. Or inspect the resource agent for ALL environment variables that need to be set, not only LD_LIBRARY_PATH. If sapcontrol is dysfunctional even when run as sidadm, you have an SAP problem, and that cannot be discussed here.

Regards
Fabian

On 01/15/2015 11:49 AM, Muhammad Sharfuddin wrote:

    thltlp1:~ # echo $LD_LIBRARY_PATH
    /usr/sap/TLP/ASCS01/exe/:/usr/sap/TLP/DVEBMGS00/exe:/usr/lib64
    thltlp1:~ # /usr/sap/TLP/DVEBMGS00/exe/sapcontrol -nr 00 -function Start
    Could not open the ICU common library.
    The following files must be in the path described by the environment variable LD_LIBRARY_PATH:
    libicuuc.so.50, libicudata.so.50, libicui18n.so.50 [/bas/741_REL/src/flat/nlsui0.c 1535] pid = 27543
    LD_LIBRARY_PATH is currently set to not set [/bas/741_REL/src/flat/nlsui0.c 1538] pid = 27543
    thltlp1:~ #

please help

Regards,
Muhammad Sharfuddin

On 01/15/2015 02:15 PM, Fabian Herschel wrote:
On 01/14/2015 10:53 PM, Muhammad Sharfuddin wrote:
On 01/15/2015 02:35 AM, Fabian Herschel wrote:

Hi Muhammad, sorry, please do NOT use startsap. Please use sapctrl.

    sapctrl -nr 00 -function Start

Check the started processes using

    sapctrl -nr 00 -function GetProcessList

I don't find the sapctrl command available on the system.

Sorry, the command is sapcontrol (I abbreviated "control" to "ctrl"). From the SAPInstance resource agent:

    SAPCONTROL=/usr/sap/$SID/$InstanceName/exe/sapcontrol

If the disp+work processes are not starting, then you might need to check the reason in the work directory of the SAP NetWeaver instance.

Thanks for the pointer, I'll get this checked with the SAP guys.

Regards
Fabian

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
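For anyone retracing this exchange, the sapcontrol path and the sidadm user can be derived from the InstanceName value. The following is only a sketch: it assumes the SID_InstanceName_VirtualHost layout seen in this thread and that the sidadm user is the lowercase SID plus "adm" (as in the tlpadm prompt above). The function and variable names are made up for illustration and are not taken from the resource agent.

```shell
#!/bin/sh
# Hypothetical helper (not the SAPInstance agent's code): split an
# InstanceName such as TLP_DVEBMGS00_thltlp into its parts.
# Assumed layout: <SID>_<InstanceName>_<virtual hostname>.
parse_instance_name() {
    full=$1
    SID=$(echo "$full" | cut -d_ -f1)
    INSTANCE=$(echo "$full" | cut -d_ -f2)
    VIRTHOST=$(echo "$full" | cut -d_ -f3)
    # Instance number = last two characters of the instance name.
    INSTANCE_NR=$(echo "$INSTANCE" | sed 's/.*\(..\)$/\1/')
    # Assumption: the OS user is the lowercase SID followed by "adm".
    SIDADM="$(echo "$SID" | tr '[:upper:]' '[:lower:]')adm"
    SAPCONTROL="/usr/sap/$SID/$INSTANCE/exe/sapcontrol"
}

parse_instance_name TLP_DVEBMGS00_thltlp
echo "run as $SIDADM: $SAPCONTROL -nr $INSTANCE_NR -function GetProcessList"
```

Running the printed command through a login shell of that user (for example with su - and -c) should pick up the complete sidadm environment, which is the point of Fabian's advice.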
Re: [Linux-HA] Antw: SAPInstance does not start and asking for START_PROFILE
No, only in very old versions were there two different profiles: one START profile and one instance profile. Today they are combined in the instance profile. The parameter name, however, could not be changed without hurting all existing installations. So with a current NetWeaver it is OK if the START_PROFILE parameter points to an instance profile name. However, if this is a very old SAP NetWeaver, then your idea is the right pointer. This is also why I asked for the SAP kernel version.

Regards
Fabian

Sent from a Samsung tablet

Original message
From: Ulrich Windl ulrich.wi...@rz.uni-regensburg.de
Date: 15.01.2015 08:51 (GMT+01:00)
To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
Subject: [Linux-HA] Antw: SAPInstance does not start and asking for START_PROFILE

Don't SAP start profiles start with START_*?

Muhammad Sharfuddin m.sharfud...@nds.com.pk wrote on 14.01.2015 at 20:12 in message 54b6bfb2.50...@nds.com.pk:

OS: SLES 11 SP 3
pacemaker-1.1.9-0.19.102
corosync-1.4.5-0.18.15
resource-agents-3.9.5-0.32.22

Starting the SAP instance resource fails with the following errors:

    Jan 14 18:22:16 thltlp1 SAPInstance(SAPInst-DVEBMGS00)[50450]: ERROR: Expected TLP_DVEBMGS00_thltlp to be the instance START profile, please set START_PROFILE parameter!
    Jan 14 18:22:16 thltlp1 crmd[47231]: notice: process_lrm_event: LRM operation SAPInst-DVEBMGS00_start_0 (call=81, rc=6, cib-update=66, confirmed=true) not configured

The resource configuration is as follows:

    primitive SAPInst-DVEBMGS00 ocf:heartbeat:SAPInstance \
        op monitor interval=120 timeout=60 \
        op start interval=0 timeout=300 \
        op stop interval=0 timeout=300 \
        params InstanceName=TLP_DVEBMGS00_thltlp DIR_EXECUTABLE=/usr/sap/TLP/DVEBMGS00/exe START_PROFILE=TLP_DVEBMGS00_thltlp DIR_PROFILE=/sapmnt/TLP/profile

i.e. START_PROFILE is configured, but the cluster is not starting the SAP instance.

Please help

--
Regards,
Muhammad Sharfuddin
Re: [Linux-HA] SAPInstance does not start and asking for START_PROFILE
On 01/14/2015 10:53 PM, Muhammad Sharfuddin wrote:
On 01/15/2015 02:35 AM, Fabian Herschel wrote:

Hi Muhammad, sorry, please do NOT use startsap. Please use sapctrl.

    sapctrl -nr 00 -function Start

Check the started processes using

    sapctrl -nr 00 -function GetProcessList

I don't find the sapctrl command available on the system.

Sorry, the command is sapcontrol (I abbreviated "control" to "ctrl"). From the SAPInstance resource agent:

    SAPCONTROL=/usr/sap/$SID/$InstanceName/exe/sapcontrol

If the disp+work processes are not starting, then you might need to check the reason in the work directory of the SAP NetWeaver instance.

Thanks for the pointer, I'll get this checked with the SAP guys.

Regards
Fabian
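To make the GetProcessList check scriptable, something like the following could be used. It is a rough sketch that assumes the comma-separated output layout posted elsewhere in this thread; the function name is invented, and the sample input is made up (one status changed to GRAY purely for illustration).

```shell
#!/bin/sh
# Sketch: read GetProcessList output on stdin, print the name of every
# process whose dispstatus is not GREEN, and return non-zero if any exists.
not_green() {
    awk -F', ' '
        NF >= 7 && $1 != "name" && $3 != "GREEN" { print $1; bad = 1 }
        END { exit bad + 0 }
    '
}

# Made-up sample in the format seen in this thread.
if not_green <<'EOF'
15.01.2015 19:40:04
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
disp+work, Dispatcher, GREEN, Running, 2015 01 15 19:27:24, 0:12:40, 10920
igswd_mt, IGS Watchdog, GRAY, Stopped, 2015 01 15 19:27:24, 0:12:40, 10921
EOF
then
    echo "all processes GREEN"
else
    echo "some process is not GREEN"
fi
```

On a real system the input would come from piping the sapcontrol call into not_green instead of the here-document.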
Re: [Linux-HA] Antw: Re: Antw: SAPInstance does not start and asking for START_PROFILE
On 01/15/2015 11:35 AM, Ulrich Windl wrote:
Fabian Herschel fabian.hersc...@arcor.de wrote on 15.01.2015 at 09:05

Hi! I've been working with SAP R/3 for over 20 years.

Me too :) I started with SAP R/3 1.1b and have also seen 1.0, which was the SAP R/3 "try and buy" (which I changed into a non-marketing "try-and-bye" :)

The tendency is that the mess steadily increases, making things more complex without actually improving reliability (MHO). Where previously you used a script to start processes, you now have some script that acts like a web client to send some request to a Java-based web server that in turn is expected to start the required processes.

My guess is that this is W*d*ws style: having GUIs that use something like REST APIs to make a system change instead of just starting a well-working start script (like sapstart was in the past).

I've seen cases where "started successfully" did not mean anything, i.e. nothing was started. OK, that was off-topic, but I had to say it.

Yes, that's unfortunately true, and that's also why we recommend monitoring the SAP instances, so that real start errors are detected during the next monitor cycle.

So regarding compatibility, you must make sure that the configuration files and related components match the rest of your SAP infrastructure. My favorite is this: if you run a Java stack that does nothing (is idle), it takes 6 minutes to shut down the SAP instance (while there is no I/O and no CPU activity). Only the best programmers (TM) can write such code. I had to increase timeouts several times to prevent machine fencing while waiting for the stop command to complete... off-topic again, sorry.

Regards,
Ulrich

However, if this is a very old SAP NetWeaver, then your idea is the right pointer. This is also why I asked for the SAP kernel version.
Re: [Linux-HA] SAPInstance does not start and asking for START_PROFILE
Hi Muhammad,

could you try:

    primitive SAPInst-DVEBMGS00 ocf:heartbeat:SAPInstance \
        op monitor interval=120 timeout=60 \
        op start interval=0 timeout=300 \
        op stop interval=0 timeout=300 \
        params InstanceName=TLP_DVEBMGS00_thltlp DIR_EXECUTABLE=/usr/sap/TLP/DVEBMGS00/exe START_PROFILE=/sapmnt/TLP/profile/TLP_DVEBMGS00_thltlp

Or even leave the parameters DIR_EXECUTABLE and START_PROFILE unset, so that SAPInstance can use its automatic detection. If you set the parameter START_PROFILE, it must be a full file path, NOT one relative to DIR_PROFILE.

Hope that helps

Best regards
Fabian

On 01/14/2015 08:12 PM, Muhammad Sharfuddin wrote:

OS: SLES 11 SP 3
pacemaker-1.1.9-0.19.102
corosync-1.4.5-0.18.15
resource-agents-3.9.5-0.32.22

Starting the SAP instance resource fails with the following errors:

    Jan 14 18:22:16 thltlp1 SAPInstance(SAPInst-DVEBMGS00)[50450]: ERROR: Expected TLP_DVEBMGS00_thltlp to be the instance START profile, please set START_PROFILE parameter!
    Jan 14 18:22:16 thltlp1 crmd[47231]: notice: process_lrm_event: LRM operation SAPInst-DVEBMGS00_start_0 (call=81, rc=6, cib-update=66, confirmed=true) not configured

The resource configuration is as follows:

    primitive SAPInst-DVEBMGS00 ocf:heartbeat:SAPInstance \
        op monitor interval=120 timeout=60 \
        op start interval=0 timeout=300 \
        op stop interval=0 timeout=300 \
        params InstanceName=TLP_DVEBMGS00_thltlp DIR_EXECUTABLE=/usr/sap/TLP/DVEBMGS00/exe START_PROFILE=TLP_DVEBMGS00_thltlp DIR_PROFILE=/sapmnt/TLP/profile

i.e. START_PROFILE is configured, but the cluster is not starting the SAP instance.

Please help
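The full-path rule can be checked before touching the cluster configuration. The snippet below is a hypothetical pre-check and not part of the resource agent; it only tests whether a value starts with a slash, using the two values from this thread.

```shell
#!/bin/sh
# Hypothetical pre-check: accept a START_PROFILE value only if it is a
# full (absolute) path.
is_full_path() {
    case $1 in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

for profile in TLP_DVEBMGS00_thltlp /sapmnt/TLP/profile/TLP_DVEBMGS00_thltlp; do
    if is_full_path "$profile"; then
        echo "ok:       START_PROFILE=$profile"
    else
        echo "relative: START_PROFILE=$profile (needs the profile directory in front)"
    fi
done
```

It does not verify that the profile file exists, which would be a sensible additional test on the node itself.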
Re: [Linux-HA] SAPInstance does not start and asking for START_PROFILE
Hi Muhammad, sorry, please do NOT use startsap. Please use sapctrl.

    sapctrl -nr 00 -function Start

Check the started processes using

    sapctrl -nr 00 -function GetProcessList

If the disp+work processes are not starting, then you might need to check the reason in the work directory of the SAP NetWeaver instance.

Regards
Fabian

Sent from a Samsung tablet

Original message
From: Muhammad Sharfuddin m.sharfud...@nds.com.pk
Date: 14.01.2015 21:13 (GMT+01:00)
To: linux-ha@lists.linux-ha.org
Subject: Re: [Linux-HA] SAPInstance does not start and asking for START_PROFILE

On 01/15/2015 01:07 AM, Muhammad Sharfuddin wrote:
On 01/15/2015 12:46 AM, Fabian Herschel wrote:

Hi Muhammad,

could you try:

    primitive SAPInst-DVEBMGS00 ocf:heartbeat:SAPInstance \
        op monitor interval=120 timeout=60 \
        op start interval=0 timeout=300 \
        op stop interval=0 timeout=300 \
        params InstanceName=TLP_DVEBMGS00_thltlp DIR_EXECUTABLE=/usr/sap/TLP/DVEBMGS00/exe START_PROFILE=/sapmnt/TLP/profile/TLP_DVEBMGS00_thltlp

Or even leave the parameters DIR_EXECUTABLE and START_PROFILE unset, so that SAPInstance can use its automatic detection. If you set the parameter START_PROFILE, it must be a full file path, NOT one relative to DIR_PROFILE.

Hope that helps

Best regards
Fabian

On 01/14/2015 08:12 PM, Muhammad Sharfuddin wrote:

OS: SLES 11 SP 3
pacemaker-1.1.9-0.19.102
corosync-1.4.5-0.18.15
resource-agents-3.9.5-0.32.22

Starting the SAP instance resource fails with the following errors:

    Jan 14 18:22:16 thltlp1 SAPInstance(SAPInst-DVEBMGS00)[50450]: ERROR: Expected TLP_DVEBMGS00_thltlp to be the instance START profile, please set START_PROFILE parameter!
    Jan 14 18:22:16 thltlp1 crmd[47231]: notice: process_lrm_event: LRM operation SAPInst-DVEBMGS00_start_0 (call=81, rc=6, cib-update=66, confirmed=true) not configured

The resource configuration is as follows:

    primitive SAPInst-DVEBMGS00 ocf:heartbeat:SAPInstance \
        op monitor interval=120 timeout=60 \
        op start interval=0 timeout=300 \
        op stop interval=0 timeout=300 \
        params InstanceName=TLP_DVEBMGS00_thltlp DIR_EXECUTABLE=/usr/sap/TLP/DVEBMGS00/exe START_PROFILE=TLP_DVEBMGS00_thltlp DIR_PROFILE=/sapmnt/TLP/profile

i.e. START_PROFILE is configured, but the cluster is not starting the SAP instance.

Please help

I provided the full path, and now the error has changed. It became:

    Jan 15 00:54:04 thltlp2 cibadmin[24511]: notice: crm_log_args: Invoked: cibadmin -p -R -o resources
    Jan 15 00:54:04 thltlp2 SAPInstance(SAPInst-DVEBMGS00)[22589]: ERROR: SAP Instance TLP-DVEBMGS00 start failed: 15.01.2015 00:54:04 WaitforStarted FAIL: process disp+work Dispatcher not running
    Jan 15 00:54:04 thltlp2 crmd[2778]: warning: do_update_resource: Resource SAPInst-DVEBMGS00 no longer exists in the lrmd
    Jan 15 00:54:04 thltlp2 crmd[2778]: notice: process_lrm_event: LRM operation SAPInst-DVEBMGS00_start_0 (call=176, rc=7, cib-update=0, confirmed=true) not running
    Jan 15 00:54:04 thltlp2 crmd[2778]: warning: decode_transition_key: Bad UUID (crm_resource.c) in sscanf result (4) for 24450:0:0:crm_resource.c
    Jan 15 00:54:04 thltlp2 crmd[2778]: error: send_msg_via_ipc: Unknown Sub-system (9f52a2cf-6c1d-453d-bc1b-90322f3147f4)... discarding message

Regards,
Muhammad Sharfuddin

Also note that I can very easily start SAP without any issue by running the following command:

    startsap -i DVEBMGS00 -v thltlp

--
Regards,
Muhammad Sharfuddin
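The two return codes in the logs above tell different stories: rc=6 is the OCF code for a configuration error (the START_PROFILE complaint), while rc=7 means the agent accepted the configuration but the instance was not running afterwards. A small lookup like the following can help when reading such logs; the function names are made up, and the log format is assumed to be the one posted in this thread.

```shell
#!/bin/sh
# Map an OCF resource agent return code to its symbolic name.
ocf_rc_name() {
    case $1 in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        *) echo unknown ;;
    esac
}

# Extract the number after "rc=" from a crmd "LRM operation" log line.
rc_from_log() {
    echo "$1" | sed -n 's/.*rc=\([0-9][0-9]*\).*/\1/p'
}

line='LRM operation SAPInst-DVEBMGS00_start_0 (call=176, rc=7, cib-update=0, confirmed=true) not running'
ocf_rc_name "$(rc_from_log "$line")"
```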
Re: [Linux-HA] SAPInstance does not start and asking for START_PROFILE
Please ask the SAP guys which version of SAP NetWeaver and which SAP kernel you are using.

Regards
Fabian

Sent from a Samsung tablet

Original message
From: Muhammad Sharfuddin m.sharfud...@nds.com.pk
Date: 14.01.2015 22:53 (GMT+01:00)
To: linux-ha@lists.linux-ha.org
Subject: Re: [Linux-HA] SAPInstance does not start and asking for START_PROFILE

On 01/15/2015 02:35 AM, Fabian Herschel wrote:

Hi Muhammad, sorry, please do NOT use startsap. Please use sapctrl.

    sapctrl -nr 00 -function Start

Check the started processes using

    sapctrl -nr 00 -function GetProcessList

I don't find the sapctrl command available on the system.

If the disp+work processes are not starting, then you might need to check the reason in the work directory of the SAP NetWeaver instance.

Thanks for the pointer, I'll get this checked with the SAP guys.

Regards
Fabian

--
Regards,
Muhammad Sharfuddin
Re: [Linux-HA] Virtual address for slave
Create 2 constraints:

1. A colocation between the IP address (the one for the master) and the master status of your master/slave resource. You need to add the status master (instead of start, which is the default) to the constraint.
2. A colocation between the IP address (the one for the slave) and the slave status of your master/slave resource. Add the status slave to this constraint definition as well.

You might also need to adjust the score of your constraints, depending on your exact needs.

Regards
Fabian

Sent from a Samsung tablet

Original message
From: jarek ja...@poczta.srv.pl
Date: 01.08.2014 09:39 (GMT+01:00)
To: linux-ha@lists.linux-ha.org
Subject: [Linux-HA] Virtual address for slave

Hello!

I'd like to have two virtual addresses: vip-master and vip-slave. vip-master should be bound to the master node, vip-slave should be bound to the slave node. How can I do it?

Best regards
Jarek
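In crm shell syntax the two constraints could look roughly like the lines printed by the sketch below. The constraint IDs and the master/slave resource name ms_pgsql are placeholders (the question does not name that resource); vip-master and vip-slave come from the question, and the score inf is only a starting point.

```shell
#!/bin/sh
# Sketch: print two colocation constraints in crm shell syntax.
# $1 = name of the master/slave resource (placeholder in the call below).
vip_colocations() {
    ms=$1
    printf 'colocation vip-master-with-master inf: vip-master %s:Master\n' "$ms"
    printf 'colocation vip-slave-with-slave inf: vip-slave %s:Slave\n' "$ms"
}

vip_colocations ms_pgsql
```

The printed lines can be saved to a file and loaded with crm configure load update, or typed in crm configure. With a score of inf the slave address stops when no slave instance is running; a lower score relaxes that, which is the score adjustment mentioned above.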
Re: [Linux-HA] How to restart cluster ?
Hi,

do you always reboot both nodes at the same time, or do you reboot only one node? Stopping only the resources during a reboot is pretty bad. I would add the cluster start script, like /etc/init.d/openais, to your start/stop sequence. This would also properly tell the remaining node about the leaving and joining node.

Regards
Fabian

Sent from a Samsung tablet

Original message
From: jarek ja...@poczta.srv.pl
Date: 09.06.2014 11:58 (GMT+01:00)
To: linux-ha@lists.linux-ha.org
Subject: [Linux-HA] How to restart cluster ?

Hello!

Thank you for the answer, but it didn't solve my problem. I have a simple two-node cluster with a virtual IP address and Postgres with streaming replication, created with this tutorial: http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster

I have two problems to solve:

1. I need a script which will restart the cluster on user demand. This script should stop the postgres resource on both nodes and then restart them in such a way that postgres will work without any additional operations (like removing lock files, cleaning up resources etc.).
2. I have a virtual model of this cluster running under VMWare. VMWare is restarted from time to time, and I have no control over when the master or the slave will be restarted. I would like to create a script which will be called from runlevel 6 and will safely stop the postgres resource. I tried to do it with:

    crm configure property stop-all-resources=true

but after the reboot I had to remove PGSQL.lock manually, and the master node had also changed.

Do you have any idea how to do it?

Taktoshi MATSUO wrote:

Do you use the pgsql RA with a Master/Slave setting? I recommend that you stop the slave node's pacemaker first, because the pgsql RA removes PGSQL.lock automatically if the node is the master and there are no slaves.

Stop procedure
1. stop the slave node - suppose nodeB
2. stop the master node (the PGSQL.lock file is removed) - suppose nodeA

Start procedure
3. start nodeA, because it has the newest data.
4. start nodeB

If PGSQL.lock exists, the data may be inconsistent. See http://www.slideshare.net/takmatsuo/2012929-pg-study-16012253 (P36, P37)

best regards
Jarek
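The stop and start order described above can be written down as a dry-run plan. The sketch below only prints the steps and executes nothing. It assumes nodeA is the current master and nodeB the slave, that the cluster stack is controlled through /etc/init.d/openais as suggested in this thread, and that the nodes are reachable via ssh (an assumption added here).

```shell
#!/bin/sh
# Dry-run sketch: print the restart steps in a safe order.
# $1 = current master node, $2 = slave node.
restart_plan() {
    master=$1
    slave=$2
    echo "ssh $slave /etc/init.d/openais stop"    # 1. slave first
    echo "ssh $master /etc/init.d/openais stop"   # 2. master last, PGSQL.lock gets removed
    echo "ssh $master /etc/init.d/openais start"  # 3. node with the newest data first
    echo "ssh $slave /etc/init.d/openais start"   # 4. then the former slave
}

restart_plan nodeA nodeB
```

Because the master role can move, a real script would first have to find out which node currently holds it instead of hard-coding the names.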
Re: [Linux-HA] How I can create unordered group of resources
On 05/06/2014 01:08 AM, Andrew Beekhof wrote:
On 5 May 2014, at 10:06 pm, Fabian Herschel fabian.hersc...@arcor.de wrote:
On 05/05/2014 02:36 AM, Andrew Beekhof wrote:
On 4 May 2014, at 4:22 pm, Fabian Herschel fabian.hersc...@arcor.de wrote:

I would create the group with the meta attribute for unordered resources: meta ordered=false

No. Use a colocation set.

Could you explain your No? What's wrong with using the unordered feature? Why was this meta attribute added to groups if we shouldn't use it?

It's an abomination that I should never have implemented but now cannot remove.

OK, thanks :) Until this thread I never suggested configuring groups as unordered. I do not know why I broke my rule. I also felt uncomfortable with the meta attributes...

Sent from a Samsung tablet

Original message
From: Vladimir Romanov vroma...@gmail.com
Date: 03.05.2014 10:29 (GMT+01:00)
To: linux-ha@lists.linux-ha.org
Subject: [Linux-HA] How I can create unordered group of resources

Hello!

I am trying to create a Master/Slave cluster using Pacemaker on CentOS 6.5 (CRM+PCS). I created a master/slave stateful resource. My setup also has many other resources (IPs, routes, LSB...). If one of the resources fails on the first node, I want to move all resources to another node. Currently I use a group to build this setup. But when I kill -9 some process, all resources listed below it are also restarted. What is the best practice for this task?

--
Vladimir Romanov
Re: [Linux-HA] How I can create unordered group of resources
On 05/05/2014 02:36 AM, Andrew Beekhof wrote:
On 4 May 2014, at 4:22 pm, Fabian Herschel fabian.hersc...@arcor.de wrote:

I would create the group with the meta attribute for unordered resources: meta ordered=false

No. Use a colocation set.

Could you explain your No? What's wrong with using the unordered feature? Why was this meta attribute added to groups if we shouldn't use it?

Sent from a Samsung tablet

Original message
From: Vladimir Romanov vroma...@gmail.com
Date: 03.05.2014 10:29 (GMT+01:00)
To: linux-ha@lists.linux-ha.org
Subject: [Linux-HA] How I can create unordered group of resources

Hello!

I am trying to create a Master/Slave cluster using Pacemaker on CentOS 6.5 (CRM+PCS). I created a master/slave stateful resource. My setup also has many other resources (IPs, routes, LSB...). If one of the resources fails on the first node, I want to move all resources to another node. Currently I use a group to build this setup. But when I kill -9 some process, all resources listed below it are also restarted. What is the best practice for this task?

--
Vladimir Romanov
Re: [Linux-HA] How I can create unordered group of resources
I would create the group with the meta attribute for unordered resources: meta ordered=false

Sent from a Samsung tablet

Original message
From: Vladimir Romanov vroma...@gmail.com
Date: 03.05.2014 10:29 (GMT+01:00)
To: linux-ha@lists.linux-ha.org
Subject: [Linux-HA] How I can create unordered group of resources

Hello!

I am trying to create a Master/Slave cluster using Pacemaker on CentOS 6.5 (CRM+PCS). I created a master/slave stateful resource. My setup also has many other resources (IPs, routes, LSB...). If one of the resources fails on the first node, I want to move all resources to another node. Currently I use a group to build this setup. But when I kill -9 some process, all resources listed below it are also restarted. What is the best practice for this task?

--
Vladimir Romanov
Re: [Linux-HA] 2 Nodes split brain, distant sites
Hi,

my first idea would be to fix bindnetaddr. It should be the network address, not the machine's own address.

Regards
Fabian

On 02/27/2014 03:42 PM, TRIBOLET Thomas wrote:

Hello,

Before starting: my first language is French, so I'll do my best to explain my problem in English.

1) The situation:

I have 2 servers at 2 distant sites. I need to run openvpn with the same configuration on both servers, but it must run on only one server at a time. I want it to start on the second server when the internet connection is lost on the first node. I use Debian with corosync and pacemaker. Here is the config:

A) Corosync.conf:

    compatibility: whitetank
    totem {
        version: 2
        token: 3000
        token_retransmits_before_loss_const: 10
        join: 240
        consensus: 3600
        vsftype: none
        max_messages: 20
        clear_node_high_bit: yes
        secauth: off
        threads: 0
        nodeid:
        rrp_mode: none
        interface {
            member {
                memberaddr: 172.16.135.9
            }
            member {
                memberaddr: 172.16.64.248
            }
            ringnumber: 0
            bindnetaddr: 172.16.135.9
            mcastport: 5405
        }
        transport: udpu
    }
    amf {
        mode: disabled
    }
    service {
        ver: 0
        name: pacemaker
    }
    aisexec {
        user: root
        group: root
    }
    logging {
        fileline: off
        to_stderr: yes
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        to_syslog: yes
        syslog_facility: daemon
        debug: off
        timestamp: on
        logger_subsys {
            subsys: AMF
            debug: off
            tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
    }

B) Pacemaker:

    node controle-col
    node vpn-air
    primitive ClusterMon ocf:pacemaker:ClusterMon \
        params user=root update=30 extra_options=-E /root/PacemakerMailScript.sh -h /tmp/ClusterMon.html \
        op monitor on-fail=restart interval=60
    primitive openvpn lsb:openvpn \
        op monitor interval=30s
    primitive p_ping ocf:pacemaker:ping \
        params host_list=8.8.8.8 4.2.2.2 multiplier=100 dampen=5s \
        op monitor interval=60 timeout=60 \
        op start interval=0 timeout=60 \
        op stop interval=0 timeout=60
    clone ClusterMon-clone ClusterMon
    clone c_ping p_ping
    location OpenVpnCluster openvpn \
        rule $id=OpenVpnCluster-rule -inf: not_defined pingd or pingd lte 0
    location PrefVpnAir openvpn \
        rule $id=PrefVpnAir-rule 50: #uname eq vpn-air
    property $id=cib-bootstrap-options \
        dc-version=1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff \
        cluster-infrastructure=openais \
        expected-quorum-votes=2 \
        stonith-enabled=false \
        no-quorum-policy=ignore

C) crm_mon when everything is running fine:

    Last updated: Thu Feb 27 14:54:31 2014
    Last change: Wed Jan 15 12:51:35 2014 via crmd on controle-col
    Stack: openais
    Current DC: controle-col - partition with quorum
    Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
    2 Nodes configured, 2 expected votes
    5 Resources configured.

    Online: [ vpn-air controle-col ]

    Clone Set: c_ping [p_ping]
        Started: [ controle-col vpn-air ]
    openvpn (lsb:openvpn): Started vpn-air
    Clone Set: ClusterMon-clone [ClusterMon]
        Started: [ controle-col vpn-air ]

2) My problem:

When there is a network problem, for example:

a) The first node's site loses its internet connection (and, at the same time, communication with the second node, because the VPN runs over the internet connection).
b) The cluster stops openvpn on the first node and launches it on the second, due to the p_ping primitive in the config.
c) The connection comes back at the first node's site.
d) Problem: the first and second node do not re-form the cluster. They don't see each other and each node creates a cluster of its own. Split brain, I think.
e) Each node has openvpn running, which shouldn't happen.

I don't have stonith running because I think that without quorum it will be problematic.

Is there a way to tell corosync to recreate the ring? Or does someone have another solution?

Thanks

Tribolet Thomas
ISSeP (Institut Scientifique de Service Public)
th.tribo...@issep.be
+32 (0) 4229 83 46
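To illustrate the difference between a host address and the network address meant for bindnetaddr, here is a small sketch that derives one from the other. The /24 prefix in the call is purely an assumption, since the real netmask is not given in the thread, and the function name is made up.

```shell
#!/bin/sh
# Compute the IPv4 network address for an address and a prefix length.
# The body runs in a subshell "( ... )" so IFS and variables stay local.
network_address() (
    ip=$1
    prefix=$2
    IFS=.
    set -- $ip                       # split the address into four octets
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    mask=$(( (4294967295 << (32 - $prefix)) & 4294967295 ))
    net=$(( addr & mask ))
    echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
)

# Host address from the posted corosync.conf, with an assumed /24 netmask.
network_address 172.16.135.9 24
```

Since the two nodes sit in different networks (172.16.135.x and 172.16.64.x), each node would presumably get a different result in its own corosync.conf.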
Re: [Linux-HA] Funny messages from crm resource restart (SLES11 SP2 vs. SP3)
Hi,

did you run the HA update as a rolling update, i.e. one node stays online with the current version while the other is down, then you update the offline node and let it rejoin the cluster? In that case I would think the cluster is OK but still only supports the old options. That is different from the situation where you restart both nodes with only one system updated. If I got it correctly, only the first method is recommended.

Best regards
Fabian

Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:

Hi!

When trying to restart a Xen VM after installing updates in the guest, I see some funny messages (one node is at SLES11 SP2, while the other node is at SLES11 SP3):

First attempt:

    h05:~ # crm resource restart prm_xen_v04
    INFO: ordering prm_xen_v04 to stop
    No messages received in 30 seconds.. aborting
    WARNING: crmadmin -S h01 unexpected output: (exit code: 253)
    h05:~ # crmadmin -S 01
    Status of crmd@h01: S_IDLE (ok)

Second attempt:

    h05:~ # crm resource restart prm_xen_v04
    INFO: ordering prm_xen_v04 to stop
    No messages received in 30 seconds.. aborting
    WARNING: can't find DC

However, both nodes are online:

    h05:~ # crm_mon -1Arf
    Last updated: Mon Dec 23 11:52:44 2013
    Last change: Mon Dec 23 11:49:59 2013 by root via cibadmin on h05
    Stack: openais
    Current DC: h01 - partition with quorum
    Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
    2 Nodes configured, 2 expected votes
    18 Resources configured.

    Online: [ h01 h05 ]
    [...]

Does this happen when running the new crm shell (SP3) against a DC that is still on SP2?

Regards,
Ulrich
Re: [Linux-HA] Usage of SAPDatabase resource agent without SAPHostAgent is deprecated
Hi Muhammad,

please find my answer below...

On 29.04.2013 17:19, Muhammad Sharfuddin wrote:
On 04/29/2013 08:04 PM, Lars Marowsky-Bree wrote:
On 2013-04-26T23:56:04, Muhammad Sharfuddin m.sharfud...@nds.com.pk wrote:

I just upgraded from SP1 (SLES 11/HAE) to SP2, and I am getting the following messages when I start the SAPDBInstance resource:

    SAPDatabase(SAPDBInstance)[22587]: [22602]: WARNING: Usage of SAPDatabase resource agent without SAPHostAgent is deprecated. Please read documentation of SAPDatabase resource agent and follow SAP note 1031096 for the installation of SAPHostAgent.

There is a chapter "HA OCF Agents" in the High Availability Guide of SP2, but I found nothing new/peculiar about SAPDBInstance there.

The warning just indicates that a new configuration is preferred for SAP HA deployments going forward.

And what is that new preferred configuration? Where can I see/learn it?

The new setup preferred by _SAP_ is that you use the saphostagent. This new method gives you the following benefits (which you might not need today):

1. Support for a DB only on one server and a Java (only) instance on another server (this is currently not working, because in that case the bootstrap files are missing on the DB server).
2. Support for Sybase (without the saphostagent you can only control the old set of databases: DB2 (versions older than 10), MaxDB and Oracle).
3. You use the more generic DB interface written by SAP and the DB vendors, so the resource agent does not need to take care of the specialities of each database but can concentrate on the SAP-specific control of the database.

The setup with saphostcontrol is documented by SAP and is out of scope for the documentation of the resource agent, because it is SAP land and they could/should describe it. For the resource agent the only change is that the warning disappears.
If you do not find the SAP documentation (SDN, installation guides) about this topic I could try to send you a URL, but as it is SAP documentation you might need an SAP marketplace login for it.

The former version is still supported (and working, I trust).

Yes, with at least one limitation: for Java (only) workloads there must be at least one Java instance installed on every node where the DB should be able to run (the bootstrap files of the Java framework are needed for monitoring the database).

When you get a chance, you should install the SAP host agent as per the referenced SAP note. (Which I can't look up, it seems.)

I asked the SAP consultant to install the SAPHostAgent.

Perfect!

The warning does not indicate an error, but tries to get your attention about improving your configuration during the next appropriate maintenance window. It seems that worked ;-)

Correct, the cluster never shows any error when it runs/starts the SAPInstance, though we found that whenever we run the SAPInstance from the cluster, SAP gives us errors when logging in via the SAP client (SAPGui).

Which error? Could it be a license problem? A standard SAP license applies to one hardware key only - your SAP consultant should be able to solve this.

As a workaround, we stopped the SAPInstance in the cluster, let the cluster run/start the other resources (IP, file systems, and SAPDBInstance) and then manually started the SAPInstance (the SAP way):

    startsap -i DVEBMGS00 -v pgtprd

This should not be needed - there must be something wrong. Which error message did you get when the SAPInstance was controlled by the cluster and you logged in via SAPGui? If you get a SICK message about the 3.0 kernel, please update to a current SAP kernel (I could provide the SAP note number if needed).
Regards
Fabian

Regards, Lars

Regards, Muhammad Sharfuddin
Re: [Linux-HA] Antw: Re: Usage of SAPDatabase resource agent without SAPHostAgent is deprecated
On 30.04.2013 08:27, Ulrich Windl wrote:
Muhammad Sharfuddin m.sharfud...@nds.com.pk wrote on 29.04.2013 at 14:16 in message 517e6482.2040...@nds.com.pk:

I think that you should just follow that advice, i.e. read that SAP note and install SAPHostAgent.

I asked the SAP consultant to install the SAPHostAgent.

See also the agent's documentation: crm ra info SAPDatabase

I read it and found nothing that helps me fix this issue.

The good news is that it still works despite the warning. The RA is a good example of how to do a simple thing with maximum complexity. According to my limited understanding, SAPHostAgent is a web server running as root, launching the SAP start script on demand. The RA in turn sends an HTTP request to the host agent to start the process. I did not care to examine how authentication works, because I want to be able to sleep at night ;-)

Oh, you can sleep at night even after I explain it: the authorization is done through the file permissions of a socket on the system. So the Linux/Unix file permissions control who may send a certain set of commands to sapstartsrv / saphostagent. (Other commands can be sent without that file permission - the set of commands needing authorization is controlled by an SAP configuration.)

There are 3 (or more?) methods to authenticate:

a) without authentication (for simple, unproblematic commands)
b) via socket/file permission
c) with username/password

Method c) of course could not be used by the RA without introducing a security problem (and so the RA does not try it :)

I cannot agree with your statement about the resource agent. The interface for HOW to start/stop databases and instances is given by SAP, so the author of the RA implemented it in the way SAP prefers. The reason for the web service, and for making the RA use it as well, is that the web service is THE interface for all methods of controlling SAP databases and SAP instances from outside. It is used by:

- SAP MMC
- SAPMC
- sapcontrol

and maybe even by more...
Regards
Fabian

Regards, Ulrich
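The socket/file-permission method described above (method b) can be checked from the outside in a rough way. The following Python sketch only tests whether a path is a Unix domain socket that the current user may read and write. The function name may_use_socket is hypothetical, and the example path /tmp/.sapstream50013 is an assumption for instance number 00 that is not taken from this thread; verify the real socket location on your system.

```python
import os
import stat


def may_use_socket(path: str) -> bool:
    """Return True if path is a Unix socket the current user may read and write.

    This mirrors the idea that plain file permissions on a socket decide who
    may send protected commands. It does not talk to any SAP service.
    """
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing path or no permission to stat it: treat as not usable.
        return False
    return stat.S_ISSOCK(mode) and os.access(path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    # Assumed example path, not confirmed by the thread.
    print(may_use_socket("/tmp/.sapstream50013"))
```

Running this as the sidadm user and as an unrelated user should give different answers if access is really controlled by the socket's permissions. Note that root passes such checks regardless of the permission bits.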
Re: [Linux-HA] Antw: Re: Usage of SAPDatabase resource agent without SAPHostAgent is deprecated
On 30.04.2013 13:22, Muhammad Sharfuddin wrote:

Hello Fabian,

On 04/30/2013 02:19 PM, Fabian Herschel wrote:

Hi Muhammad,

please find my answer below...

The new setup preferred by _SAP_ is that you use the saphostagent. This new method gives you the following benefits (which you might not need today):

1. Support for a DB only on one server and a Java (only) instance on another server (this is currently not working, because in that case the bootstrap files are missing on the DB server).
2. Support for Sybase (without the saphostagent you can only control the old set of databases: DB2 (versions older than 10), MaxDB and Oracle).
3. You use the more generic DB interface written by SAP and the DB vendors, so the resource agent does not need to take care of the specialities of each database but can concentrate on the SAP-specific control of the database.

Thanks a lot for sharing and explaining.

Correct, the cluster never shows any error when it runs/starts the SAPInstance, though we found that whenever we run the SAPInstance from the cluster, SAP gives us errors when logging in via the SAP client (SAPGui). As a workaround, we stopped the SAPInstance in the cluster, let the cluster run/start the other resources (IP, file systems, and SAPDBInstance) and then manually started the SAPInstance (the SAP way):

    startsap -i DVEBMGS00 -v pgtprd

This should not be needed - there must be something wrong. Which error message did you get when the SAPInstance was controlled by the cluster and you logged in via SAPGui? If you get a SICK message about the 3.0 kernel, please update to a current SAP kernel (I could provide the SAP note number if needed).

As I said, the cluster always starts the SAPInstance successfully without any error, but when we log in to SAP via SAPGui we get the following error:

    Run time Errors. START_CALL_SICK. short text database inconsistency. start transaction SICK.

OK, to me this looks like the Linux kernel is detected as 3.0 instead of 2.6.
Could you (if needed, together with your SAP consultant) log in and check whether there is a message about the unknown 3.0 kernel? In that case the SAP kernel should be updated.

As a workaround I **only** stopped the SAPInstance in the cluster (the IP, file systems and SAPDBInstance remain running via the cluster) and started the SAPInstance via the command line: startsap -i DVEBMGS00 -v pgtprd. The SAP kernel version is 701, and we are running SAP on SLES 11 SP2 via the kernel 2.6 compatibility mode for SAP (SAP note 1310037).

Yes, the 2.6 compatibility environment was only intended as a bridge between the availability of the Linux 3.0 kernel and the point where the customer is able to update to the newest SAP kernels, like 720 PL 402 or so.

Sorry that I couldn't solve this problem via the list; it is now getting very SAP and support specific. If you have already opened a ticket at SUSE, please refer to this thread (my colleague has already read it :) and then we can help you with professional support. In sum, SAP also wants you to update to the newer SAP kernels, and my guess is that this is exactly your problem. While starting/stopping the instance from a login user/shell leads into the 2.6 compat environment, the RA still runs in 3.0.
So the solution is either to change one line in the RA (NOT preferred) or to update the SAP kernel to a current version (very much appreciated and preferred :)

Regards,
Muhammad Sharfuddin
Re: [Linux-HA] Antw: Re: Q: NFS cross mounting
On 12/21/2012 08:14 AM, Ulrich Windl wrote:
Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote on 20.12.2012 at 17:50 in message 50d341c3.1040...@bmrb.wisc.edu:

Hi!

So (pseudo-code following)

    if (host(NFS_server) == host(NFS_client))
        rmdir(mountpoint); ln -s export_dir mountpoint
    else
        makedir(mountpoint); mount(NFS_Server:export_dir, mountpoint)

?

Hmm - I was trying to do something similar using bind mounts instead of symlinks, which is more compatible with applications that may probe for DIRECTORIES only and not for SYMLINKS. Unfortunately, at least for SAP workloads this is not an option, because all instances must be killed to switch from a bind mount to an NFS mount and vice versa. This would decrease availability at the application level.

The problem is the already opened file handles, which can't be shifted from the bind mount to the NFS mount. For all files and directories which are only opened after the FS switch that might be OK, but again that would also mean killing all application processes using the FS. I do not see this as a feasible solution.

Kind regards
Fabian

Now when some program does a cd mountpoint and the NFS server moves to another host, you'd have to kill all processes that use mountpoint to be able to unmount it. In contrast, with the NFS solution the client applications would be blocked until the NFS server is running on the other node. Obviously this solution seems preferable.
Regards,
Ulrich
Re: [Linux-HA] why nodes cant see each other ?
As you are using multicast (MCAST): could it be that the switch/LAN dropped all multicast packets for some time? Many managed switches drop MCAST by default (at least that is the feedback I got from customers), so it could be that your switch was reconfigured for a period of time, or that there was a firmware update. Just my thoughts, based on things that happened at customer sites.

Fabian Herschel

On 12/14/2012 06:31 AM, Muhammad Sharfuddin wrote:

node1 (ailprd1) IP: 192.168.7.11
node2 (ailprd2) IP: 192.168.7.12

It's a two-node active/passive cluster that has been running perfectly for the last two months, but yesterday both nodes were fenced (rebooted). Network connectivity between both nodes is perfect, and the cluster is running fine again. Help me understand the reason behind the following situation, and how I can avoid it happening next time.

On node1 (active node):

    Dec 13 12:31:06 ailprd1 corosync[7274]: [TOTEM ] A processor failed, forming new configuration.
    Dec 13 12:31:12 ailprd1 corosync[7274]: [CLM ] CLM CONFIGURATION CHANGE
    Dec 13 12:31:12 ailprd1 corosync[7274]: [CLM ] New Configuration:
    Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] r(0) ip(192.168.7.11)
    Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] Members Left:
    Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] r(0) ip(192.168.7.12)

On node2 (passive node):

    Dec 13 12:31:05 ailprd2 corosync[7021]: [TOTEM ] A processor failed, forming new configuration.
    Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] CLM CONFIGURATION CHANGE
    Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] New Configuration:
    Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] r(0) ip(192.168.7.12)
    Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] Members Left:
    Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] r(0) ip(192.168.7.11)

For node1 (ailprd1), node2 left; likewise node2 (ailprd2) thinks that node1 left. Then node2 tried to start the resources which were already running on node1, and both nodes were fenced.
corosync.conf:

    totem {
        rrp_mode: none
        join: 60
        max_messages: 20
        vsftype: none
        consensus: 6000
        secauth: off
        token_retransmits_before_loss_const: 10
        token: 5000
        version: 2
        interface {
            bindnetaddr: 192.168.7.0
            mcastaddr: 224.0.0.116
            mcastport: 51234
            ringnumber: 0
        }
        clear_node_high_bit: yes
    }
    logging {
        to_logfile: no
        to_syslog: yes
        debug: off
        timestamp: off
        to_stderr: no
        fileline: off
        syslog_facility: daemon
    }

Regards,
Muhammad Sharfuddin
Re: [Linux-HA] why nodes cant see each other ?
Hi Muhammad,

please also find my answer inline...

On 12/14/2012 12:55 PM, Muhammad Sharfuddin wrote:
On Fri, 2012-12-14 at 16:47 +0500, Muhammad Sharfuddin wrote:

please find my replies in-line

On Fri, 2012-12-14 at 12:13 +0100, Fabian Herschel wrote:

As you are using multicast (MCAST)

Yes. So would using unicast instead of MCAST be a solution?

It COULD be a solution, if the network was the problem. Some years ago I wrote a tiny program to send MCAST under high load and to count drops - maybe I can dig up the code and send it to you, if you are interested. It's GPL, so free to use under the terms of the GPL and with no warranty :)
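The program mentioned above was not posted to the list, so the following is not that code. It is only a minimal Python sketch of the same idea: one host sends numbered UDP multicast probes, the other collects them and counts how many sequence numbers never arrived. The group 224.0.0.116 is the mcastaddr from the posted corosync.conf; the port 51300 and all function names are assumptions chosen for illustration (a port different from the cluster's mcastport, so the test does not interfere with corosync).

```python
import socket
import struct

GROUP = "224.0.0.116"  # mcastaddr taken from the posted corosync.conf
PORT = 51300           # assumption: a test port that is NOT the cluster's mcastport


def send_probes(count, group=GROUP, port=PORT, ttl=1):
    """Send `count` numbered probe datagrams to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    try:
        for seq in range(count):
            sock.sendto(struct.pack("!I", seq), (group, port))
    finally:
        sock.close()


def receive_probes(timeout, group=GROUP, port=PORT):
    """Join the group and collect sequence numbers until `timeout` seconds of silence."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    sock.settimeout(timeout)
    seen = []
    try:
        while True:
            data, _addr = sock.recvfrom(16)
            if len(data) == 4:
                seen.append(struct.unpack("!I", data)[0])
    except socket.timeout:
        pass  # no probe arrived within the timeout: stop listening
    finally:
        sock.close()
    return seen


def count_drops(seen, sent):
    """Return how many of the probes 0..sent-1 never arrived (duplicates ignored)."""
    return sent - len(set(seen) & set(range(sent)))
```

A possible usage would be to start receive_probes(5.0) on one node, call send_probes(10000) on the other, and then pass the returned list to count_drops(seen, 10000). A non-zero result during normal operation would support the suspicion that the switch drops multicast traffic, although it cannot prove what happened at the time of the fencing.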
Re: [Linux-HA] Heartbeat/Pacemaker resource agent incorrect
On 12/10/2012 10:59 PM, codey koble wrote:

To anyone who could possibly help:

My current setup: 2 Ubuntu 10.04 LTS servers running heartbeat, pacemaker, apache, and mysql.

Heartbeat and pacemaker are running great for my needs with one exception: currently both nodes are showing mysql as slaves. I have mysql configured in a master/slave setup and that is working great on its own. I noticed when I tried to promote one of the servers that an error occurred stating that ocf:heartbeat:mysql did not support the feature. I evaluated the script and realized it was an older version and did not contain any of the promote/demote code. I found the newest code for the script in the github repo and replaced the entire mysql file with the new code. Upon doing this it then gave an error stating that the ocf:heartbeat:mysql resource agent was not installed.

Could you send the exact error message? Does the cluster tell you the RA is not installed (check path and file permissions), or does the LRM tell you that the RA itself has returned the exit code "not installed" (this would mean the RA does not find your mysql binaries/config/or whatever)?

My question would be: is there a simple way to update the script instead of manually replacing it like I did, or is there a way to get the code I changed working? Thanks in advance for any help!
Re: [Linux-HA] master/slave drbd resource STILL will not failover
On 12/04/2012 08:34 PM, Lars Marowsky-Bree wrote:
On 2012-12-04T20:38:54, Fabian Herschel fabian.hersc...@arcor.de wrote:

Specifying target-role=Master is completely different from specifying a role=Master/Slave on an operation. The former defines that you want the cluster to promote the resource to Master (setting it to Slave would prevent the resource from reaching master state, just like Stopped would prevent it from being started at all). The latter defines that you want to run different monitor operations per role.

Yes, you are right :) I mixed up the error case "does not promote" with "does not detect resource failures after being promoted".

Regards,
Lars
Re: [Linux-HA] master/slave drbd resource STILL will not failover
On 11/29/2012 10:14 PM, Robinson, Eric wrote:

Bump... does anyone have some insight on this? Google is not turning up anything useful.

Our newest cluster will not fail over master/slave drbd resources. It works fine manually using drbdadm from a shell prompt, but when we try it using 'crm node standby' and let the cluster manage the resource, crm_mon just keeps saying the resource FAILED. We see a lot of these messages in the corosync.log file:

    drbd(p_drbd1)[12814]: 2012/11/27_15:31:59 DEBUG: ha02_mysql: Calling drbdadm -c /etc/drbd.conf primary ha02_mysql
    drbd(p_drbd1)[12814]: 2012/11/27_15:31:59 ERROR: ha02_mysql: Called drbdadm -c /etc/drbd.conf primary ha02_mysql
    drbd(p_drbd1)[12814]: 2012/11/27_15:31:59 ERROR: ha02_mysql: Exit code 11

There is no indication of what may be causing the 'Exit code 11'.

Here is a link to the corosync log, taken from the standby server (ha09a) that we are trying to fail the resource over to:

http://www.psmnv.com/downloads/corosync1.log

Here is what I have installed:

    corosync-1.4.1-7.el6_3.1.x86_64
    corosynclib-1.4.1-7.el6_3.1.x86_64
    pacemaker-1.1.8-4.el6.x86_64
    pacemaker-cli-1.1.8-4.el6.x86_64
    pacemaker-cluster-libs-1.1.8-4.el6.x86_64
    pacemaker-libs-1.1.8-4.el6.x86_64

Following is my crm config. It's pretty basic.
    node ha09a \
        attributes standby=off
    node ha09b \
        attributes standby=off
    primitive p_drbd0 ocf:linbit:drbd \
        params drbd_resource=ha01_mysql \
        op monitor interval=60s
    primitive p_drbd1 ocf:linbit:drbd \
        params drbd_resource=ha02_mysql \
        op monitor interval=45s
    primitive p_vip_clust08 ocf:heartbeat:IPaddr2 \
        params ip=192.168.10.210 cidr_netmask=32 \
        op monitor interval=30s
    primitive p_vip_clust09 ocf:heartbeat:IPaddr2 \
        params ip=192.168.10.211 cidr_netmask=32 \
        op monitor interval=30s
    ms ms_drbd0 p_drbd0 \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Master
    ms ms_drbd1 p_drbd1 \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Master
    property $id=cib-bootstrap-options \
        dc-version=1.1.8-4.el6-394e906 \
        cluster-infrastructure=openais \
        expected-quorum-votes=2 \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        last-lrm-refresh=1352846885
    rsc_defaults $id=rsc-options \
        resource-stickiness=100

I am not sure if this will really help you - but in my cluster (OK, an older pacemaker version) I have the following to define a master/slave resource:

    primitive rsc_sap_HA0_ASCS00 ocf:heartbeat:SAPInstance \
        operations $id=rsc_sap_HA0_ASCS00-operations \
        op monitor interval=11 role=Slave timeout=60 \
        op monitor interval=13 role=Master timeout=60 \
        params \
        InstanceName=HA0_ASCS00_sapha0as \
        START_PROFILE=/usr/sap/HA0/SYS/profile/HA0_ASCS00_sapha0as \
        ERS_InstanceName=HA0_ERS10_sapha0er ERS_START_PROFILE=/usr/sap/HA0/SYS/profile/HA0_ERS10_sapha0er
    ms msl_sap_enqrepl_HA0 rsc_sap_HA0_ASCS00 \
        meta clone-max=2 target-role=Started master-max=1 \
        is-managed=true

So I have defined an operation with role=Master on the primitive but NOT a target-role=Master on the master/slave resource.
Additionally I have a colocation constraint between the primitives/group which must run together with the promoted clone:

    colocation col_grp_sap_as_HAO_msl_sap_enqrepl_HA0_MASTER inf: \
        grp_sap_as_HA0 msl_sap_enqrepl_HA0:Master

Sorry - I have not checked whether the syntax has changed here, or whether your syntax was also valid in the past - so it might be that my hint is completely useless ;-) I just wanted to point to one thing where your config is completely different from mine. Hopefully my hint helps...

Fabian

-- Eric Robinson
Re: [Linux-HA] Pacemaker master/slave - how not to autostart slave after migration of a master or failure of a slave?
Hi Rafal,

placing a new master on the right (not restarted) side is typically done by the crm_master calls. You might check the scores of the resources after you have killed one side, using ptest -Ls (or a matching other call without ptest - sorry, I do not remember the other command). On SLES, ptest -Ls shows you the scores in the live situation, and if crm_master is used it also shows you the promote scores. In my resource-agents package the tomcat RA does not contain a crm_master call, so this might be the cause.

Best regards
Fabian

On 11/26/2012 01:39 AM, Andrew Beekhof wrote:
On Fri, Nov 23, 2012 at 3:08 AM, Rafał Radecki radecki.ra...@gmail.com wrote:

Hi all.

I am currently building a Pacemaker/Corosync cluster which serves a Tomcat resource in master/slave mode. This Tomcat serves a Solr Java application. My configuration is:

    node storage1
    node storage2
    primitive TSVIP ocf:heartbeat:IPaddr2 \
        params ip=192.168.100.204 cidr_netmask=32 nic=eth0 \
        op monitor interval=30s
    primitive TomcatSolr ocf:polskapresse:tomcat6 \
        op start interval=0 timeout=60 on-fail=stop \
        op stop interval=0 timeout=60 on-fail=stop \
        op monitor interval=31 role=Slave timeout=60 on-fail=stop \
        op monitor interval=30 role=Master timeout=60 on-fail=stop
    ms TomcatSolrClone TomcatSolr \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=false globally-unique=true ordered=false target-role=Master
    colocation TomcatSolrClone_with_TSVIP inf: TomcatSolrClone:Master TSVIP:Started
    order TomcatSolrClone_after_TSVIP inf: TSVIP:start TomcatSolrClone:promote
    property $id=cib-bootstrap-options \
        dc-version=1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14 \
        cluster-infrastructure=openais \
        expected-quorum-votes=4 \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        symmetric-cluster=true \
        default-resource-stickiness=1 \
        last-lrm-refresh=1353594420
    rsc_defaults $id=rsc-options \
        resource-stickiness=10 \
        migration-threshold=100

So logically I have:

- one node with TSVIP and the TomcatSolrClone Master;
- one node with the TomcatSolrClone Slave.

I have set up replication between Solr on the TomcatSolrClone Master and Slave and written an OCF agent (attached). A few moments ago, when I killed the Slave resource with 'pkill java', the resource was restarted on the same node despite the fact that the monitor action returned $OCF_ERROR_GENERIC and I have on-fail=stop set for TomcatSolr (I have also tried block, with the same effect). Then I added a migration threshold:

    ms TomcatSolrClone TomcatSolr \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=false globally-unique=true ordered=false target-role=Started \
        params migration-threshold=1

and now when I kill java on the Slave it does not start anymore (the Master is OK). But when I then kill java on the Master (no resource running on either node), everything gets restarted by the cluster and Master and Slave are running afterwards. How can I stop this restart when Slave and Master both fail?

Could you file a bug (https://bugs.clusterlabs.org) for this and include a crm_report for your test case? It's likely that you've hit a bug.

Best regards,
Rafal.
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Antw: Re: pcs or crmsh?
On 11/14/2012 03:33 PM, Digimer wrote:

Linux in general is all about choice, possibly to a fault. I see no reason why clustering shouldn't be the same.

I really like Linux and cluster frameworks offering choice (I nearly misspelled that as "joice" :), but on the other hand it does not make sense to change things like crm to pcs without having had any problems with the integrated, stable, widely used, road-capable solution we already had. Customers do not really like such changes, as they suggest that this cluster solution is still in its teens and not grown up. From my point of view this is a very bad message!

Regards
Fabian
Re: [Linux-HA] Antw: Re: pcs or crmsh?
On 11/15/2012 12:03 AM, Andrew Beekhof wrote:

I can think of 3 tooling changes:
- ptest/crm_simulate
- hb_report/crm_report
- standalone crmsh

That's not /too/ bad in 4 years.

But completely unneeded. Where is the benefit of changing from crm to pcs?
Re: [Linux-HA] Antw: Re: pcs or crmsh?
On 11/14/2012 11:20 PM, Andrew Beekhof wrote:

I sincerely hope SUSE does continue with crmsh but I _like_ that there are people trying something new.

Yes, I also like things that are getting better. But what is the benefit of dropping CRM and introducing PCS to that project? What is the benefit for all the distributions which in the past did not say it is only a tech preview?
Re: [Linux-HA] pcs or crmsh?
On 11/14/2012 05:10 PM, alain.mou...@bull.net wrote:

Hi

Just for information, I'm using cleanup and crm_mon very, very often with lots of resources configured in Pacemaker and never had any problem like the problems you describe ... (on RHEL)

Alain

The crm shell and tools like crm_mon have been stable on SLES for years! I would really like this story to go on, without silly changes which have zero benefit. Changes should have a real benefit, otherwise they just hurt the story of a cluster project.
Re: [Linux-HA] string2msg_ll: node [?] failed authentication
Are there other nodes with the same multicast address?

On 08/02/2011 12:38 AM, Hai Tao wrote:

I reinstalled the OS for node1 (in a two-node HA setup; node1 had a disk error) and reconfigured HA. However, after restarting heartbeat, I see many errors of

    string2msg_ll: node [?] failed authentication

on node 2. I checked authkeys and confirmed both nodes have the same setting. Is there any idea why this happens?

Thanks.
Hai Tao
Re: [Linux-HA] two nodes insolate
Of course you checked that the firewall is off?

On 06.07.2010 13:01, Trujillo Carmona, Antonio wrote:

I'm trying to set up a 2-node cluster for HA. After configuring it I began to test it, but it fails. I configured a ping node and it is always offline. I tried to configure a ping resource and that does not work either. I always get:

    crm(live)# status
    Last updated: Tue Jul 6 12:48:30 2010
    Stack: openais
    Current DC: balanceador-2 - partition WITHOUT quorum
    Version: 1.0.8-f2ca9dd92b1d+ sid tip
    2 Nodes configured, 2 expected votes
    3 Resources configured.

    Online: [ balanceador-2 ]
    OFFLINE: [ balanceador-1 ]

    control-aislamiento (ocf::pacemaker:ping): Started balanceador-2
    crm(live)#

My configuration is:

    crm(live)# configure
    crm(live)configure# show
    node $id=10.104.24.204 hvn21:ping \
        attributes standby=false
    node balanceador-1
    node balanceador-2
    primitive control-aislamiento ocf:pacemaker:ping \
        meta target-role=Started \
        operations $id=control-aislamiento-operations \
        op monitor interval=10 timeout=60 \
        params host_list="hvn21 balanceador-1 balanceador-2"
    primitive control-haproxy lsb:haproxy \
        meta target-role=Started is-managed=true \
        operations $id=control-haproxy-operations \
        op monitor interval=15 timeout=15 start-delay=15
    primitive control-ip ocf:heartbeat:IPaddr2 \
        meta target-role=started \
        operations $id=control-ip-operations \
        op monitor interval=10s timeout=20s \
        params ip=10.104.16.234 lvs_support=true unique_clone_address=true
    location ip-en-balanceador-1 control-ip inf: balanceador-1
    colocation weblogic inf: control-ip control-haproxy
    order haproxy-primero : control-haproxy control-ip
    property $id=cib-bootstrap-options \
        dc-version=1.0.8-f2ca9dd92b1d+ sid tip \
        cluster-infrastructure=openais \
        stonith-enabled=false \
        last-lrm-refresh=1278331717 \
        expected-quorum-votes=2 \
        no-quorum-policy=suicide
    crm(live)configure#

Thanks for your time
Re: [Linux-HA] Setting up HA-Cluster with heartbeat and SAP
Do you still really need heartbeat v1??? There are some advanced SAP resource agents for heartbeat v2, which also include monitoring and service restarts. The problem with your own(?) RA is that it could damage your data if the unmounts do not work properly (failing due to open files). This could cause dual-mounted file systems - ugly!

Best regards
Fabian Herschel

Andreas Reschke wrote:

Hi,

I need to set up an HA cluster for an SAP application.

Requirements:
- 2 IBM x3650
- SLES 10 SP2 x86_64
- SAN (EMC)

Steps:
1. installing SLES on the server
2. configure heartbeat (v1)

- /etc/ha.d/ha.cf:

    logfile /var/log/ha-log
    keepalive 2
    deadtime 30
    warntime 10
    initdead 120
    auto_failback off
    bcast eth2
    bcast eth3
    ucast eth2 11.0.0.1
    ucast eth3 11.0.0.2
    ucast eth2 11.0.0.3
    ucast eth3 11.0.0.4
    node bgstsapgtsls1
    node bgstsapgtsls2
    ping 10.20.94.1
    keepalive 10

- /etc/ha.d/haresources:

    bgstsapgtsls1 10.20.94.200/32/255.255.255.255/bond0:1 sap

- /etc/ha.d/resource.d/sap:

    #!/bin/sh
    # Author: Andreas Reschke andreas.resc...@behrgroup.com
    # License: GNU General Public License (GPL)
    # Date: 2009-03-16
    #
    #set -x

    # Exit status of the script; stays 0 unless set otherwise below.
    RETVAL=0

    # See how we were called.
    case "$1" in
    start)
        # SAP-Startscript
        # mount SAN
        # LVM-Volumes search and activate
        /etc/init.d/boot.md start
        /etc/init.d/mdadmd start
        /etc/init.d/boot.lvm start
        # setting hostname
        hostname bgstsapgpls01
        # filesystem mount
        # all filesystems (sap_vg) are on the SAN
        mount /dev/sap_vg/lv20 /sap/btpadm
        mount /dev/sap_vg/lv19 /oracle
        mount /dev/sap_vg/lv1 /oracle/BTP
        mount /dev/sap_vg/lv2 /oracle/BTP/mirrlogA
        mount /dev/sap_vg/lv3 /oracle/BTP/mirrlogB
        mount /dev/sap_vg/lv4 /oracle/BTP/oraarch
        mount /dev/sap_vg/lv5 /oracle/BTP/origlogA
        mount /dev/sap_vg/lv6 /oracle/BTP/origlogB
        mount /dev/sap_vg/lv7 /oracle/BTP/saparch
        mount /dev/sap_vg/lv8 /oracle/BTP/sapbackup
        mount /dev/sap_vg/lv9 /oracle/BTP/sapcntrl1
        mount /dev/sap_vg/lv10 /oracle/BTP/sapcntrl2
        mount /dev/sap_vg/lv11 /oracle/BTP/sapcntrl3
        mount /dev/sap_vg/lv12 /oracle/BTP/sapdata1
        mount /dev/sap_vg/lv13 /oracle/BTP/sapdata2
        mount /dev/sap_vg/lv14 /oracle/BTP/sapdata3
        mount /dev/sap_vg/lv15 /oracle/BTP/sapdata4
        mount /dev/sap_vg/lv16 /oracle/BTP/sapreorg
        mount /dev/sap_vg/lv17 /sapmnt/BTP
        mount /dev/sap_vg/lv18 /usr/sap/BTP
        # SAP start (multi-word commands must be quoted for su -c)
        su - orabtp -c "/oracle/BTP/102_64/bin/lsnrctl start"
        # wait for listener
        sleep 10
        su - btpadm -c /usr/sap/BTP/SYS/exe/run/startsap
        # Backupdaemon start
        /etc/init.d/adsm start
        ;;
    stop)
        # SAP-Stopscript
        su - btpadm -c /usr/sap/BTP/SYS/exe/run/stopsap
        su - btpadm -c "/usr/sap/BTP/SYS/exe/run/saposcol -kc"
        su - btpadm -c "/usr/sap/BTP/SYS/exe/run/cleanipc 41 remove"
        su - orabtp -c "/oracle/BTP/102_64/bin/lsnrctl stop"
        # wait for all stopping process
        sleep 10
        # if necessary
        killall sapstartsrv
        # umount SAN
        umount /oracle/BTP/mirrlogA
        umount /oracle/BTP/mirrlogB
        umount /oracle/BTP/oraarch
        umount /oracle/BTP/origlogA
        umount /oracle/BTP/origlogB
        umount /oracle/BTP/saparch
        umount /oracle/BTP/sapbackup
        umount /oracle/BTP/sapcntrl1
        umount /oracle/BTP/sapcntrl2
        umount /oracle/BTP/sapcntrl3
        umount /oracle/BTP/sapdata1
        umount /oracle/BTP/sapdata2
        umount /oracle/BTP/sapdata3
        umount /oracle/BTP/sapdata4
        umount /oracle/BTP/sapreorg
        umount /oracle/BTP
        umount /oracle
        umount /sapmnt/BTP
        umount /usr/sap/BTP
        umount /sap/btpadm
        # setting old hostname
        hostname bgstsapgtsls1
        # Backupdaemon stop
        /etc/init.d/adsm stop
        ;;
    restart|reload)
        $0 stop
        $0 start
        ;;
    *)
        echo "Usage: sap {start|stop|restart}"
        exit 1
        ;;
    esac
    exit $RETVAL

Questions:
Does this work?
Can I have problems with this configuration?
Does anybody have a similar configuration?

Regards
Andreas Reschke
BG-IM173 Unix/Linux-Administration
Behr GmbH Co. KG
ST B29, 3.OG
Tel.: +49 711 896-4598
Fax: ++49 711-8902-4598
Mobil: 0173-3197397
andreas.resc...@behrgroup.com
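The data-damage risk named at the top of this reply comes from unmounts that fail silently (for example because of open files) while the script carries on. A minimal sketch of a stricter stop path is shown below; it is not part of the posted script. The helper names reverse_lines and umount_all and the UMOUNT override variable are assumptions for illustration. The idea is to unmount in reverse mount order and to stop with a non-zero status at the first failure, so the caller can refuse to report a clean stop.

```shell
#!/bin/sh
# Hypothetical helpers (illustration only, not from the posted script).

# Print stdin with the order of its lines reversed.
reverse_lines() {
    awk '{ line[NR] = $0 } END { for (i = NR; i > 0; i--) print line[i] }'
}

# Read mount points from stdin in the order they were mounted,
# unmount them in reverse order and fail on the first error.
# UMOUNT can be overridden (e.g. with "echo") for a dry run.
umount_all() {
    reverse_lines | while IFS= read -r mp; do
        ${UMOUNT:-umount} "$mp" || exit 1
    done
}
```

A stop branch could then feed the list of mount points into umount_all and set RETVAL=1 when it fails, instead of continuing with the hostname change as if every file system had been released.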
[Linux-HA] Re: Configurating STONITH device (how to avoid reset each other)
Hi,

for heartbeat 2.1.4 there is IMHO no out-of-the-box solution for that problem. I don't know if the following would be a valid method: edit(!) the stonith script and add a sleep XX to the stonith script on one of the nodes. This would cause one of the scripts to hang for some seconds. In consequence the resulting stonith actions should not occur at the same time. Hopefully this does not work against heartbeat's internal sleeps (I did not test that so far). This method also means that the cluster takeover action will run some seconds later (on the changed node), because the stonith action has to be completed before other actions can be processed.

@List: Would that be a valid workaround?

Regards
Fabian Herschel

linux-ha-requ...@lists.linux-ha.org wrote:

Subject: [Linux-HA] Configurating STONITH device (how to avoid reset each other)
From: Alessandra Giovanardi a.giovana...@cineca.it
Date: Wed, 4 Mar 2009 18:07:32 +0100 (MET)
To: linux-ha@lists.linux-ha.org

Hi,

I'm using heartbeat on a cluster of 2 nodes and stonith to avoid split brain with external/ipmi:

    heartbeat-stonith-2.1.4-0.11
    heartbeat-2.1.4-0.11

I'm using heartbeat with crm off (version 1-like).

I have a question: if the nodes become unavailable to *each* *other*, how can I avoid that node-1 RESETS node-2 and node-2 RESETS node-1 at the same time?

This is the same question as in this post:

http://www.nabble.com/Configurating-STONITH-device-(reset-each-other)-td21672102.html

where the answer: "No, but it is extremely unlikely for this to happen." is for me not so exhaustive...

Has someone solved this problem or evaluated the occurrence of this event?

Thanks
A.
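The sleep idea proposed above can be sketched as a small helper that yields a delay on exactly one node. This is an untested illustration of the proposed workaround, not a verified fix: the function name stonith_delay_for, the node name and the 15 seconds are assumptions.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): print the number of seconds
# the local stonith script should wait before resetting the peer.
# $1 = local node name
# $2 = the one node that is configured to wait
# $3 = delay in seconds for that node
stonith_delay_for() {
    if [ "$1" = "$2" ]; then
        echo "$3"
    else
        echo 0
    fi
}

# Possible use at the top of the edited stonith script on both nodes:
#   sleep "$(stonith_delay_for "$(uname -n)" node-2 15)"
```

With the same line on both nodes, only node-2 waits, so node-1 would win a simultaneous fencing race. The delay also postpones every takeover initiated by node-2, as noted above.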
[Linux-HA] RE: Help with STONITH Plugin
You can either query the whole cluster resource definition:

    cibadmin -Q

or you can query the definition of a single resource (primitive/group/clone):

    crm_resource -l

This gives a list of defined resources.

    crm_resource -r ONE-OF-YOUR-RESOURCES -x

queries the XML definition of your resource.

From: Gruher, Joseph R joseph.r.gru...@intel.com
Date: Fri, 6 Feb 2009 15:32:43 -0800
To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
CC: Liu, Zheng-yang zheng-yang@intel.com

Can the resource definition be captured or exported? Would that be part of the plugin script itself? I can send any useful debug information that can be captured from the system if you can provide some guidance on what would be helpful.

Thanks,
Joe

-Original Message-
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Fabian Herschel
Sent: Friday, February 06, 2009 11:03 AM
To: Linux-HA
Subject: Re: [Linux-HA] RE: Help with STONITH Plugin

Thanks for the input. What could cause the STONITH request to not be sent from tengine?

Have you defined FENCE as a reaction in one of your resource operations? Without the resource definition it's not easy to tell why fencing is not started.

Thanks,
Joe
[Linux-HA] Q: Known problems/limitations with quorum server?
Hi all,

my question is: are there any known problems/limitations with the quorum server and heartbeat 2.1.13 (or 2.0.8)? I would need the quorum server for a split-site (stretched) 4-node cluster (2 nodes on each side).

Best regards
Fabian
Re: [Linux-HA] Heartbeat 2: failover of EVMS private container resources
Hi,

please try either lower-cased host/node names or use the patch I sent yesterday. The problem is that heartbeat uses the lower-cased hostnames as node names and in the membership list of the CCM, while EVMS compares case-sensitively. This means EVMS says your cluster node acquiring the private container is not allowed to do so, as the CCM does not have the exact node name in its list. In your case the node name (CZVLabNode2) is not lower-cased; this is the cause of the problem.

Either change both nodes to lower case (uname -n must report correctly, hostname also), or apply the patch. After that you should use the following procedure to clear the stored failures of the evms_failover resource:

1. Clean up the resource
2. Stop(!) the resource
3. Start the resource

If the resource belongs to a group, finally delete the target_role of the evms_failover resource. Now everything should work fine.

Best regards
Fabian

On Friday, 23.11.2007, 11:16 +0100, Chris wrote:

Hi Yan,

Thanks a lot for your help. I took the evmsSCC resource out of the scenario, but I did not see any difference in the system behavior. Then I followed your suggestion and manually tested the EVMS commands from the CLI while both nodes were in standby, and I realized that the command:

    modify: gwcont,type=private,node=CZVLabNode2

was failing; it was somehow not recognized as a valid command. The really weird thing is that the same command, avoiding the capital letters in the host name, was successful:

    modify: gwcont,type=private,node=czvlabnode2

This was true on both nodes, so I changed both hostnames:

    CZVLabNode1 -- czvlabnode1
    CZVLabNode2 -- czvlabnode2

and now the failover is working properly, like everything else.

The reason why I tried changing the host names to avoid any capital letters is that I noticed that, even though my host names were a mixture of lower-case and capital letters, in the hb_gui they were shown without capitals.

As soon as I have time for this, I will do some further tests to verify whether I can reproduce this starting from scratch, in order to verify whether Heartbeat 2.1.2 and/or EVMS 2.5.5.-24.52 really have issues with partially capitalized node names. I will update the list afterwards. It could also be that I modified something else in the system that I'm not fully aware of, or I simply forgot it, as I did many different tests on the same boxes.

Thanks again,
Chris

On Nov 21, 2007 9:23 PM, Yan Fitterer [EMAIL PROTECTED] wrote:

Andrew Beekhof wrote:

On Nov 21, 2007, at 10:11 AM, Christian Zemella wrote:

Hi All,

Has anybody out there managed to have EVMS container resources properly failing over in a 2-node Heartbeat 2 cluster running on SLES 10 SP1?

I believe so... have you read the documentation below?

http://wiki.novell.com/images/3/37/Exploring_HASF.pdf

In my lab I can only start and stop the resource on the node that has the container assigned within evms, while if I shut down that node, the failover does not occur, as the evms_failover resource times out; as soon as the other node comes up again it takes the resource back properly.

This would indicate that the evms_failover RA cannot assign the container to the new node. Do you see the resource failing? Have you checked the failcount for the resources on that node?

Some clues (from the evms perspective): take a look in /dev/evms/.nodes. When the private container is present on the node, a device file named after the container should appear there.

To test manually, the easiest way is to start HB, then put both nodes on standby, then manipulate the evms devices manually.

To deport the container (on resource stop), evms_failover issues these commands to the evms command line tool:

    modify:$1,type=deported
    save
    exit

where $1 is the value of the "1" parameter you've passed to evms_failover. You can try this yourself manually, to verify where the issue is (i.e. with evms or elsewhere).

To import the container (when starting the resource), evms_failover does:

    modify:$1,node=$HOSTNAME,type=private
    save
    exit

In my environment I created the following:

I'm working with 2 VMware boxes sharing one 4GB plain disk that works as SAN.

EVMS: I created a private container (gwcont) on the shared disk using the CSM plug-in, and in it an EVMS volume (gwvol); on the volume I made a reiserfs file system. I verified that the HA plug-in was working and that the node assigned to the container can manually mount it.

HB_GUI: I created a group, ordered and collocated. Inside the group I created the following resources:
- evmsSCC -- No attributes, no parameters;
- evms_failover -- Parameter: 1 Value: gwcont (name of the EVMS container)
- Filesystem -- Parameter: fstype Value: reiserfs; Parameter: device Value:
Re: [Linux-HA] evms-failover resource agent does not handle case sensitive hostnames correctly
On Friday, 30.11.2007, 12:19 +, Yan Fitterer wrote:

First question: Are you interested in my patch (just 2-3 lines)?

Most likely ;) Although I'm not completely sure how case is handled elsewhere. We might be case-sensitive on purpose! (although I can't see a good reason to do this for host names).

Heartbeat handles case-sensitive hostnames by lowercasing them. Thus the CCM only lists lower-cased hostnames (see crm_mon, the GUI and others). But the nodes in ha.cf should be written the way uname -n reports them (which is in the original letters, thus with case).

The problem with the evms-failover resource agent is that the hostname is given to evms in its original (case-sensitive) letters. evms checks the string case-sensitively (here is the original error, I guess) against the CCM entries and claims the parameter is illegal (evms thinks the hostname is not a member of the cluster, while heartbeat says it is a member of the cluster; they just compare the strings differently).

My patch just ignores upper/lower case by lowercasing the local hostname. This seems to be compatible with the way heartbeat does it. And it is (for now) much easier to handle (for me) than changing the evms behaviour.

Hope the patch is helpful
Fabian

Maybe send to the -dev list?

Sorry, I am not subscribed to the dev list. So first I send the (very small) patch here. See attachment (only a three-line patch).

Second: Any idea why I am not able to migrate the private container? Are there any typical pitfalls?

"The resource cannot run anywhere" has nothing to do with the resource agent. It's the PE (Policy Engine) deciding that it is so. Likely you have either resource node failcounts that are too high, or failed starts. Resources that have failed to start on a node are not eligible to be started again on that node (at least on the SLES 10 2.0.8 version). I've heard this may change one day.

To see failed starts, try crm_verify -VV
To see failcounts, I usually grep the output of cibadmin -Q (much faster than issuing multiple crm_failcount commands...).

Yan

-- 
SUSE LINUX GmbH, Maxfeldstr. 5, D - 90409 Nürnberg
Phone: +49 (0)69 - 2174-1923
FaxFFM: +49 (0)69 - 2174-1740
FaxDUS: +49 (0)211 - 5631-3769
e-mail: [EMAIL PROTECTED]
SUSE LINUX GmbH, GF: Volker Smid, HRB 21284 (AG Nürnberg)
PLEASE NOTE: This e-mail may contain confidential and privileged material for the sole use of the intended recipient. Any review, distribution or other use by anyone else is strictly prohibited. If you are not an intended recipient, please contact the sender and delete all copies. Thank you.

    60d58
    < HN=$(echo $HOSTNAME|tr [:upper:] [:lower:])
    62c60
    < modify:$1,node=$HN,type=private
    ---
    > modify:$1,node=$HOSTNAME,type=private
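The "grep the output of cibadmin -Q" tip for failcounts can be sketched as a small filter. This is an illustration under assumptions: it presumes that failcounts appear in the CIB status section as nvpair elements whose name attribute has the form fail-count-RESOURCE, directly followed by the value attribute, one element per line. The function name list_failcounts is made up for this example.

```shell
#!/bin/sh
# Hypothetical filter (illustration only): read CIB XML on stdin and
# print "RESOURCE COUNT" for every fail-count attribute found.
# Assumes one nvpair per line with name="fail-count-..." directly
# followed by value="...".
list_failcounts() {
    sed -n 's/.*name="fail-count-\([^"]*\)" value="\([^"]*\)".*/\1 \2/p'
}

# Possible use on a cluster node:
#   cibadmin -Q | list_failcounts
```

If the attribute layout in your CIB differs, the pattern needs adjusting; crm_failcount remains the authoritative way to query a single resource.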
[Linux-HA] evms-failover resource agent does not handle case sensitive hostnames correctly
Hi,

I have searched for some time for the reason why my evms-failover resource does not work on any of my heartbeat 2.0.8 nodes (SLES10SP1, x86_64). I want to use a private evms container to prevent the (non-cluster) file system from being mounted twice by administrative error. But the resource was not started on any node.

So I checked what the resource agent has to do to start a private container resource and tried to do that by hand using the CLI. The CLI told me every time that a parameter was wrong (but gave no useful information about which parameter). Then I tried to acquire the private container using the lower-cased hostname (not the hostname seen when running uname -n) and it worked.

I wrote a small patch for that resource agent, and then the resource could be started on one cluster side but could not be migrated. The cluster says the resource xxx could not run anywhere.

First question: Are you interested in my patch (just 2-3 lines)?

Second: Any idea why I am not able to migrate the private container? Are there any typical pitfalls?

Best regards
Fabian
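The lower-casing workaround described in this report can be sketched as follows. It mirrors the tr call from the small patch posted in this thread; the function names lowercase_node_name and evms_import_cmd are made up for this illustration, and the command string follows the modify:CONTAINER,node=NODE,type=private form quoted elsewhere in the thread.

```shell
#!/bin/sh
# Hypothetical helpers (illustration only).

# Print a node name in lower case, the way heartbeat's CCM lists it.
lowercase_node_name() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Print the evms CLI command that imports a private container
# for the given node, using the lower-cased node name.
# $1 = container name, $2 = node name as reported by uname -n
evms_import_cmd() {
    printf 'modify:%s,node=%s,type=private\n' "$1" "$(lowercase_node_name "$2")"
}
```

For a container gwcont on a node named CZVLabNode2, evms_import_cmd gwcont CZVLabNode2 prints the command with node=czvlabnode2, which is the spelling that was accepted in the manual test.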
Re: [Linux-HA] Resource fencing
Junko IKEDA wrote:

Is there any disk reservation strategy implemented in heartbeat and its agents (did not found any).

i think someone from NTT posted a resource agent that did this

Hi Fabian,

NTT's RA is not a SCSI reservation to be exact, but try the attached if you don't mind. We've upgraded it just a bit.

Thanks a lot for providing this agent!

Please let me know if there are any troubles when you set it up.

Best Regards,
Junko Ikeda
NTT DATA INTELLILINK CORPORATION
Re: [Linux-HA] pingd, quorum, split-brain... should I give up?
Riccardo Perni wrote:

Andrew Beekhof [EMAIL PROTECTED] wrote:

On 10/23/07, Riccardo Perni [EMAIL PROTECTED] wrote:

Andrew Beekhof [EMAIL PROTECTED] wrote:

On 10/22/07, Riccardo Perni [EMAIL PROTECTED] wrote:

Is it possible to handle this situation?

You may try quorumd. See http://www.linux-ha.org/QuorumServerGuide

I'm going to look at it, but isn't it another SPOF?

by definition, no. because you've already had at least one failure before quorumd becomes relevant

Do you mean that the cluster will continue to work even if I have a failure on the quorum server?

my understanding is that the quorum server is not used unless you already dont have quorum... at which point you've lost half your nodes anyway

Uhm, but at this point I already have a split-brain condition... or not?

No. Split brain means you have (at least) two cluster sides which both believe they are THE cluster. The quorum server helps here: only one side of the cluster gets the quorum.

--
Riccardo Perni
Unità Operativa Informatica Aziendale
ASL Roma-B
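The role of a tie-breaker in an even split can be shown with a little arithmetic. The sketch below is a simplified model and an assumption on my side, not a description of how quorumd decides internally: a partition has quorum if it holds more than half of the votes, or exactly half plus the grant of the quorum server. The function name partition_has_quorum is hypothetical.

```shell
#!/bin/sh
# Simplified model (illustration only, not quorumd's real algorithm).
# $1 = votes visible in this partition
# $2 = total configured votes
# $3 = "yes" if the quorum server granted quorum to this partition
partition_has_quorum() {
    votes=$1
    total=$2
    granted=${3:-no}
    # A strict majority always has quorum.
    if [ $((votes * 2)) -gt "$total" ]; then
        return 0
    fi
    # An exact half needs the quorum server as tie-breaker.
    if [ $((votes * 2)) -eq "$total" ] && [ "$granted" = "yes" ]; then
        return 0
    fi
    return 1
}
```

In a 4-node cluster split 2/2, neither side has a majority on its own; in this model only the side that receives the grant continues, which matches the statement that only one side of the cluster gets the quorum.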
[Linux-HA] Resource fencing
In the wiki I found the keyword resource fencing and also disk reservation. In clusters like the symantec (veritas) hasf they have implemented disk reservations. Disk reservations can be implemented by a special SCSI-3 command sequence.

Is there any disk reservation strategy implemented in heartbeat and its agents (I did not find any)?

Regards
Fabian