Re: [Linux-HA] SAPInstance does not start and asking for START_PROFILE

2015-01-15 Thread Fabian Herschel

Hi Muhammad,

please ask the SAP Guys what they have changed. Did they install the 
sapstartsrv or saphostagent?


Threads should not end with "it's working now, but I can't explain
what we changed" :) That stops others from learning from this situation.


Regards
Fabian


On 01/15/2015 04:12 PM, Muhammad Sharfuddin wrote:

Thanks for your excellent help, appreciated.

I don't know exactly what happened; it seems the SAP guys have fixed the
issue, as the cluster now starts the SAPInstance without any problem.
The sapcontrol output is below:

thltlp2:tlpadm 48 /usr/sap/TLP/DVEBMGS00/exe/sapcontrol -nr 00 -function GetProcessList

15.01.2015 19:40:04
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
disp+work, Dispatcher, GREEN, Running, 2015 01 15 19:27:24, 0:12:40, 10920
igswd_mt, IGS Watchdog, GREEN, Running, 2015 01 15 19:27:24, 0:12:40, 10921
gwrd, Gateway, GREEN, Running, 2015 01 15 19:27:25, 0:12:39, 10938
icman, ICM, GREEN, Running, 2015 01 15 19:27:25, 0:12:39, 10939
thltlp2:tlpadm 49


Thanks once again.

Regards,

Muhammad Sharfuddin
Cell: +92-3332144823 | UAN: +92(21) 111-111-142 ext: 113 | NDS.COM.PK
http://www.nds.com.pk

On 01/15/2015 03:57 PM, Fabian Herschel wrote:

Hi Muhammad,

please retry the command as the sidadm user. Or inspect the resource
agent for ALL the environment variables that need to be set, not only
LD_LIBRARY_PATH.

If sapcontrol is still dysfunctional when run as sidadm, you have a SAP
problem, and that cannot be discussed here.

Regards
Fabian
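
A minimal sketch of the advice above: run sapcontrol as the sidadm user, using the path the resource agent uses. The <sid>adm naming matches the tlpadm prompt shown in this thread. The function names sidadm_user and sapcontrol_path are hypothetical, and the su call in the comment is only an assumption about how one might switch users.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the SAPInstance resource agent.

# Derive the <sid>adm OS user from the SAP system ID, e.g. TLP -> tlpadm.
sidadm_user() {
    printf '%sadm\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Build the sapcontrol path in the form quoted from the resource agent:
# /usr/sap/$SID/$InstanceName/exe/sapcontrol
# $1 = SID, $2 = instance directory name such as DVEBMGS00
sapcontrol_path() {
    printf '/usr/sap/%s/%s/exe/sapcontrol\n' "$1" "$2"
}

# Example (not executed here): a login shell for the sidadm user loads the
# complete SAP environment, not only LD_LIBRARY_PATH.
#   su - "$(sidadm_user TLP)" -c "$(sapcontrol_path TLP DVEBMGS00) -nr 00 -function GetProcessList"
```

Running the command through a login shell (su -) rather than as root is the point here: the ICU library error in the quoted message below was produced from a root shell.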

On 01/15/2015 11:49 AM, Muhammad Sharfuddin wrote:

thltlp1:~ # echo $LD_LIBRARY_PATH
/usr/sap/TLP/ASCS01/exe/:/usr/sap/TLP/DVEBMGS00/exe:/usr/lib64
thltlp1:~ # /usr/sap/TLP/DVEBMGS00/exe/sapcontrol -nr 00 -function Start
Could not open the ICU common library.
The following files must be in the path described by
the environment variable LD_LIBRARY_PATH:
libicuuc.so.50, libicudata.so.50, libicui18n.so.50
[/bas/741_REL/src/flat/nlsui0.c 1535] pid = 27543
LD_LIBRARY_PATH is currently set to not set
[/bas/741_REL/src/flat/nlsui0.c 1538] pid = 27543
thltlp1:~ #

please help


Regards,

Muhammad Sharfuddin
Cell: +92-3332144823 | UAN: +92(21) 111-111-142 ext: 113 | NDS.COM.PK
http://www.nds.com.pk

On 01/15/2015 02:15 PM, Fabian Herschel wrote:

On 01/14/2015 10:53 PM, Muhammad Sharfuddin wrote:

On 01/15/2015 02:35 AM, Fabian Herschel wrote:
  Hi Muhammed,
 
  sorry please do NOT use startsap. Please use sapctrl.
  sapctrl -nr 00 -function Start
  Check the started processes using
  sapctrl -nr 00 -function GetProcessList
 
I can't find the sapctrl command on the system.


Sorry the command is sapcontrol (I abbreviated the control to ctrl)
From the SAPInstance resource agent:
SAPCONTROL=/usr/sap/$SID/$InstanceName/exe/sapcontrol



 
  If the disp+work processes are not starting, then you might need to check
the reason in the work directory of the SAP NetWeaver instance.
 
Thanks for the pointer, I'll get this checked with the SAP guys.

  Regards
  Fabian
 
 


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems












Re: [Linux-HA] Antw: SAPInstance does not start and asking for START_PROFILE

2015-01-15 Thread Fabian Herschel
No, only very old versions had two different profiles: one START profile and
one instance profile. Today they are combined in the instance profile. The
parameter name, however, could not be changed without hurting all existing
installations.
So with a current NetWeaver it is OK if the START_PROFILE parameter points to
an instance profile name.

However, if this is a very old SAP NetWeaver, then your idea is the right
pointer. This is also why I asked for the SAP kernel version.
Regards
Fabian




Original message
From: Ulrich Windl ulrich.wi...@rz.uni-regensburg.de
Date: 15.01.2015 08:51 (GMT+01:00)
To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
Subject: [Linux-HA] Antw: SAPInstance does not start and asking for START_PROFILE

Don't SAP start profiles start with START_*?

 Muhammad Sharfuddin m.sharfud...@nds.com.pk wrote on 14.01.2015 at 20:12 in
message 54b6bfb2.50...@nds.com.pk:
 OS: SLES 11 SP 3
 pacemaker-1.1.9-0.19.102
 corosync-1.4.5-0.18.15
 resource-agents-3.9.5-0.32.22
 
 Starting the SAP instance resource fails with the following errors:
 
 Jan 14 18:22:16 thltlp1 SAPInstance(SAPInst-DVEBMGS00)[50450]: ERROR: Expected TLP_DVEBMGS00_thltlp to be the instance START profile, please set START_PROFILE parameter!
 Jan 14 18:22:16 thltlp1 crmd[47231]:   notice: process_lrm_event: LRM operation SAPInst-DVEBMGS00_start_0 (call=81, rc=6, cib-update=66, confirmed=true) not configured
 
 The resource configuration is as follows:
 primitive SAPInst-DVEBMGS00 ocf:heartbeat:SAPInstance \
     op monitor interval=120 timeout=60 \
     op start interval=0 timeout=300 \
     op stop interval=0 timeout=300 \
     params InstanceName=TLP_DVEBMGS00_thltlp \
         DIR_EXECUTABLE=/usr/sap/TLP/DVEBMGS00/exe \
         START_PROFILE=TLP_DVEBMGS00_thltlp DIR_PROFILE=/sapmnt/TLP/profile
 
 i.e. START_PROFILE is configured, but the cluster is not starting the SAP
 instance.
 
 Please help
 
 -- 
 Regards,
 
 Muhammad Sharfuddin
 
 

Re: [Linux-HA] SAPInstance does not start and asking for START_PROFILE

2015-01-15 Thread Fabian Herschel

On 01/14/2015 10:53 PM, Muhammad Sharfuddin wrote:

On 01/15/2015 02:35 AM, Fabian Herschel wrote:
  Hi Muhammed,
 
  sorry please do NOT use startsap. Please use sapctrl.
  sapctrl -nr 00 -function Start
  Check the started processes using
  sapctrl -nr 00 -function GetProcessList
 
I can't find the sapctrl command on the system.


Sorry the command is sapcontrol (I abbreviated the control to ctrl)
From the SAPInstance resource agent:
SAPCONTROL=/usr/sap/$SID/$InstanceName/exe/sapcontrol



 
  If the disp+work processes are not starting, then you might need to check
the reason in the work directory of the SAP NetWeaver instance.
 
Thanks for the pointer, I'll get this checked with the SAP guys.

  Regards
  Fabian
 
 




Re: [Linux-HA] Antw: Re: Antw: SAPInstance does not start and asking for START_PROFILE

2015-01-15 Thread Fabian Herschel

On 01/15/2015 11:35 AM, Ulrich Windl wrote:

Fabian Herschel fabian.hersc...@arcor.de wrote on 15.01.2015 at 09:05:



Hi!

I've been working with SAP R/3 for over 20 years.


Me too :) I started with SAP R/3 1.1b and have also seen 1.0, which was
SAP R/3 "try and buy" (which I turned into a non-marketing "try-and-bye" :)


The tendency is that the mess steadily increases, making things more complex
without actually improving the reliability (MHO). Where previously you used a
script to start processes, you now have some script that acts like a web
client to send some request to a Java-based web server that in turn is
expected to start the required processes.


My guess is that this is W*d*ws-style: having GUIs that use something like
REST APIs to make a system change instead of just starting a well-working
start script (as sapstart was in the past).



I've seen cases where a "started successfully" did not mean anything, i.e.
nothing was started. OK, that was off-topic, but I had to say it.


Yes, that's unfortunately true, and that's also why we recommend monitoring
the SAP instances, so that real start errors are detected during the next
monitor cycle.



So regarding compatibility, you must make sure that the configuration files
and related components match the rest of your SAP infrastructure.

My favorite is this: If you run a java stack that does nothing (is idle), it
takes  6 minutes to shut down the SAP instance (while there is no I/O and no
CPU activity). Only the best programmers (TM) can write such code. I had to
increase timeouts several times to prevent machine fencing while waiting for
the stop command to complete... off-topic again, sorry.

Regards,
Ulrich







Re: [Linux-HA] SAPInstance does not start and asking for START_PROFILE

2015-01-14 Thread Fabian Herschel

Hi Muhammad,

could you try:
primitive SAPInst-DVEBMGS00 ocf:heartbeat:SAPInstance \
    op monitor interval=120 timeout=60 \
    op start interval=0 timeout=300 \
    op stop interval=0 timeout=300 \
    params InstanceName=TLP_DVEBMGS00_thltlp \
        DIR_EXECUTABLE=/usr/sap/TLP/DVEBMGS00/exe \
        START_PROFILE=/sapmnt/TLP/profile/TLP_DVEBMGS00_thltlp

Or even leave the parameters DIR_EXECUTABLE and START_PROFILE unset, so that
SAPInstance can use its automatic detection.


If you set the START_PROFILE parameter, it must be a full file path, NOT
a path relative to DIR_PROFILE.


Hope that helps
Best regards
Fabian
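
To illustrate the full-path rule above, here is a small shell sketch that turns a bare profile name into an absolute path before it is put into START_PROFILE. The helper name start_profile_path is hypothetical and not part of the resource agent; the directory and profile name are the ones from this thread.

```shell
#!/bin/sh
# Hypothetical helper: return an absolute START_PROFILE value.
# $1 = profile directory (e.g. the value of DIR_PROFILE)
# $2 = profile name, or an already absolute path
start_profile_path() {
    case $2 in
        /*) printf '%s\n' "$2" ;;               # already a full path, keep it
        *)  printf '%s/%s\n' "${1%/}" "$2" ;;   # prefix the profile directory
    esac
}

# Example (not executed here):
#   start_profile_path /sapmnt/TLP/profile TLP_DVEBMGS00_thltlp
```

The result can then be used as the START_PROFILE value in the primitive definition, instead of relying on DIR_PROFILE to resolve a relative name.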

On 01/14/2015 08:12 PM, Muhammad Sharfuddin wrote:

OS: SLES 11 SP 3
pacemaker-1.1.9-0.19.102
corosync-1.4.5-0.18.15
resource-agents-3.9.5-0.32.22

Starting the SAP instance resource fails with the following errors:

Jan 14 18:22:16 thltlp1 SAPInstance(SAPInst-DVEBMGS00)[50450]: ERROR: Expected TLP_DVEBMGS00_thltlp to be the instance START profile, please set START_PROFILE parameter!
Jan 14 18:22:16 thltlp1 crmd[47231]:   notice: process_lrm_event: LRM operation SAPInst-DVEBMGS00_start_0 (call=81, rc=6, cib-update=66, confirmed=true) not configured

The resource configuration is as follows:
primitive SAPInst-DVEBMGS00 ocf:heartbeat:SAPInstance \
    op monitor interval=120 timeout=60 \
    op start interval=0 timeout=300 \
    op stop interval=0 timeout=300 \
    params InstanceName=TLP_DVEBMGS00_thltlp \
        DIR_EXECUTABLE=/usr/sap/TLP/DVEBMGS00/exe \
        START_PROFILE=TLP_DVEBMGS00_thltlp DIR_PROFILE=/sapmnt/TLP/profile

i.e. START_PROFILE is configured, but the cluster is not starting the SAP
instance.

Please help





Re: [Linux-HA] SAPInstance does not start and asking for START_PROFILE

2015-01-14 Thread Fabian Herschel
Hi Muhammed,

sorry please do NOT use startsap. Please use sapctrl.
sapctrl -nr 00 -function Start
Check the started processes using
sapctrl -nr 00 -function GetProcessList

If the disp+work processes are not starting, then you might need to check the reason
in the work directory of the SAP NetWeaver instance.

Regards
Fabian



Original message
From: Muhammad Sharfuddin m.sharfud...@nds.com.pk
Date: 14.01.2015 21:13 (GMT+01:00)
To: linux-ha@lists.linux-ha.org
Subject: Re: [Linux-HA] SAPInstance does not start and asking for START_PROFILE

On 01/15/2015 01:07 AM, Muhammad Sharfuddin wrote:

 On 01/15/2015 12:46 AM, Fabian Herschel wrote:
  Hi Muhammad,
 
  could you try:
  primitive SAPInst-DVEBMGS00 ocf:heartbeat:SAPInstance \
  op monitor interval=120 timeout=60 \
  op start interval=0 timeout=300 \
  op stop interval=0 timeout=300 \
  params InstanceName=TLP_DVEBMGS00_thltlp
  DIR_EXECUTABLE=/usr/sap/TLP/DVEBMGS00/exe
  START_PROFILE=/sapmnt/TLP/profile/TLP_DVEBMGS00_thltlp
 
  Or even let the parameters DIR_EXECUTABLE and START_PROFILE unset, 
so SAPInstance could use the automatical detection.
 
  If you set the param START_PROFILE it must be a full file path NOT 
relative to DIR_PROFILE.
 
  Hope that helps
  Best regards
  Fabian
 
  On 01/14/2015 08:12 PM, Muhammad Sharfuddin wrote:
  OS: SLES 11 SP 3
  pacemaker-1.1.9-0.19.102
  corosync-1.4.5-0.18.15
  resource-agents-3.9.5-0.32.22
 
  starting the SAP Instance resource fails with following errors:
 
  Jan 14 18:22:16 thltlp1 SAPInstance(SAPInst-DVEBMGS00)[50450]: ERROR:
  Expected
  TLP_DVEBMGS00_thltlp to be the instance START profile, please set
  START_PROFILE
  parameter!
  Jan 14 18:22:16 thltlp1 crmd[47231]:   notice: process_lrm_event: LRM
  operation
  SAPInst-DVEBMGS00_start_0 (call=81, rc=6, cib-update=66, 
confirmed=true)
  not configured
 
  following is the resource configurations:
  primitive SAPInst-DVEBMGS00 ocf:heartbeat:SAPInstance \
  op monitor interval=120 timeout=60 \
  op start interval=0 timeout=300 \
  op stop interval=0 timeout=300 \
  params InstanceName=TLP_DVEBMGS00_thltlp
  DIR_EXECUTABLE=/usr/sap/TLP/DVEBMGS00/exe
  START_PROFILE=TLP_DVEBMGS00_thltlp DIR_PROFILE=/sapmnt/TLP/profile
 
  i.e START_PROFILE is configured but cluster is not starting the SAP
  Instance.
 
  Please help
 

 I provided the full path, and now the error has changed. It became:

 Jan 15 00:54:04 thltlp2 cibadmin[24511]:   notice: crm_log_args: Invoked: cibadmin -p -R -o resources
 Jan 15 00:54:04 thltlp2 SAPInstance(SAPInst-DVEBMGS00)[22589]: ERROR: SAP Instance TLP-DVEBMGS00 start failed:  15.01.2015 00:54:04 WaitforStarted FAIL: process disp+work Dispatcher not running
 Jan 15 00:54:04 thltlp2 crmd[2778]:  warning: do_update_resource: Resource SAPInst-DVEBMGS00 no longer exists in the lrmd
 Jan 15 00:54:04 thltlp2 crmd[2778]:   notice: process_lrm_event: LRM operation SAPInst-DVEBMGS00_start_0 (call=176, rc=7, cib-update=0, confirmed=true) not running
 Jan 15 00:54:04 thltlp2 crmd[2778]:  warning: decode_transition_key: Bad UUID (crm_resource.c) in sscanf result (4) for 24450:0:0:crm_resource.c
 Jan 15 00:54:04 thltlp2 crmd[2778]:    error: send_msg_via_ipc: Unknown Sub-system (9f52a2cf-6c1d-453d-bc1b-90322f3147f4)... discarding message


 Regards,

 Muhammad Sharfuddin

Also note that I can easily start SAP without any issue by
running the following command:
 startsap -i DVEBMGS00 -v thltlp

-- 
Regards,

Muhammad Sharfuddin



Re: [Linux-HA] SAPInstance does not start and asking for START_PROFILE

2015-01-14 Thread Fabian Herschel
Please ask the SAP guys which version of SAP NetWeaver and which SAP kernel you
are using.
Regards
Fabian




Original message
From: Muhammad Sharfuddin m.sharfud...@nds.com.pk
Date: 14.01.2015 22:53 (GMT+01:00)
To: linux-ha@lists.linux-ha.org
Subject: Re: [Linux-HA] SAPInstance does not start and asking for START_PROFILE

On 01/15/2015 02:35 AM, Fabian Herschel wrote:
 Hi Muhammed,

 sorry please do NOT use startsap. Please use sapctrl.
 sapctrl -nr 00 -function Start
 Check the started processes using
 sapctrl -nr 00 -function GetProcessList

I can't find the sapctrl command on the system.


 If the disp+work processes are not starting, then you might need to check
the reason in the work directory of the SAP NetWeaver instance.

Thanks for the pointer, I'll get this checked with the SAP guys.

 Regards
 Fabian


-- 
Regards,
Muhammad Sharfuddin




Re: [Linux-HA] Virtual address for slave

2014-08-02 Thread Fabian Herschel

Create 2 constraints:

1. A colocation between the IP address (the one for the master) and the master
role of your master/slave resource: you need to add the role "master" (instead
of "started", which is the default) to the constraint.

2. A colocation between the IP address (the one for the slave) and the slave
role of your master/slave resource: likewise, add the role "slave" to the
constraint definition.

You might also need to adjust the score of your constraints, depending on
the exact needs.

Regards
Fabian
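
The two constraints described above could look roughly like this in crm shell syntax. This is a sketch under assumptions: the resource names vip-master and vip-slave come from the question below, but the master/slave resource name msPostgresql, the constraint IDs and the inf score are placeholders. The helper only prints the commands so they can be reviewed before anything is applied to a cluster.

```shell
#!/bin/sh
# Print (do not run) the two crm colocation commands.
# $1 = name of the master/slave resource (hypothetical default: msPostgresql)
print_vip_colocations() {
    ms=${1:-msPostgresql}
    # vip-master must run where the resource is in the Master role.
    echo "crm configure colocation vip-master-with-master inf: vip-master ${ms}:Master"
    # vip-slave must run where the resource is in the Slave role.
    echo "crm configure colocation vip-slave-with-slave inf: vip-slave ${ms}:Slave"
}

# Example (not executed here):
#   print_vip_colocations msPostgresql
```

With an inf score, vip-slave cannot run at all when no node is in the Slave role; a lower score is one way to relax that, which is the kind of adjustment mentioned above.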




Original message
From: jarek ja...@poczta.srv.pl
Date: 01.08.2014 09:39 (GMT+01:00)
To: linux-ha@lists.linux-ha.org
Subject: [Linux-HA] Virtual address for slave

Hello!

I'd like to have two virtual addresses: vip-master and vip-slave.
vip-master should be bound to the master node, vip-slave should be bound to
the slave node.
How can I do it?

Best regards
Jarek


Re: [Linux-HA] How to restart cluster ?

2014-06-09 Thread Fabian Herschel
Hi,

do you always reboot both nodes at the same time, or do you reboot only one node?
Stopping only the resources during a reboot is pretty bad. I would add the
cluster start script (like /etc/init.d/openais) to your start/stop sequence. This
would also properly tell the remaining node about the leaving and joining node.

Regards
Fabian




Original message
From: jarek ja...@poczta.srv.pl
Date: 09.06.2014 11:58 (GMT+01:00)
To: linux-ha@lists.linux-ha.org
Subject: [Linux-HA] How to restart cluster ?

Hello!

Thank you for the answer, but this answer didn't solve my problem.
I have simple two-node cluster with virtual ip address and Postgres with
streaming replication, created with this tutorial:
http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
I have two problems to solve:
1. I need a script that will restart the cluster on user demand. This
script should stop the postgres resource on both nodes and then restart them
in such a way that postgres will work without any additional
operations (like removing lock files, cleaning up resources etc).
2. I have a virtual model of this cluster running under VMware. VMware
is restarted from time to time, and I have no control over when the master or
slave will be restarted. I would like to create a script that will be
called from runlevel 6 and will safely stop the postgres resource.
I tried to do it with:

crm configure property stop-all-resources=true

but after the reboot I had to remove PGSQL.lock manually, and the master
node had also changed.

Do you have any idea how to do it ?

Taktoshi MATSUO wrote:
Do you use pgsql RA with Master/Slave setting ?
I recommend you to stop slave node's pacemaker at first
because pgsql RA removes PGSQL.lock automatically if the node is
master and there is no slaves.

Stop procedure
  1. stop slave node  - suppose nodeB
  2. stop master node (PGSQL.lock file is removed)  - suppose nodeA

Start procedure
  3. start the nodeA because it has the newest data.
  4. start the nodeB

If PGSQL.lock exists, the data may be inconsistent.
See http://www.slideshare.net/takmatsuo/2012929-pg-study-16012253
(P36, P37)

best regards
Jarek
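
The stop and start order quoted above (slave first on stop, master first on start) can be sketched as a small shell script. Assumptions: the nodes are reachable over ssh, the cluster stack is controlled by /etc/init.d/openais as suggested in the reply at the top of this message, and the helper names run, stop_cluster and start_cluster are hypothetical. By default the script only prints what it would do.

```shell
#!/bin/sh
# run CMD...: print the command when DRY_RUN is 1 (the default here),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = master node (nodeA), $2 = slave node (nodeB)
stop_cluster() {
    run ssh "$2" /etc/init.d/openais stop    # 1. stop the slave node first
    run ssh "$1" /etc/init.d/openais stop    # 2. then stop the master node
}

start_cluster() {
    run ssh "$1" /etc/init.d/openais start   # 3. start the old master, it has the newest data
    run ssh "$2" /etc/init.d/openais start   # 4. then start the slave
}

# Example (not executed here):
#   stop_cluster nodeA nodeB && start_cluster nodeA nodeB
```

Setting DRY_RUN=0 would execute the commands instead of printing them; which node is currently the master has to be determined beforehand and is not handled by this sketch.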


Re: [Linux-HA] How I can create unordered group of resources

2014-05-06 Thread Fabian Herschel

On 05/06/2014 01:08 AM, Andrew Beekhof wrote:


On 5 May 2014, at 10:06 pm, Fabian Herschel fabian.hersc...@arcor.de wrote:


On 05/05/2014 02:36 AM, Andrew Beekhof wrote:


On 4 May 2014, at 4:22 pm, Fabian Herschel fabian.hersc...@arcor.de wrote:


I would create the group with the meta attribute for unordered resources:
meta ordered=false


No. Use a colocation set.


Could you explain your "No"? What's wrong with using the unordered
feature? Why was this meta attribute added to groups if we shouldn't use it?


It's an abomination that I should never have implemented but now cannot remove.


OK, thanks :) Until this thread I had never suggested configuring groups as
unordered. I do not know why I broke my rule.

I also felt uncomfortable with the meta attribute...












Original message
From: Vladimir Romanov vroma...@gmail.com
Date: 03.05.2014 10:29 (GMT+01:00)
To: linux-ha@lists.linux-ha.org
Subject: [Linux-HA] How I can create unordered group of resources

Hello!

I am trying to create a Master/Slave cluster using Pacemaker on CentOS 6.5 (CRM+PCS).
I created a master/slave stateful resource. My setup also has many other
resources (IPs, routes, LSB...). If one of the resources fails on the first
node, I want to move all resources to another node. Currently I use a group to
create this setup. But when I kill -9 some process, all the processes listed
below it are also restarted. What is the best practice for this task?

--
Vladimir Romanov


Re: [Linux-HA] How I can create unordered group of resources

2014-05-05 Thread Fabian Herschel

On 05/05/2014 02:36 AM, Andrew Beekhof wrote:


On 4 May 2014, at 4:22 pm, Fabian Herschel fabian.hersc...@arcor.de wrote:


I would create the group with the meta attribute for unordered resources:
meta ordered=false


No. Use a colocation set.


Could you explain your "No"? What's wrong with using the unordered
feature? Why was this meta attribute added to groups if we shouldn't
use it?









Original message
From: Vladimir Romanov vroma...@gmail.com
Date: 03.05.2014 10:29 (GMT+01:00)
To: linux-ha@lists.linux-ha.org
Subject: [Linux-HA] How I can create unordered group of resources

Hello!

I am trying to create a Master/Slave cluster using Pacemaker on CentOS 6.5 (CRM+PCS).
I created a master/slave stateful resource. My setup also has many other
resources (IPs, routes, LSB...). If one of the resources fails on the first
node, I want to move all resources to another node. Currently I use a group to
create this setup. But when I kill -9 some process, all the processes listed
below it are also restarted. What is the best practice for this task?

--
Vladimir Romanov


Re: [Linux-HA] How I can create unordered group of resources

2014-05-04 Thread Fabian Herschel
I would create the group with the meta attribute for unordered resources:
meta ordered=false




Original message
From: Vladimir Romanov vroma...@gmail.com
Date: 03.05.2014 10:29 (GMT+01:00)
To: linux-ha@lists.linux-ha.org
Subject: [Linux-HA] How I can create unordered group of resources

Hello!

I am trying to create a Master/Slave cluster using Pacemaker on CentOS 6.5 (CRM+PCS).
I created a master/slave stateful resource. My setup also has many other
resources (IPs, routes, LSB...). If one of the resources fails on the first
node, I want to move all resources to another node. Currently I use a group to
create this setup. But when I kill -9 some process, all the processes listed
below it are also restarted. What is the best practice for this task?

-- 
Vladimir Romanov

Re: [Linux-HA] 2 Nodes split brain, distant sites

2014-02-27 Thread Fabian Herschel

Hi,

my first idea would be to fix bindnetaddr. It should be the network
address, not the machine's own IP address.


Regards
Fabian
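
As a sketch of what "network address" means here, the shell function below masks an IPv4 address with its netmask. The function name network_address is hypothetical, and the /24 netmask in the example is an assumption, because the netmask of the two sites is not given in this thread.

```shell
#!/bin/sh
# Compute the IPv4 network address for an address and a dotted netmask.
# $1 = IP address, $2 = netmask
network_address() {
    ip=$1
    mask=$2
    old_ifs=$IFS
    IFS=.
    # Split both values on dots: $1..$4 = address octets, $5..$8 = mask octets.
    set -- $ip $mask
    IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# Example (not executed here), assuming a 255.255.255.0 netmask:
#   network_address 172.16.135.9 255.255.255.0
```

The value printed for a node's own address and netmask is what would go into bindnetaddr in place of 172.16.135.9 in the configuration quoted below.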

On 02/27/2014 03:42 PM, TRIBOLET Thomas wrote:

Hello,

Before starting, my first language is French so I'll try to do my best to 
explain my problem in English.


1)  The situation :

I have 2 servers at 2 distant sites.

I need to run openvpn with the same configuration on both servers,
but it must run on only one server at a time.

I want it to start on the second server when the internet connection is
lost on the first node.

I use Debian with corosync and pacemaker.

Here is the config :


A) Corosync.conf :
compatibility: whitetank
totem {
    version: 2
    token: 3000
    token_retransmits_before_loss_const: 10
    join: 240
    consensus: 3600
    vsftype: none
    max_messages: 20
    clear_node_high_bit: yes
    secauth: off
    threads: 0
    nodeid:
    rrp_mode: none
    interface {
        member {
            memberaddr: 172.16.135.9
        }
        member {
            memberaddr: 172.16.64.248
        }
        ringnumber: 0
        bindnetaddr: 172.16.135.9
        mcastport: 5405
    }
    transport: udpu
}
amf {
    mode: disabled
}
service {
    ver:   0
    name:  pacemaker
}
aisexec {
    user:   root
    group:  root
}
logging {
    fileline: off
    to_stderr: yes
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    syslog_facility: daemon
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
        tags: enter|leave|trace1|trace2|trace3|trace4|trace6
    }
}

B)  Pacemaker :
node controle-col
node vpn-air
primitive ClusterMon ocf:pacemaker:ClusterMon \
 params user=root update=30 extra_options="-E /root/PacemakerMailScript.sh -h /tmp/ClusterMon.html" \
 op monitor on-fail=restart interval=60
primitive openvpn lsb:openvpn \
 op monitor interval=30s
primitive p_ping ocf:pacemaker:ping \
 params host_list="8.8.8.8 4.2.2.2" multiplier=100 dampen=5s \
 op monitor interval=60 timeout=60 \
 op start interval=0 timeout=60 \
 op stop interval=0 timeout=60
clone ClusterMon-clone ClusterMon
clone c_ping p_ping
location OpenVpnCluster openvpn \
 rule $id=OpenVpnCluster-rule -inf: not_defined pingd or pingd lte 0
location PrefVpnAir openvpn \
 rule $id=PrefVpnAir-rule 50: #uname eq vpn-air
property $id=cib-bootstrap-options \
 dc-version=1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff \
 cluster-infrastructure=openais \
 expected-quorum-votes=2 \
 stonith-enabled=false \
 no-quorum-policy=ignore


C)  Running good crm_mon

Last updated: Thu Feb 27 14:54:31 2014
Last change: Wed Jan 15 12:51:35 2014 via crmd on controle-col
Stack: openais
Current DC: controle-col - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
5 Resources configured.


Online: [ vpn-air controle-col ]

Clone Set: c_ping [p_ping]
  Started: [ controle-col vpn-air ]
openvpn (lsb:openvpn):  Started vpn-air
Clone Set: ClusterMon-clone [ClusterMon]
  Started: [ controle-col vpn-air ]


2)  My problem :

When there is a network problem :

Ex :
a) The first-node site loses its internet connection (and, at the same time, 
communication with the second node, because the VPN runs over the internet connection).
b) The cluster stops openvpn on the first node and launches it on the second, due to 
the p_ping primitive in the config.
c) The connection comes back on the first-node site.
d) Problem: first-node and second-node don't rejoin the cluster; they don't see 
each other and each node forms its own cluster - split brain, I think.
e) Each node has openvpn running, which shouldn't happen.


I don't have stonith running because I think it will be problematic without 
quorum.
Is there a way to tell corosync to re-form the ring?

Or does someone have another solution?
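
The quorum concern can be made concrete with a little arithmetic. The sketch below assumes the usual majority rule (a partition has quorum only with more than half of the expected votes); the function name is purely illustrative and not part of any cluster tool:

```python
def has_quorum(votes_present: int, expected_votes: int) -> bool:
    """A partition has quorum only with more than half of the expected votes."""
    return votes_present * 2 > expected_votes

# expected-quorum-votes=2, as in the posted configuration
print(has_quorum(2, 2))  # both nodes see each other
print(has_quorum(1, 2))  # each half after the link between the sites is cut
```

Under that assumption neither half of a two-node cluster keeps quorum after the split, and with no-quorum-policy=ignore both halves keep running resources, which would match point e).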

Thanks


Tribolet Thomas
ISSeP (Institut Scientifique de Service Public)
th.tribo...@issep.be
+32 (0) 4229 83 46



Re: [Linux-HA] Funny messages from crm resource restart (SLES11 SP2 vs. SP3)

2013-12-23 Thread Fabian Herschel
Hi,

Did you run the HA update as a rolling update, i.e. one node online with the 
current version and the other down, then updating the offline node and letting 
it rejoin the cluster? In this case I would think the cluster is OK but still 
only supports the old options. It's different from the situation where you would 
restart both nodes with only one system updated. If I got it correctly, only 
the first method is recommended.

Best regards
Fabian

Ulrich Windl ulrich.wi...@rz.uni-regensburg.de schrieb:

Hi!

when trying to restart a Xen VM after installing updates in the guest I see 
some funny messages (one node is at SLES11 SP2, while the other node is at 
SLES11 SP3):

First attempt:
h05:~ # crm resource restart prm_xen_v04
INFO: ordering prm_xen_v04 to stop
No messages received in 30 seconds.. aborting
WARNING: crmadmin -S h01 unexpected output:  (exit code: 253)
h05:~ # crmadmin -S 01
Status of crmd@h01: S_IDLE (ok)

Second attempt:
h05:~ # crm resource restart prm_xen_v04
INFO: ordering prm_xen_v04 to stop
No messages received in 30 seconds.. aborting
WARNING: can't find DC

However both nodes are online:
h05:~ # crm_mon -1Arf
Last updated: Mon Dec 23 11:52:44 2013
Last change: Mon Dec 23 11:49:59 2013 by root via cibadmin on h05
Stack: openais
Current DC: h01 - partition with quorum
Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
2 Nodes configured, 2 expected votes
18 Resources configured.


Online: [ h01 h05 ]
[...]

Is it when running the new crm shell (SP3) for a DC that is still SP2?

Regards,
Ulrich




Re: [Linux-HA] Usage of SAPDatabase resource agent without SAPHostAgent is deprecated

2013-04-30 Thread Fabian Herschel
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi Muhammad,

please find my answer below...

On 29.04.2013 17:19, Muhammad Sharfuddin wrote:
 On 04/29/2013 08:04 PM, Lars Marowsky-Bree wrote:
 On 2013-04-26T23:56:04, Muhammad Sharfuddin
 m.sharfud...@nds.com.pk
 wrote:
 
 I just upgraded from SP1(SLES 11/HAE) to SP2, and getting
 following messages when I start the SAPDBInstance resource:
 
 SAPDatabase(SAPDBInstance)[22587]: [22602]: WARNING: Usage of 
 SAPDatabase resource agent without SAPHostAgent is deprecated.
 Please read documentation of SAPDatabase resource agent and
 follow SAP note 1031096 for the installation of SAPHostAgent.
 
 there is a chapter HA OCF Agents in the High Availability
 Guide of SP2, but I found nothing new/peculiar about
 SAPDBInstance there.
 
 The warning just indicates that a new configuration is preferred
 for SAP HA deployments going forward.
 
 and what's that new preferred configuration is ? where I can
 see/learned it ?

The new setup preferred by _SAP_ is that you use the saphostagent.
This new method gives you the following benefits (which you might not
need today):

1. Support for a DB-only server with the Java (only) instance on
another server (this currently does not work, because in that case the
bootstrap files are missing on the DB server)

2. Support for Sybase (without it you can only control the old set of
databases such as DB2 (versions older than 10), MaxDB and Oracle)

3. You use the more generic DB interface written by SAP and the
DB vendors, so the resource agent does not need to take care of the
specifics of each database but can concentrate on the SAP-specific
control of the database

The setup with saphostcontrol is documented by SAP and is out of scope for
the documentation of the resource agent, because it's SAP land and they
could/should describe that. For the resource agent the only change is
that the warning disappears.

If you do not find the SAP documentation (SDN, installation guides) about
this topic I could try to send you a URL, but as it is SAP documentation
you may need an SAP marketplace login for that.


 
 
 The former version is still supported (and working, I trust).

Yes, with at least one limitation: for Java (only) workloads there
must be at least one Java instance installed on each node where the DB
should be able to run (the bootstrap files of the Java framework are
needed for monitoring the database).

 
 
 When you get a chance, you should install the SAP host agent as
 per the reference SAP note. (Which I can't look up, it seems.)
 
 I asked the SAP Consultant to install the SAPHostAgent.

Perfect!

 
 The warning does not indicate an error, but tries to get your
 attention about improving your configuration during the next
 appropriate maintenance window. It seems that worked ;-)
 
 correct, cluster never shows any error when it runs/start the
 SAPInstance, though we found that whenever we run the SAPInstance
 from cluster SAP gives us errors while login via SAP
 Client(SAPGui).

Which error? Could it be a license problem? A standard SAP license
applies to one hardware key only - your SAP consultant should be able
to solve this.

 
 as a workarround, we stopped the SAPInstance from the cluster, let
 the cluster runs/start the other resources(IP, File Systems, and
 SAPDBInstance) and then manually started the SAPInstance(via sap
 way):
 
 startsap -i DVEBMGS00 -v pgtprd

This should not be needed - there must be something wrong. Which error
message did you get when SAPInstance was controlled by the cluster and you
logged in via SAPGui?

If you get a SICK message about the 3.0 kernel, please update to a current
SAP kernel (I could provide the SAP Note number if needed).

Regards
Fabian

 
 
 Regards, Lars
 
 
 
 Regards,
 
 Muhammad Sharfuddin
 
 

Re: [Linux-HA] Antw: Re: Usage of SAPDatabase resource agent without SAPHostAgent is deprecated

2013-04-30 Thread Fabian Herschel
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 30.04.2013 08:27, Ulrich Windl wrote:
 Muhammad Sharfuddin m.sharfud...@nds.com.pk schrieb am
 29.04.2013 um 14:16 in
 Nachricht 517e6482.2040...@nds.com.pk:
 I think that you should just follow that advice, i.e. read that
 SAP
 note and install
 SAPHostAgent.
 
 I asked the SAP Consultant to install the SAPHostAgent issue.
 
 See also the agents documentations: crm ra info SAPDatabase
 
 I read it and found nothing that help me fix this issue.
 
 The good news is that it still works despite of the warning. The RA
 is a good example how to do a simple thing with maximum complexity.
 According to my little understanding that SAPHostAgent is a web
 server running as root, launching the sap start script on demand.
 The RA in turn sends a HTTP request to the Host Agent to start the
 process. I did not care to examine how authentication works,
 because I want to be able to sleep at night ;-)

Oh, you can sleep at night, even when I explain it:
The authorization is done via the file permissions of a socket on the
system. So the Linux/Unix file permissions control who is
permitted to send a set of commands to sapstartsrv / saphostagent.
(Other commands can also be sent without that file permission - the set
of commands needing authorization is controlled by an SAP configuration.)

There are 3 (or more?) methods to authenticate:
a) none (for simple, unproblematic commands)
b) via socket/file permission
c) with username/password

c) of course could not be used by the RA without introducing a
security problem (and so the RA does not try it :)
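
Method b) can be pictured with a small check. This is only a sketch of the general Unix mechanism (connecting to a Unix domain socket requires write permission on the socket file), not the actual SAP or resource agent code; the function name is hypothetical and the real socket path is deliberately left out because it is not given in this thread:

```python
import os
import stat

def may_use_socket(path: str) -> bool:
    """True if path is a Unix socket the current user is allowed to connect to.

    Connecting to a Unix domain socket requires write permission on it,
    so ordinary file permissions decide who may talk to the service.
    """
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return False  # missing or inaccessible path
    return stat.S_ISSOCK(mode) and os.access(path, os.W_OK)
```

A caller would pass the socket path of the local service; a user without write permission on that file gets False and would be limited to the commands of method a).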

I cannot agree with your statement about the resource agent.

The interface for HOW to start/stop databases and instances is given by
SAP, so the author of the RA implemented it in the way SAP prefers.

The reason for the web service, and for making the RA use it too, is
that this web server is THE interface for all methods of controlling
SAPDatabase and SAPInstance from outside. It's used by
 - SAP MMC
 - SAPMC
 - sapcontrol
and maybe even by more...


Regards
Fabian

 
 Regards, Ulrich
 
 
 

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJRf5RqAAoJEJ1uHhrzMvZRtPAH+wSFXab9rjLujhSiqfJvKK6X
IuIPadkxc9PutiqyVLbEL5J976R27aPwiR5xuJP9TkVbygVuq+C+lvhhccEFRb/7
wB0oROFss3htK/qQGkV6oLkTARFTbfo6luWoUzDIWYE+e4BC5VeCy5EG3bUYOvSn
+HIP4Chb1zCvyJqTvRjiTqp32cFpuYmSneTE3HrirrqGoD3gCkjAFlYIROgxbJ0h
xCSdA8/zJt8WzcqzNUuqNHv3mrMqiifYwUXYghd8wZmmwZiz1ZZfx7mOlqxwbwiw
EhqqEQUj9Or/V7q9L0Aw5OJ1Uuqt4vei7YXRqteIRX2xRrCVLR+Km1u6jQJyl+A=
=qRA0
-END PGP SIGNATURE-


Re: [Linux-HA] Antw: Re: Usage of SAPDatabase resource agent without SAPHostAgent is deprecated

2013-04-30 Thread Fabian Herschel
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 30.04.2013 13:22, Muhammad Sharfuddin wrote:
 Hello Fabian,
 
 On 04/30/2013 02:19 PM, Fabian Herschel wrote:
 -BEGIN PGP SIGNED MESSAGE- Hash: SHA1
 
 Hi Muhammad,
 
 please find my answer below...
 
 
 The new setup preferred by _SAP_ is, that you use the
 saphostagent. This new menthod gives you the following benefits
 (you might not need today)
 
 1. Support for DB only on a server and Java (only) Instance on
 an other server (this is currently not working, because in that
 case the bootstrap files are missing at the DB server)
 
 2. Support for Sybase (you can only control the old set of
 databases like DB2 (older version than 10), MaxDB and Oracle
 
 3. You use the more generic DB Interface written by SAP and the 
 DB-Vendors, so the resource agent does not need to take care
 about specialities of database but could concentrate on the SAP
 specific control of the database
 
 
 thanks a lot for sharing and explaining.
 
 
 correct, cluster never shows any error when it runs/start the 
 SAPInstance, though we found that whenever we run the
 SAPInstance from cluster SAP gives us errors while login via
 SAP Client(SAPGui). as a workarround, we stopped the
 SAPInstance from the cluster, let the cluster runs/start the
 other resources(IP, File Systems, and SAPDBInstance) and then
 manually started the SAPInstance(via sap way):
 
 startsap -i DVEBMGS00 -v pgtprd
 
 This should not be needed - there must a something wrong - which
 error message did you get when SAPInstance was controlled by
 cluster and you login via SAPGui?
 
 If you get a SICK message about 3.0 kernel please update to a
 current SAP Kernel (I could provide the SAP Note number if
 needed).
 
 as said cluster always successfully starts the SAPInstance without
 any error, but when we login into SAP via SAPGui there we got the 
 following error: Run time Errors.START_CALL_SICK.short text
 database inconsistency .start trasaction SICK.

OK, to me this looks like the Linux kernel is detected as 3.0 instead
of 2.6.
Could you (if needed with your SAP consultant) log in and check whether
there is a message about the unknown 3.0 kernel? In that case the
SAP kernel should be updated.

 
 as a workaround I **only** stopped the SAPInstance from cluster(let
 the IP, File Systems, SAPDBInstance remain running via cluster)
 and start SAPInstance via command line startsap -i DVEBMGS00 -v
 pgtprd.
 
 SAP kernel version is 701, and we are running  SAP on SLES 11 SP2 
 via Kernel 2.6 compatibility mode for SAP (SAP note 1310037)

Yes, the 2.6 compatibility environment was only intended to be used as
a bridge between the availability of the Linux 3.0 kernel and the point
where the customer is able to update to the newest SAP kernels like
720 PL 402 or so.

Sorry that I couldn't solve this problem via this list; it is now getting
very SAP and support specific. If you have already opened a ticket at
SUSE, please reference this thread (my colleague has already read
it :) and then we can help you with professional support.

In sum, SAP also wants you to update to the newer SAP kernels,
and my guess is that this is exactly your problem: while
starting/stopping the instance from a login user/shell leads into the
2.6 compat environment, the RA still runs in 3.0.

==> The solution is either to change one line in the RA (NOT preferred) or to
update the SAP kernel to a current version (very appreciated and preferred :)

 
 
 Regards,
 
 Muhammad Sharfuddin 
 

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJRf8Y0AAoJEJ1uHhrzMvZR9/QIAIKZs1RyGLfZZ9aUlJZR7EGM
V7thYcSldUGn0HivtW9N+kufxJHfapJ70L1o9wAw0kTbq5CaVgt42B177zB4Kq3q
5q6db1ouDh7ZufV+6Dprhff8mplEMrTCJKDPjYnna7COYzkWYPun2FBNPmAV1pGs
rBmxDBH9enZ5Piacj357Rqqs2mFhmnBeSDOIDDqMX8BBG+MIuslYOoBfyzwUTilv
ECJnkAHQZcT9CsRJ6wLkQCfFSD+HzpGp3tLZhYxi9ub7SPlthCI8vJOgp5HZhbLp
SvR9SCp3RG71+HLWKuCBd+u/JWDuymFnZ8jIoyUDWFVJRdkT59jsha2T2qIqbRI=
=nIwo
-END PGP SIGNATURE-


Re: [Linux-HA] Antw: Re: Q: NFS cross mounting

2012-12-22 Thread Fabian Herschel
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 12/21/2012 08:14 AM, Ulrich Windl wrote:
 Dimitri Maziuk dmaz...@bmrb.wisc.edu schrieb am 20.12.2012 um 17:50 in
 Nachricht 50d341c3.1040...@bmrb.wisc.edu:
 Hi!
 
 So (pseudo-code following)
 
 if (host(NFS_server) == host(NFS_client))
 rmdir(mountpoint);
 ln -s export_dir mountpoint
 else
 makedir(mountpoint);
 mount(NFS_Server:export_dir, mountpoint)

Hmm - I was trying to do something similar using bind mounts instead of
symlinks, which is more compatible with applications that may probe for
DIRECTORIES and do not accept SYMLINKS.

Unfortunately, at least for SAP workloads this is not an option, because
all instances must be killed to switch from a bind mount to an NFS mount
and vice versa. This would decrease the availability at the application
level. The problem is the already opened file handles, which can't be
shifted from the bind mount to the NFS mount. For all files and
directories which are only opened after the FS switch that might be OK,
but again that would also mean killing all application processes using
the FS.
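
The file handle problem can be reproduced in miniature. The following Python sketch is only an analogy, assuming a POSIX system: it swaps the file behind a path with a rename rather than switching mounts, but it shows the same principle, namely that an open handle stays bound to the file it was opened on and not to the path:

```python
import os
import tempfile

def handle_survives_replace():
    """Show that an already opened file keeps pointing at the old file."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data")
        with open(path, "w") as f:
            f.write("old filesystem")
        new = os.path.join(d, "data.new")
        with open(new, "w") as f:
            f.write("new filesystem")

        handle = open(path)      # the application opens the file
        os.replace(new, path)    # the path now resolves to a different file
        with handle:
            seen_by_old_handle = handle.read()
        with open(path) as f:    # only a fresh open sees the new file
            seen_by_new_open = f.read()
    return seen_by_old_handle, seen_by_new_open

print(handle_survives_replace())
```

Only processes that reopen the path see the new content, which is why the switch would force a restart of everything that already holds files open.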

I do not see that this would be a feasible solution.

Kind regards
Fabian

 
 ?
 
 Now when some program does a cd mountpoint and the NFS server moves to 
 another host, you'd have to kill all processes that use mountpoint to be able 
 to unmount it.
 In contrast, with the NFS solution the client applications would be blocked 
 until the NFS server is running on the other node. Obviously this solution seems 
 preferable.
 
 Regards,
 Ulrich
 
 
 

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iQEcBAEBAgAGBQJQ1f7hAAoJEJ1uHhrzMvZRmtcIAKeeOvfr67Ml3+oEd8mLI8hh
WtqOfHzF5S1zhlnV+T3yli64E7xxFP42cR41dWJlZTU/McC6fFQbMklXLgagrqWc
0PC81BL9i4dBHFqFZDyg/GPmfSusXU3FCFftR5qYyiF6SAUfbdKWgzCUqCzpCcXX
XHkOM8z9j9mgCDmYpbdjZfFyDu7XtVwQNyCl+OV5MBw3K0xBNBabpZ1yoYG7m5Nz
xG4dk9YcO1PtReo0PkT2gg9vTJT8umPQdKGI6O6RstnpJR5lOCKHWUIjZ4tzlNqU
bNoNSVLS4SYG6bH1hF8ZiC1p6Kc5ZxDyxJ51MbMmz7fkKum0BKUBCtm+a6TAcDA=
=DM80
-END PGP SIGNATURE-


Re: [Linux-HA] why nodes cant see each other ?

2012-12-14 Thread Fabian Herschel
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

As you are using multicast (MCAST) - could it be that the
switch/LAN dropped all multicast packets for some time? As a lot of
managed switches drop MCAST by default (at least
I got that feedback from customers), it could be that your switch was
either reconfigured for a time period or there was a firmware update?

Just my thoughts about things that happened at customer sites.
Fabian Herschel


On 12/14/2012 06:31 AM, Muhammad Sharfuddin wrote:
 node1(ailprd1) IP:192.168.7.11 node2(ailprd2) IP:192.168.7.12
 
 Its a two node active/passive cluster, running perfectly since last
 two months, but yesterday both nodes were fenced(rebooted).
 Network connectivity b/w both nodes is perfect, and cluster is
 running fine again.
 
 Help me know the reason behind the following situation, and how can
 I avoid it happening next time:
 
 on node1(active node):
 Dec 13 12:31:06 ailprd1 corosync[7274]: [TOTEM ] A processor failed, forming new configuration.
 Dec 13 12:31:12 ailprd1 corosync[7274]: [CLM ] CLM CONFIGURATION CHANGE
 Dec 13 12:31:12 ailprd1 corosync[7274]: [CLM ] New Configuration:
 Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] r(0) ip(192.168.7.11)
 Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] Members Left:
 Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] r(0) ip(192.168.7.12)
 
 on node2(passive node):
 Dec 13 12:31:05 ailprd2 corosync[7021]: [TOTEM ] A processor failed, forming new configuration.
 Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] CLM CONFIGURATION CHANGE
 Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] New Configuration:
 Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] r(0) ip(192.168.7.12)
 Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] Members Left:
 Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] r(0) ip(192.168.7.11)
 
 for node1(ailprd1) node2 left, likewise node2(ailprd2) thinks that
 node1 left. then node2 tries to start the resources which were
 already running on node1, and both nodes were fenced.
 
 corosync.conf :
 totem {
     rrp_mode: none
     join: 60
     max_messages: 20
     vsftype: none
     consensus: 6000
     secauth: off
     token_retransmits_before_loss_const: 10
     token: 5000
     version: 2

     interface {
         bindnetaddr: 192.168.7.0
         mcastaddr: 224.0.0.116
         mcastport: 51234
         ringnumber: 0
     }
     clear_node_high_bit: yes
 }
 logging {
     to_logfile: no
     to_syslog: yes
     debug: off
     timestamp: off
     to_stderr: no
     fileline: off
     syslog_facility: daemon
 }
 
 Regards, Muhammad Sharfuddin
 
 
 

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iQEcBAEBAgAGBQJQywndAAoJEJ1uHhrzMvZRutAIAL4MW1q2hUPH6cU6Md4ZjSl2
T6C8c+LIjBCGjSIBwwFgMVbMqeB78n/IFUw5QcRkiZVAZ8rDaDEIcb28pJ88yQdu
Fr+zkxO3jO30bVyo5KW0672KDYjTlJnUWjBWC+FdG5TSWyPHfnKQew06BwoQxqR+
ad4EUESJhKsRnobFkIZZHVUTXc4EUDn3U/zROh/c29k0JVblt3xip08bZLuaS7yg
vBxOavCpWidvukhKdtnN1gOKsnhvqcHmz+yQlMM8Du03U7rcRQsA2ORruFoODh0l
yY0hOWtVkgh7iVHdA3RZfMj2yAGQGSggIMHS7YA3k9J4/8cU1AfIOTUWxY61RI4=
=egRI
-END PGP SIGNATURE-


Re: [Linux-HA] why nodes cant see each other ?

2012-12-14 Thread Fabian Herschel
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi Muhammad,

please also find my answer inline...

On 12/14/2012 12:55 PM, Muhammad Sharfuddin wrote:
 On Fri, 2012-12-14 at 16:47 +0500, Muhammad Sharfuddin wrote:
 please find me replies in-line
 
 On Fri, 2012-12-14 at 12:13 +0100, Fabian Herschel wrote:
 -BEGIN PGP SIGNED MESSAGE- Hash: SHA1
 
 As you are using Multcast (MCAST)
 
 yes
 
 so using Unicast instead of MCAST, would be a solution ?
 

It COULD be a solution, if the network was the problem.
Some years ago I wrote a tiny program to send MCAST under
high load and to count drops - maybe I can dig out the code
and send it to you, if you are interested. It's GPL, so free
to use under the terms of the GPL and with no warranty :)
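
The program mentioned above is not part of this thread. As a rough idea of how such a test could work, here is an independent Python sketch (not Fabian's code): the sender numbers its datagrams, the receiver collects the numbers, and gaps in the sequence are counted as drops. The group address and port are hypothetical test values, not taken from the posted configuration:

```python
import socket
import struct

GROUP = "239.255.0.1"   # hypothetical test group, not from the thread
PORT = 50000            # hypothetical test port

def count_drops(seq_numbers):
    """Count missing sequence numbers between the first and last packet seen."""
    seen = sorted(set(seq_numbers))
    if not seen:
        return 0
    return (seen[-1] - seen[0] + 1) - len(seen)

def send(count, group=GROUP, port=PORT):
    """Send `count` numbered datagrams to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    for seq in range(count):
        sock.sendto(struct.pack("!I", seq), (group, port))
    sock.close()

def receive(timeout=5.0, group=GROUP, port=PORT):
    """Join the group and collect sequence numbers until `timeout` seconds of silence."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    sock.settimeout(timeout)
    seqs = []
    try:
        while True:
            data, _ = sock.recvfrom(64)
            if len(data) >= 4:  # ignore anything too short to carry a number
                seqs.append(struct.unpack("!I", data[:4])[0])
    except socket.timeout:
        pass
    finally:
        sock.close()
    return seqs
```

One would run receive() on one node and send() on the other, then pass the received list to count_drops(). Packets lost before the first or after the last received number are not detected by this simple count.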

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iQEcBAEBAgAGBQJQyyUjAAoJEJ1uHhrzMvZR5kkIALI6zLT17EmCgZww/rH95kZq
jYMSpmlPYAhAQahjO3SvGf3Fj3yiaPACtbAkmmUAgewspp7Xe/WrqZrYv6OvqR79
MStU+bS7Qs3P2GES44czkpes9SRcI2lLig9Q6GauPh8OBA2m4VXGMM15NqtqxRWd
zkZtIifVUH9skuXUg4kHFMISjVE77dxh2JECnuLOEVOghD00An1sI46FgoMsygu6
DvWoyzwgWhgxz0U7Fb8WI1yTraXiZP4ozuBl8k0MchclB53vlkek9IxJGFvsTGKX
EnnMxVJYL5X/8i7SM68lldQ2f0WttUIIXShdLfBUsgJ/QQvqq8YG4D9/kFKluLI=
=JeLz
-END PGP SIGNATURE-


Re: [Linux-HA] Heartbeat/Pacemaker resource agent incorrect

2012-12-12 Thread Fabian Herschel
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 12/10/2012 10:59 PM, codey koble wrote:
 To anyone who could help possibly:
 
 My current setup: 2 Ubuntu 10.04 LTS servers running heartbeat,
 pacemaker, apache, and mysql Heartbeat and pacemaker are running
 great for my needs with one exception, currently both nodes are
 showing mysql as slaves. I have mysql configured in a master/slave
 setup and that is working great on its own.
 
 I noticed when I tried to promote one of the servers that an error
 occurred stating that the ocf:heartbeat:mysql did not support the
 feature.  I evaluated the script and realized it was an older
 version and did not contain any of the promote/demote code.  I
 found the newest code for the script in the github repo and
 replaced the entire mysql file with the new code.  Upon doing this
 it then gave an error stating that the ocf:heartbeat:mysql resource
 agent was not installed.

Could you send the error message more precisely? Does the cluster tell
you the RA is not installed (check path and file permissions), or does
the LRM tell you that the RA itself has returned the exit code "not
installed" (this would mean the RA does not find your mysql
binaries/config/or whatever)?
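
The two cases can be told apart with a quick check of the agent file. The Python sketch below is only an illustration with hypothetical function names; the value 5 is the OCF return code for "not installed" (OCF_ERR_INSTALLED):

```python
import os

OCF_ERR_INSTALLED = 5  # OCF return code meaning "not installed"

def is_not_installed(rc: int) -> bool:
    """True if an agent's exit code is the OCF 'not installed' code."""
    return rc == OCF_ERR_INSTALLED

def diagnose_ra(path: str) -> str:
    """Classify why a cluster might report a resource agent as 'not installed'."""
    if not os.path.isfile(path):
        return "agent file missing"
    if not os.access(path, os.X_OK):
        return "agent file not executable"
    return "agent file present; 'not installed' likely comes from the agent itself"
```

For example, diagnose_ra("/usr/lib/ocf/resource.d/heartbeat/mysql") could be used, assuming the agent lives in the usual OCF directory on this system. A manually replaced script that lost its execute bit would show up as the second case.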

 
 My question would be is there a simple way to update the script
 instead of manually replacing it like I did, or is there a way to
 get the code I changed to working?
 
 Thanks in advance for any help! 
 

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQyH9xAAoJEJ1uHhrzMvZRjW4H/RUxkgL/nXyKZqz6xl8dDn3P
bPcCqqOvSX2x32umwkEaS2JZ7Gabo8O7sHIZNC/HcrmDttoRo6L4BNR+W2QkQtMV
FEuTVqktOq6WdeaZ2Hn66S42+IkzHOOJRRJzp0GSLfdlxzRiM2E+an/QmPwWbpZZ
EFvZbyDScqrKyQo7vN5CE0K1yb9JCrOxLMO2NX1D2reiOv7f3pvslKO03eohLcy/
k4ZagdO9GvIPs7PPj+pI5aUYbH7ypejPR+z8e6OXpAgbfSQg7AJuTgllMcCsODAe
BEb78ZWpa4pANAugRvJZ87A1ATjgJy2MBubyewqGRqghnNeqAjq5hPgzH9cuWoQ=
=OfyW
-END PGP SIGNATURE-


Re: [Linux-HA] master/slave drbd resource STILL will not failover

2012-12-05 Thread Fabian Herschel
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 12/04/2012 08:34 PM, Lars Marowsky-Bree wrote:
 On 2012-12-04T20:38:54, Fabian Herschel fabian.hersc...@arcor.de
 wrote:

 Specifying target-role=Master is completely different from
 specifying a role=Master/Slave on an operation.
 
 The former defines that you want the cluster to promote the
 resource to Master (setting it to slave would prevent the
 resource from reaching master state, just like stopped would
 prevent it from being started at all).
 
 The latter defines that you want to run different monitor
 operations per role.

Yes, you are right :) I mixed up the error case "does not promote" with
"does not detect resource failures after being promoted".

 
 
 Regards, Lars
 

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQv4L7AAoJEJ1uHhrzMvZRogUH/iFwh9H6LDZrOLliyUbS0vhS
4PexnVGl2ruo2Va6rnK+ZLUyoQvdCLEM6wDR6wtaA4ZpnxHYIfJi1ZgS/iaFFf/3
a2oqEUo5WFo0p/K94oBfjDIcYjzE+3xuCXfYKujRISiUPf6njX8sQEqEcS1GOfxR
PCjH8XNLEvjs/J0g1Y8ATle5TZvLXAy0eTud18xeOlL1AahraU9g1QTDhgO3R4B2
PXfTMrAObZRmyC8HdKItq5OPX0/SfTXtP4vD2d7sfBw7XGgdwGS28zqgwu4V6OdM
eQ6BA9RjdAe/NKDPlOwc33oYzAlyNWftYK2VNObxrf77U0ms59jGA2iX8jaF/sQ=
=wNpP
-END PGP SIGNATURE-


Re: [Linux-HA] master/slave drbd resource STILL will not failover

2012-12-04 Thread Fabian Herschel
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 11/29/2012 10:14 PM, Robinson, Eric wrote:
 Bump... does anyone have some insight on this? Google is not
 turning up anything useful.
 
 Our newest cluster will not failover master/slave drbd resources.
 It works fine manually using drbdadm from a shell prompt, but when
 we try it using 'crm node standby' and letting the cluster manage
 the resource, crm_mon just keeps saying the resource FAILED.
 
 We see a lot of these messages in the corosync.log file:
 
 drbd(p_drbd1)[12814]:   2012/11/27_15:31:59 DEBUG: ha02_mysql: Calling drbdadm -c /etc/drbd.conf primary ha02_mysql
 drbd(p_drbd1)[12814]:   2012/11/27_15:31:59 ERROR: ha02_mysql: Called drbdadm -c /etc/drbd.conf primary ha02_mysql
 drbd(p_drbd1)[12814]:   2012/11/27_15:31:59 ERROR: ha02_mysql: Exit code 11
 
 There is no indication of what may be causing the 'Exit code 11'
 
 Here is a link to the corosync log, taken from the standby server
 (ha09a) where we are trying to fail the resource to...
 
 http://www.psmnv.com/downloads/corosync1.log

  Here is what I have installed...
 
 corosync-1.4.1-7.el6_3.1.x86_64 corosynclib-1.4.1-7.el6_3.1.x86_64 
 pacemaker-1.1.8-4.el6.x86_64 pacemaker-cli-1.1.8-4.el6.x86_64 
 pacemaker-cluster-libs-1.1.8-4.el6.x86_64 
 pacemaker-libs-1.1.8-4.el6.x86_64
 
 Following is my crm config. It's pretty basic.
 
 
 node ha09a \
     attributes standby=off
 node ha09b \
     attributes standby=off
 primitive p_drbd0 ocf:linbit:drbd \
     params drbd_resource=ha01_mysql \
     op monitor interval=60s
 primitive p_drbd1 ocf:linbit:drbd \
     params drbd_resource=ha02_mysql \
     op monitor interval=45s
 primitive p_vip_clust08 ocf:heartbeat:IPaddr2 \
     params ip=192.168.10.210 cidr_netmask=32 \
     op monitor interval=30s
 primitive p_vip_clust09 ocf:heartbeat:IPaddr2 \
     params ip=192.168.10.211 cidr_netmask=32 \
     op monitor interval=30s
 ms ms_drbd0 p_drbd0 \
     meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Master
 ms ms_drbd1 p_drbd1 \
     meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Master
 property $id=cib-bootstrap-options \
     dc-version=1.1.8-4.el6-394e906 \
     cluster-infrastructure=openais \
     expected-quorum-votes=2 \
     stonith-enabled=false \
     no-quorum-policy=ignore \
     last-lrm-refresh=1352846885
 rsc_defaults $id=rsc-options \
     resource-stickiness=100

I am not sure if that will really help you - but in my cluster (OK,
an older pacemaker version) I have the following to define a master/slave
resource:

primitive rsc_sap_HA0_ASCS00 ocf:heartbeat:SAPInstance \
   operations $id=rsc_sap_HA0_ASCS00-operations \
   op monitor interval=11 role=Slave timeout=60 \
   op monitor interval=13 role=Master timeout=60 \
   params \
 InstanceName=HA0_ASCS00_sapha0as \
 START_PROFILE=/usr/sap/HA0/SYS/profile/HA0_ASCS00_sapha0as \
 ERS_InstanceName=HA0_ERS10_sapha0er \
 ERS_START_PROFILE=/usr/sap/HA0/SYS/profile/HA0_ERS10_sapha0er

ms msl_sap_enqrepl_HA0 rsc_sap_HA0_ASCS00 \
   meta clone-max=2 target-role=Started master-max=1 \
   is-managed=true



So I have an operation defined with role=Master on the primitive but NOT a
target-role=Master on the master/slave resource.

Additionally I have a colocation constraint between primitives/group
which must run together with the promoted clone:

colocation col_grp_sap_as_HAO_msl_sap_enqrepl_HA0_MASTER inf: \
   grp_sap_as_HA0 msl_sap_enqrepl_HA0:Master

Sorry, I have not checked whether the syntax has changed here, or whether
your syntax was also valid in the past, so it might be that my hint
is completely useless ;-) I just wanted to point out one thing where your
config is completely different from my config.

Hopefully my hint helps...
Fabian


 
 -- Eric Robinson
 
 
 
 Disclaimer - November 29, 2012 This email and any files transmitted
 with it are confidential and intended solely for General Linux-HA
 mailing list. If you are not the named addressee you should not
 disseminate, distribute, copy or alter this email. Any views or
 opinions presented in this email are solely those of the author and
 might not represent those of Physicians' Managed Care or Physician
 Select Management. Warning: Although Physicians' Managed Care or
 Physician Select Management has taken reasonable precautions to
 ensure no viruses are present in this email, the company cannot
 accept responsibility for any loss or damage arising from the use
 of this email or attachments. This disclaimer was added by Policy
 Patrol: http://www.policypatrol.com/ 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQvlFOAAoJEJ1uHhrzMvZRcj8IAIrNf4T4dFvzblLnkHSSUHvN

Re: [Linux-HA] Pacemaker master/slave - how not to autostart slave after migration of a master or failure of a slave?

2012-11-26 Thread Fabian Herschel
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi Rafal,

placing a new master on the right (not restarted) side is typically
done by the crm_master calls. You might check the scoring of the
resources after you have killed one side, using
ptest -Ls (or a matching call without ptest; sorry, I do not
remember the other command).

On SLES, ptest -Ls will show you the scores in the live situation,
and if crm_master is used it will also show you the promotion scores.

In my resource agents the tomcat RA does not contain a crm_master call,
so this might be the cause.
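
For reference, the scores can be listed roughly as follows. This is a sketch,
assuming that either ptest or its later replacement crm_simulate is
installed; the wrapper function show_scores_cmd is invented for this example
and only prints the command to run.

```shell
#!/bin/sh
# Hypothetical wrapper: print the score-listing command available on this node.
# -L reads the live cluster state, -s prints the allocation scores.
show_scores_cmd() {
    if command -v crm_simulate >/dev/null 2>&1; then
        echo "crm_simulate -sL"
    elif command -v ptest >/dev/null 2>&1; then
        echo "ptest -Ls"
    else
        echo "none"
    fi
}

show_scores_cmd
```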

Best regards
Fabian

On 11/26/2012 01:39 AM, Andrew Beekhof wrote:
 On Fri, Nov 23, 2012 at 3:08 AM, Rafał Radecki
 radecki.ra...@gmail.com wrote:
 Hi all.
 
 I am currently making a Pacemaker/Corosync cluster which serves
 Tomcat resource in master/slave mode. This Tomcat serves Solr
 java application. My configuration is:
 
 node storage1
 node storage2
 
 primitive TSVIP ocf:heartbeat:IPaddr2 \
   params ip=192.168.100.204 cidr_netmask=32 nic=eth0 \
   op monitor interval=30s
 
 primitive TomcatSolr ocf:polskapresse:tomcat6 \
   op start interval=0 timeout=60 on-fail=stop \
   op stop interval=0 timeout=60 on-fail=stop \
   op monitor interval=31 role=Slave timeout=60 on-fail=stop \
   op monitor interval=30 role=Master timeout=60 on-fail=stop
 
 ms TomcatSolrClone TomcatSolr \
   meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
   notify=false globally-unique=true ordered=false target-role=Master
 
 colocation TomcatSolrClone_with_TSVIP inf: TomcatSolrClone:Master TSVIP:Started
 order TomcatSolrClone_after_TSVIP inf: TSVIP:start TomcatSolrClone:promote
 
 property $id=cib-bootstrap-options \
   dc-version=1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14 \
   cluster-infrastructure=openais \
   expected-quorum-votes=4 \
   stonith-enabled=false \
   no-quorum-policy=ignore \
   symmetric-cluster=true \
   default-resource-stickiness=1 \
   last-lrm-refresh=1353594420
 rsc_defaults $id=rsc-options \
   resource-stickiness=10 \
   migration-threshold=100
 
 So logically I have:
 - one node with TSVIP and TomcatSolrClone Master;
 - one node with TomcatSolrClone Slave.
 
 I have set up replication between Solr on the TomcatSolrClone Master and
 Slave and written an OCF agent (attached). A few moments ago, when I killed
 the Slave resource with 'pkill java', the resource was restarted
 on the same node despite the fact that the monitor action
 returned $OCF_ERROR_GENERIC and I have on-fail=stop set for
 TomcatSolr (I have also tried block, with the same effect).
 
 Then I have added a migration threshold:
 
 ms TomcatSolrClone TomcatSolr \
   meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
   notify=false globally-unique=true ordered=false target-role=Started \
   params migration-threshold=1
 
 and now when I kill java on Slave it does not start anymore (the
 Master is ok). But when I then kill java on Master (no resource
 running on both nodes) everything gets restarted by the cluster
 and Master and Slave are running afterwards. How to stop this
 restart when Slave and Master both fail?
 
 Could you file a bug (https://bugs.clusterlabs.org) for this and 
 include a crm_report for your testcase? It's likely that you've hit
 a bug.
 
 
 Best regards, Rafal.
 
 

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQs3+iAAoJEJ1uHhrzMvZRkJcH/ij5X5NQn5OxBr0ZEGapj7eM
oX9BYT16xPs1HJXLMsjbKVmctAsGLJL79j9gnSVWGS7LhTv1XjHQlHHJyA7y+BbG
irscHbgMHg/WwreYeoyfcHRQP/o0rODPWEEmGfI8R89hkqCPjayMRw9NJOkZHMMq
ED/VtSlZxeB9wKZnWz9bw8XW4hov0wInhdl4hvSrnh2fCCXxatGz+VtwRXvLrOm3
+h5g+nkpn+Q5hAz8xTnn2TMvOAE10SOnWw9XX6vpkgUU61TPTJ9am53x+e4iNURu
7hsUdXWfm3h7+c10BzcrIjVS5GEwu29ZvYmsMiM4LIVXImloFEvmsd5Bpw8yVaw=
=Wbeu
-END PGP SIGNATURE-

Re: [Linux-HA] Antw: Re: pcs or crmsh?

2012-11-15 Thread Fabian Herschel
On 11/14/2012 03:33 PM, Digimer wrote:
 Linux in general is all about choice, possibly to a fault. I see
 no reason why clustering shouldn't be the same.

I really like that Linux and the cluster frameworks offer choice (I was
even close to misspelling that as joice :) but on the other hand
it does not make sense to change things like crm to pcs without
having had any problems with the integrated, stable, widely used,
road-capable solution we already had.

Customers do not really like such changes, as they suggest that this
cluster solution is still a teenager and not grown up. From my
point of view this is a very bad message!

Regards
Fabian






Re: [Linux-HA] Antw: Re: pcs or crmsh?

2012-11-15 Thread Fabian Herschel
On 11/15/2012 12:03 AM, Andrew Beekhof wrote:
 I can think of 3 tooling changes:
 
 - ptest/crm_simulate
 - hb_report/crm_report
 - standalone crmsh
 
 That's not /too/ bad in 4 years.

But completely unneeded. Where is the benefit of changing from crm to
pcs?


Re: [Linux-HA] Antw: Re: pcs or crmsh?

2012-11-15 Thread Fabian Herschel
On 11/14/2012 11:20 PM, Andrew Beekhof wrote:
 I sincerely hope SUSE does continue with crmsh but I _like_ that
 there are people trying something new.

Yes, I also like things getting better. But what is the benefit
of dropping CRM and introducing PCS to that project? What is the
benefit for all distributions which in the past did not say it is only
a tech preview?


Re: [Linux-HA] pcs or crmsh?

2012-11-15 Thread Fabian Herschel
On 11/14/2012 05:10 PM, alain.mou...@bull.net wrote:
 Hi Just for information, I'm using cleanup and crm_mon very very
 very often with lots of ressources configured in Pacemaker and
 never had any problem like the problems you describe ... (on RHEL) 
 Alain

The crm shell and tools like crm_mon have been stable on SLES for years! I
would really like this story to go on without silly changes which have
zero benefit. Changes should have a real benefit; otherwise they just
hurt the story of a cluster project.


Re: [Linux-HA] string2msg_ll: node [?] failed authentication

2011-08-02 Thread Fabian Herschel
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Are there other nodes with the same multicast address?

On 08/02/2011 12:38 AM, Hai Tao wrote:
 
 I reinstalled the OS for node1 (in a two-node HA setup; node1 had a disk
 error) and reconfigured HA. However, after restarting heartbeat, I see
 many errors of  string2msg_ll: node [?] failed authentication on node 2.
  
 I checked authkeys, and confirmed both nodes have the same setting.
  
 Is there any idea why this happens?
 
 
 Thanks.
  
 Hai Tao 

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.16 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJON+RfAAoJEJ1uHhrzMvZRr1cH/0qj3P1oT9nq+itLqz8u9nPV
bHeCjpOCGprM13tVNv0hZhwSxVONdaSfJWZTi3vwaiZORHlxIaXk99S+oRRen99y
gncuWFZM753prTAqCqfgp4s3xGqIIktc/pMJTTxLVoQC9pF8M/2G65wYFyBvAjht
UaMVkcQY+WgKQdyCD0YVYphkg3GGTlhBBPZzUIPqzFcXW6Ax3Ht5XaT5xc1BlW0z
ee2VMy6nTKg4Wog+qpTFcP8Gnose5vSRCTiHsUR1O7Br3+nhoLcpwb+4BtQ6wj+5
4q/2NwXBlaOGEPmmHhXyqdKtgKyeVdLnerAss+YBaVzimukY3H0g6ntHyTmRGa8=
=TkPw
-END PGP SIGNATURE-


Re: [Linux-HA] two nodes insolate

2010-07-07 Thread Fabian Herschel
I assume you have checked that the firewall is off?

On 06.07.2010 13:01, Trujillo Carmona, Antonio wrote:
 I'm trying to set up a 2-node cluster for HA. After configuring it I began to test
 it, but it fails.
 I configured a ping node and it always went offline.
 I tried to configure a ping resource and that did not work either.
 I always get:
 --
  crm(live)# status
 
 Last updated: Tue Jul  6 12:48:30 2010
 Stack: openais
 Current DC: balanceador-2 - partition WITHOUT quorum
 Version: 1.0.8-f2ca9dd92b1d+ sid tip
 2 Nodes configured, 2 expected votes
 3 Resources configured.
 

 Online: [ balanceador-2 ]
 OFFLINE: [ balanceador-1 ]

  control-aislamiento  (ocf::pacemaker:ping):  Started balanceador-2
 crm(live)# 
 --
 My configuration is:

 crm(live)# configure
 crm(live)configure# show
 node $id=10.104.24.204 hvn21:ping \
   attributes standby=false
 node balanceador-1
 node balanceador-2
 primitive control-aislamiento ocf:pacemaker:ping \
   meta target-role=Started \
   operations $id=control-aislamiento-operations \
   op monitor interval=10 timeout=60 \
   params host_list=hvn21 balanceador-1 balanceador-2
 primitive control-haproxy lsb:haproxy \
   meta target-role=Started is-managed=true \
   operations $id=control-haproxy-operations \
   op monitor interval=15 timeout=15 start-delay=15
 primitive control-ip ocf:heartbeat:IPaddr2 \
   meta target-role=started \
   operations $id=control-ip-operations \
   op monitor interval=10s timeout=20s \
   params ip=10.104.16.234 lvs_support=true unique_clone_address=true
 location ip-en-balanceador-1 control-ip inf: balanceador-1
 colocation weblogic inf: control-ip control-haproxy
 order haproxy-primero : control-haproxy control-ip
 property $id=cib-bootstrap-options \
   dc-version=1.0.8-f2ca9dd92b1d+ sid tip \
   cluster-infrastructure=openais \
   stonith-enabled=false \
   last-lrm-refresh=1278331717 \
   expected-quorum-votes=2 \
   no-quorum-policy=suicide
 crm(live)configure# 


 Thank for your time 

   



Re: [Linux-HA] Setting up HA-Cluster with heartbeat and SAP

2009-03-18 Thread Fabian Herschel
Do you really still need heartbeat v1?

There are some advanced SAP resource agents for heartbeat v2, which
also include monitoring and service restarts.

The problem with your own(?) RA is that it could damage your data if the
unmounts do not work properly (failing due to open files). This could
cause dual-mounted file systems - ugly!
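
To illustrate the unmount problem: a stop action should report failure as
soon as one umount fails, instead of carrying on. A minimal sketch, assuming
a helper named umount_all and an override variable UMOUNT_CMD, both invented
here for illustration:

```shell
#!/bin/sh
# Hypothetical helper: unmount the given mount points in order and stop at
# the first failure. UMOUNT_CMD is an assumption made for this sketch so the
# logic can be tried without touching real mounts; it defaults to umount.
umount_all() {
    for mp in "$@"; do
        if ! ${UMOUNT_CMD:-umount} "$mp"; then
            echo "umount of $mp failed (open files?), not reporting a clean stop" >&2
            return 1
        fi
    done
    return 0
}

# Dry run: "true" stands in for umount, so nothing is unmounted here.
UMOUNT_CMD=true umount_all /oracle/BTP/sapdata1 /oracle/BTP && echo "all unmounted"
```

With the real umount, a non-zero return would let the cluster see the failed
stop rather than starting the file systems on the other node.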

Best regards
Fabian Herschel

Andreas Reschke wrote:
 Hi,
 i need to set up a HA-Cluster for a SAP-application.
 Requirements:
 - 2 IBM x3650
 - SLES 10 SP2 x86_64
 - SAN (EMC)
 
 Steps:
 1. installing SLES on the server
 2. configure heartbeat (v1)
 - /etc/ha.d/ha.cf: 
 logfile /var/log/ha-log
 keepalive 2
 deadtime 30
 warntime 10
 initdead 120
 auto_failback off
 bcast   eth2
 bcast   eth3
 ucast eth2 11.0.0.1
 ucast eth3 11.0.0.2
 ucast eth2 11.0.0.3
 ucast eth3 11.0.0.4
 node bgstsapgtsls1
 node bgstsapgtsls2
 ping 10.20.94.1
 keepalive 10
 
 - /etc/ha.d/haresources:
  bgstsapgtsls1 10.20.94.200/32/255.255.255.255/bond0:1 sap
 
 - /etc/ha.d/resource.d/sap:
 # Author:   Andreas Reschke andreas.resc...@behrgroup.com
 # License:  GNU General Public License (GPL)
 # Date: 2009-03-16
 #
 #set -x
 
 # See how we were called.
 case "$1" in
   start)
 # SAP-Startscript
 # mount SAN
 # LVM-Volumes search and activate
 /etc/init.d/boot.md start
 /etc/init.d/mdadmd start
 /etc/init.d/boot.lvm start
 # setting hostname
 hostname bgstsapgpls01
 # filesystem mount
 # all filesystems (sap_vg) are on the SAN
 mount /dev/sap_vg/lv20 /sap/btpadm
 mount /dev/sap_vg/lv19 /oracle
 mount /dev/sap_vg/lv1 /oracle/BTP
 mount /dev/sap_vg/lv2 /oracle/BTP/mirrlogA
 mount /dev/sap_vg/lv3 /oracle/BTP/mirrlogB
 mount /dev/sap_vg/lv4 /oracle/BTP/oraarch
 mount /dev/sap_vg/lv5 /oracle/BTP/origlogA
 mount /dev/sap_vg/lv6 /oracle/BTP/origlogB
 mount /dev/sap_vg/lv7 /oracle/BTP/saparch
 mount /dev/sap_vg/lv8 /oracle/BTP/sapbackup
 mount /dev/sap_vg/lv9 /oracle/BTP/sapcntrl1
 mount /dev/sap_vg/lv10 /oracle/BTP/sapcntrl2
 mount /dev/sap_vg/lv11 /oracle/BTP/sapcntrl3
 mount /dev/sap_vg/lv12 /oracle/BTP/sapdata1
 mount /dev/sap_vg/lv13 /oracle/BTP/sapdata2
 mount /dev/sap_vg/lv14 /oracle/BTP/sapdata3
 mount /dev/sap_vg/lv15 /oracle/BTP/sapdata4
 mount /dev/sap_vg/lv16 /oracle/BTP/sapreorg
 mount /dev/sap_vg/lv17 /sapmnt/BTP
 mount /dev/sap_vg/lv18 /usr/sap/BTP
 # SAP start
 su - orabtp -c "/oracle/BTP/102_64/bin/lsnrctl start"
 # wait for listener
 sleep 10
 su - btpadm -c "/usr/sap/BTP/SYS/exe/run/startsap"
 # Backupdaemon start
 /etc/init.d/adsm start
 ;;
   stop)
 # SAP-Stopscript
 su - btpadm -c "/usr/sap/BTP/SYS/exe/run/stopsap"
 su - btpadm -c "/usr/sap/BTP/SYS/exe/run/saposcol -kc"
 su - btpadm -c "/usr/sap/BTP/SYS/exe/run/cleanipc 41 remove"
 su - orabtp -c "/oracle/BTP/102_64/bin/lsnrctl stop"
 # wait for all stopping process
 sleep 10
 # if necessary
 killall sapstartsrv
 # umount SAN
  umount /oracle/BTP/mirrlogA
 umount /oracle/BTP/mirrlogB
 umount /oracle/BTP/oraarch
 umount /oracle/BTP/origlogA
 umount /oracle/BTP/origlogB
 umount /oracle/BTP/saparch
 umount /oracle/BTP/sapbackup
 umount /oracle/BTP/sapcntrl1
 umount /oracle/BTP/sapcntrl2
 umount /oracle/BTP/sapcntrl3
 umount /oracle/BTP/sapdata1
 umount /oracle/BTP/sapdata2
 umount /oracle/BTP/sapdata3
 umount /oracle/BTP/sapdata4
 umount /oracle/BTP/sapreorg
 umount /oracle/BTP
 umount /oracle
 umount /sapmnt/BTP
 umount /usr/sap/BTP
 umount /sap/btpadm
 # setting old hostname
 hostname bgstsapgtsls1
 # Backupdaemon stop
 /etc/init.d/adsm stop
 ;;
   restart|reload)
 $0 stop
 $0 start
 ;;
   *)
 echo "Usage: sap {start|stop|restart}"
 exit 1
 esac
 
 # RETVAL is never set above, so this exits with the status of the last command
 exit $RETVAL
  
 
 Questions: Does this work? Could I run into problems with this configuration? Does 
 anybody have a similar configuration?
  
 Regards 
 Andreas Reschke
 
 BG-IM173
 Unix/Linux-Administration
  
 Behr GmbH  Co. KG
 ST B29, 3.OG
  
 Tel.: +49 711 896-4598
 Fax: ++49 711-8902-4598
 Mobil: 0173-3197397
 andreas.resc...@behrgroup.com



[Linux-HA] Re: Configurating STONITH device (how to avoid reset each other)

2009-03-15 Thread Fabian Herschel
Hi,

for heartbeat 2.1.4 there is IMHO no out-of-the-box solution for that
problem.

I don't know whether the following would be a valid method:

Edit(!) the stonith script on one of the nodes and add a sleep XX
to it. This would cause one of the scripts to hang for some
seconds. As a consequence the resulting stonith actions should not happen
at the same time. Hopefully this does not work against heartbeat's
internal sleeps (I have not tested that so far).

This method also means that the cluster takeover action will run
some seconds later (on the changed node), because the stonith action has
to be completed before other actions can be processed.
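
As a rough sketch of that idea (untested against heartbeat itself): the delay
would be applied on exactly one node before the real reset is triggered. The
function name fence_delay, the node name node-2 and the 15 seconds below are
placeholders, not something taken from the external/ipmi plugin.

```shell
#!/bin/sh
# Hypothetical helper: print how many seconds this node should wait before
# fencing. Only the node chosen as "delayed" waits; the other fires at once.
#   $1 = name of the local node, $2 = name of the delayed node, $3 = delay
fence_delay() {
    if [ "$1" = "$2" ]; then
        echo "$3"
    else
        echo 0
    fi
}

# Inside the stonith script of each node one might then do:
sleep "$(fence_delay "$(uname -n)" node-2 15)"
```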

@List: Would that be a valid workaround?

Regards
Fabian Herschel


linux-ha-requ...@lists.linux-ha.org wrote:

 Subject:
 [Linux-HA] Configurating STONITH device (how to avoid reset each other)
 From:
 Alessandra Giovanardi a.giovana...@cineca.it
 Date:
 Wed, 4 Mar 2009 18:07:32 +0100 (MET)
 To:
 linux-ha@lists.linux-ha.org
 
 
 Hi,
 I'm using heartbeat on a cluster of 2 nodes and stonith to avoid split
 brain with external/ipmi:
 
 heartbeat-stonith-2.1.4-0.11
 heartbeat-2.1.4-0.11
 
 I'm using heartbeat with crm off (version 1-like).
 
 I've a question: If the nodes turn unavailable *each* *other*, how can
 avoid that node-1 RESETS node-2 and node-2 RESETS node-1 at same time?
 
 Which is the same question of this post:
 http://www.nabble.com/Configurating-STONITH-device-(reset-each-other)-td21672102.html
 
 
 where the answer:
 No, but it is extremely unlikely for this to happen.
 
 is for me not so exhaustive...
 
 Has someone solved this problem or evaluated the occurrence of this event?
 
 Thanks
 A.


[Linux-HA] RE: Help with STONITH Plugin

2009-02-08 Thread Fabian Herschel
You can either query the whole cluster resource definition:

cibadmin -Q

or you can query the definition of a single resource
(primitive/group/clone):

crm_resource -l

This gives a list of defined resources

crm_resource -r ONE-OF-YOUR-RESOURCES -x

queries the xml-definition of your resource.

 From:
 Gruher, Joseph R joseph.r.gru...@intel.com
 Date:
 Fri, 6 Feb 2009 15:32:43 -0800
 To:
 General Linux-HA mailing list linux-ha@lists.linux-ha.org
 CC:
 Liu, Zheng-yang zheng-yang@intel.com
 
 
 Can the resource definition be captured or exported?  Would that be part of 
 the plugin script itself?  I can send any useful debug information that can 
 be captured from the system if you can provide some guidance on what would be 
 helpful.
 
 Thanks,
 Joe
 
 -Original Message-
 From: linux-ha-boun...@lists.linux-ha.org 
 [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Fabian Herschel
 Sent: Friday, February 06, 2009 11:03 AM
 To: Linux-HA
 Subject: Re: [Linux-HA] RE: Help with STONITH Plugin
 
 Thanks for the input.  What could cause the STONITH request to not be
 sent from tengine?
 
 Have you defined fence as a reaction in one of your resource
 operations? Without the resource definition it's not easy to tell why
 fencing is not started.
 
 Thanks,
 Joe



[Linux-HA] Q: Known problems/limitations with quorum server?

2008-02-11 Thread Fabian Herschel
Hi all,

my question is: are there any known problems/limitations
with quorum server and heartbeat 2.1.13 (or 2.0.8)?

I would need the quorum server for a split-site (stretched)
4-node cluster (2 nodes on each side).

Best regards
Fabian


Re: [Linux-HA] Heartbeat 2: failover of EVMS private container resources

2007-12-01 Thread Fabian Herschel
Hi,

please try either lower-cased host/node names or use the patch I sent
yesterday. The problem is that heartbeat uses the lower-cased hostnames
as node names and in the membership list in the CCM, while EVMS compares
case-sensitively. This means EVMS says that your cluster node acquiring the
private container is not allowed to do so, as the CCM does not have the exact
node name in its list.

In your case the node name (CZVLabNode2) is not lower-cased; this is the
cause of the problem. Either change both nodes to lower case (uname -n
must report it correctly, hostname also), or apply the patch.
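
To check a node quickly, one can compare uname -n with its lower-cased form.
A small sketch, with the function name is_lowercase_name made up for this
example:

```shell
#!/bin/sh
# Hypothetical check: succeeds only if the given name has no upper-case letters.
is_lowercase_name() {
    [ "$1" = "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" ]
}

if is_lowercase_name "$(uname -n)"; then
    echo "node name is already lower case"
else
    echo "node name contains upper-case letters"
fi
```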

After that you should use the following procedure to get rid of the
stored failures of the evms_failover resource:

1. Cleanup the resource
2. Stop(!) the resource
3. Start the resource

If the resource belongs to a group, finally delete the target_role of the
evms_failover resource.

Now everything should work fine.

Best regards
Fabian

On Friday, 23.11.2007, at 11:16 +0100, Chris wrote:
 Hi Yan,
 Thanks a lot for your help. I took out the evmsSCC
 resource from the scenario, but I did not see any difference in the
 system behavior, then I followed your suggestion and I manually tested
 the EVMS commands from the CLI while both the nodes were in stand-by,
 and I actually realized that the command:
 
 modify: gwcont,type=private,node=CZVLabNode2
 
 was failing; was somehow not recognized as a valid command.
 
 The really weird thing is that the same command, avoiding the capital
 letters in the host name, was successful:
 
 modify: gwcont,type=private,node=czvlabnode2
 
 This was true in both nodes, so I modified both the hostnames from:
 
 CZVLabNode1 -> czvlabnode1
 CZVLabNode2 -> czvlabnode2
 
 and now the fail over is working properly. like everything else.
 
 The reason why I tried to change the host names so to avoid any
 capital letter is that I noticed that, even if my host names were a
 mixture of normal and capital letters, in the hb_gui they were shown
 without capitals.
 
 As soon as I will have time for this, I will do some further test to
 verify if I can duplicate this again starting from scratch, so to
 verify if Heartbeat 2.1.2 and/or EVMS 2.5.5.-24.52 really have some
 issues with node names partially capitalized, I will update the list
 afterwards.
 
 Could also be that I modified something else in the system that I'm
 not fully aware of, or simply forgot about, as I did many different
 tests on the same boxes.
 
 Thanks again,
   Chris
 
 
 
 
 On Nov 21, 2007 9:23 PM, Yan Fitterer [EMAIL PROTECTED] wrote:
  Andrew Beekhof wrote:
  
   On Nov 21, 2007, at 10:11 AM, Christian Zemella wrote:
  
   Hi All,
  Anybody out there managed to have EVMS container resources
   properly failing over in a 2 node Heartbeat 2 cluster running on SLES
   10 SP1 ?
  
   I believe so... have you read the documentation below?
  http://wiki.novell.com/images/3/37/Exploring_HASF.pdf
  
  
  
   In my lab I can only start and stop the resource on the node that has
   the container assigned within evms, while if I shut down that node,
   the fail over does not occur as the evms_failover resource goes in
   time out; as soon as the other nodes comes up again it takes the
   resource back properly.
 
  This would indicate that evms_failover RA cannot assign the container to
  the new node. Do you see the resource failing? Have you checked
  failcount for the resources on that node?
 
  Some clues (from evms perspective): take a look in /dev/evms/.nodes
  When the private container is present on the node, a device file named
  after the container should appear there.
 
  To test manually, the easiest is to start HB, then put both nodes on
  standby, then manipulate the evms devices manually.
 
  To deport the container (on resource stop) evms_failover issues commands
  to the evms command line tool:
 
  modify:$1,type=deported
  save
  exit
 
  where $1 is the value of the 1 parameter you've passed to evms_failover.
 
  You can try this yourself manually, to verify where the issue is (i.e.
  with evms or elsewhere).
 
  To import the container (when starting the resource), evms_failover does:
 
  modify:$1,node=$HOSTNAME,type=private
  save
  exit
 
 
 
  
   In my environment I created the following:
  
   I'm working using 2 VMWare boxes sharing one 4GB plain disk that works
   as SAN;
  
   EVMS:
  
   I created a private container (gwcont) on the shared disk using CSM
   plug-in and in it an EVMS Volume (gwvol);
   on the volume i make a reiserfs file system;
   I verified that the HA plug-in was working and that the node assigned
   to the container can manually mount it.
  
   HB_GUI:
  
   I created a group ordered and collocated;
   Inside the group i created the following resources:
   - evmsSCC -> no attributes, no parameters;
   - evms_failover -> Parameter: 1 Value: gwcont (name of the EVMS
   container )
   - Filesystem -> Parameter: fstype Value: reiserfs; Parameter: device
   Value: 

Re: [Linux-HA] evms-failover resource agent does not handle case sensitive hostnames correctly

2007-11-30 Thread Fabian Herschel
On Friday, 30.11.2007, at 12:19 +, Yan Fitterer wrote:
  First question: Are you interested in my patch (just 2-3 lines)?
 
 Most likely ;) Although I'm not completely sure how case is handled
 elsewhere. We might be case-sensitive on purpose! (although I can't see
 a good reason to do this for host names). 

Heartbeat handles case-sensitive hostnames by lowercasing them. Thus
the CCM only lists lowercased hostnames (see crm_mon, the GUI and others).
But the nodes in ha.cf should be written the way uname -n reports them (which
is in the original letters, thus with case).

The problem using the evms-failover resource agent is that the hostname
is given to evms in the original (case-sensitive) letters. evms checks the
string case-sensitively (this is the original error, I guess) against the
CCM entries and claims the parameter is illegal: evms thinks the hostname
is not a member of the cluster, while heartbeat says it is a member of the
cluster (just a different string comparison).

My patch just ignores upper/lower case by lowercasing the local
hostname. This seems to be compatible with the way heartbeat is doing
it. And it is (for now) much easier to handle (for me) than changing
the evms behaviour.
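
The idea behind the patch, shown as a stand-alone sketch: lower-case the node
name first and only then build the modify command for the EVMS CLI. The
function name evms_import_cmd is invented for this example; the command
format is the one the resource agent uses when importing the container.

```shell
#!/bin/sh
# Hypothetical helper: print the EVMS CLI command that assigns a private
# container to a node, with the node name forced to lower case.
#   $1 = container name, $2 = node name as reported by uname -n
evms_import_cmd() {
    container=$1
    node=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    printf 'modify:%s,node=%s,type=private\n' "$container" "$node"
}

evms_import_cmd gwcont CZVLabNode2
```

The tr arguments are quoted here so the shell cannot expand them as file
name patterns.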

Hope the patch is helpful
Fabian



 Maybe send to the -dev list?

Sorry, I am not subscribed to the dev list. So I am sending the (very
small) patch here first. See attachment (only a three-line patch).

 
  Second: Any idea why I am not able to migrate the private container?
  Are there any typical pitfalls?
 
 The resource cannot run anywhere is nothing to do with the resource
 agent. It's the PE (Policy Engine) deciding that it is so. Likely you
 have either resource node failcounts that are too high, or failed
 starts. Resources that have failed to start on a node are not eligible
 to be started again on that node (at least on the SLES 10 2.0.8
 version). I've heard this may change one day.
 To see failed starts, try crm_verify -VV
 To see failcounts, I usually grep the output of cibadmin -Q (much
 faster than issuing multiple crm_failcount commands...).
 
 Yan
-- 
SUSE LINUX GmbH,
Maxfeldstr. 5, D - 90409 Nürnberg
Phone:  +49 (0)69  - 2174-1923
FaxFFM: +49 (0)69  - 2174-1740
FaxDUS: +49 (0)211 - 5631-3769
e-mail: [EMAIL PROTECTED]

-

SUSE LINUX GmbH, GF: Volker Smid, HRB 21284 (AG Nürnberg)

-

PLEASE NOTE:  This e-mail may contain confidential and privileged
material for the sole use of the intended recipient.  Any review,
distribution or other use by anyone else is strictly prohibited.  If you
are not an intended recipient, please contact the sender and delete all
copies.  Thank you.

60d58
<   HN=$(echo $HOSTNAME|tr [:upper:] [:lower:])
62c60
<   modify:$1,node=$HN,type=private
---
>   modify:$1,node=$HOSTNAME,type=private

[Linux-HA] evms-failover resource agent does not handle case sensitive hostnames correctly

2007-11-30 Thread Fabian Herschel
Hi,

I have been searching for some time for why my evms-failover resource
does not work on any of my heartbeat 2.0.8 nodes (SLES10SP1, x86_64).

I want to use a private evms container to prevent the (non-cluster) file
system from being mounted twice by administrative error.

But the resource was not started on any node. So I checked what the
resource agent has to do to start a private container resource and
tried to do that by hand using the CLI.

Every time, the CLI told me a parameter was wrong (but gave no useful
information about which parameter). Then I tried to acquire the private
container using the lower-cased hostname (not the hostname seen when
running uname -n) and it worked.

I wrote a small patch for that resource agent, and then the resource
could be started on one cluster side but could not be migrated. The
cluster says the resource xxx cannot run anywhere.

First question: Are you interested in my patch (just 2-3 lines)?
Second: Any idea why I am not able to migrate the private container?
Are there any typical pitfalls?

Best regards
Fabian 



Re: [Linux-HA] Resource fencing

2007-10-24 Thread Fabian Herschel
Junko IKEDA wrote:
 Is there any disk reservation strategy implemented in heartbeat and its
 agents (did
 not found any).
   
 i think someone from NTT posted a resource agent that did this
 

 Hi Fabian,

 NTT's RA is not a scsi reservation to be exact,
 but try the attached if you don't mind.
 We've upgraded it just a bit.
   
Thanks a lot for providing this agent!
 Please let me know if there are any troubles when you set it up.

 Best Regards,
 Junko Ikeda

 NTT DATA INTELLILINK CORPORATION
   
 






Re: [Linux-HA] pingd, quorum, split-brain... should I give up?

2007-10-23 Thread Fabian Herschel
Riccardo Perni wrote:

 Andrew Beekhof [EMAIL PROTECTED] wrote:

 On 10/23/07, Riccardo Perni [EMAIL PROTECTED] wrote:


 Andrew Beekhof [EMAIL PROTECTED] wrote:

  On 10/22/07, Riccardo Perni [EMAIL PROTECTED] wrote:
   Is it possible
   to handle this situation?
  
   You may try quorumd. See
  
   http://www.linux-ha.org/QuorumServerGuide
 
  I'm going to look at it, but isn't it another SPOF?
 
  by definition, no.
  because you've already had at least one failure before quorumd
  becomes relevant

 Do you mean that the cluster will continue to work even if I have a
 failure on the quorum server?

 my understanding is that the quorum server is not used unless you
 already don't have quorum... at which point you've lost half your nodes
 anyway

 Uhm, but at this point I already have a split-brain condition... or not?
No, split brain means you have (at least) two cluster sides which both
believe they are THE cluster. The quorum server helps here: only one side
of the cluster gets the quorum.
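
A minimal sketch of that idea in Python, only to illustrate the rule. It is
not how quorumd is implemented; the function name has_quorum and the
holds_tiebreaker flag are made up for this example. The assumption is that a
side with a strict majority of votes always has quorum, and that on an even
split an external arbiter (such as a quorum server) grants quorum to exactly
one side.

```python
def has_quorum(partition_votes: int, total_votes: int,
               holds_tiebreaker: bool = False) -> bool:
    """Decide whether one side of a partitioned cluster may run resources.

    Hypothetical model: strict majority wins outright; on an even split
    the side chosen by the external arbiter wins.
    """
    if 2 * partition_votes > total_votes:
        return True                 # strict majority, no arbiter needed
    if 2 * partition_votes == total_votes:
        return holds_tiebreaker     # even split, the arbiter decides
    return False                    # minority side never gets quorum


# Two-node cluster that has split into two one-node sides.
# The arbiter is assumed to have picked side A.
side_a = has_quorum(1, 2, holds_tiebreaker=True)
side_b = has_quorum(1, 2, holds_tiebreaker=False)
print("side A has quorum:", side_a)
print("side B has quorum:", side_b)
```

Both sides still think they are alive, but only one of them is allowed to
keep the resources, so the split brain does no harm.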

 --Riccardo Perni
 Unità Operativa Informatica Aziendale
 ASL Roma-B





 




[Linux-HA] Resource fencing

2007-10-23 Thread Fabian Herschel
In the wiki I found the keywords "resource fencing" and "disk
reservation". Clusters like the Symantec (Veritas) hasf have implemented
disk reservations.

Disk reservations can be implemented by a special SCSI-3 command sequence.

Is there any disk reservation strategy implemented in heartbeat and its
agents? (I did not find any.)

Regards
Fabian