[EMAIL PROTECTED] wrote:
Send Linux-HA mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        http://lists.linux-ha.org/mailman/listinfo/linux-ha
or, via email, send a message with subject or body 'help' to
        [EMAIL PROTECTED]

You can reach the person managing the list at
        [EMAIL PROTECTED]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Linux-HA digest..."


Today's Topics:

   1. Re: heartbeat dying (Gary Schlachter)
   2. monitor mysql + prevent splitbrain (2node-cluster) (Lino Moragon)
   3. RE: Get resource location by C/C++ program (API) (Stephan Berlet)
   4. Re: monitor mysql + prevent splitbrain (2node-cluster)
      (Michael Brennen)


----------------------------------------------------------------------

Message: 1
Date: Mon, 14 Jan 2008 11:58:17 -0500
From: Gary Schlachter <[EMAIL PROTECTED]>
Subject: Re: [Linux-HA] heartbeat dying
To: General Linux-HA mailing list <[email protected]>
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Dejan,

I started there. However, the problem I had was that I could not install 2.1.3 on Fedora Core 1, since it needed later versions of other RPMs. I can build 2.1.3 on FC1, but when I try to package heartbeat, the following dependencies are reported missing: libnet-devel, openhpi-devel, gnutls-devel, OpenIPMI-devel. Is there a way around this?

Gary

Dejan Muhamedagic wrote:
Hi,

On Fri, Jan 11, 2008 at 10:22:48AM -0500, Gary Schlachter wrote:
I have a problem with heartbeat dying. I have a 3 node cluster running HA 2.0.8 on Fedora Core 1. They are providing a single IP address resource. They are using eth0 as the heartbeat mechanism. If I disconnect the eth0 cable from the node which is providing the IP address, one of the other nodes correctly begins providing it. However, shortly after disconnecting the eth0 cable, the heartbeat process (and others) die. The
This was fixed a few months ago. The fix is in the 2.1.3
release. Could you please use the new release?

Thanks,

Dejan

key area in the ha-debug log looks like the following:

pengine[4293]: 2008/01/11_09:50:22 info: determine_online_status: Node loneranger.us.big.net is online
pengine[4293]: 2008/01/11_09:50:22 info: native_print: SharedIP (heartbeat::ocf:IPaddr): Started loneranger.us.big.net
pengine[4293]: 2008/01/11_09:50:22 notice: StopRsc: loneranger.us.big.net Stop SharedIP
crmd[9543]: 2008/01/11_09:50:22 info: do_state_transition: loneranger.us.big.net: State transition S_POLICY_ENGINE ->S_TRANSITION_ENGINE [input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
pengine[4293]: 2008/01/11_09:50:22 info: process_pe_message: Transition 0: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-137.bz2
tengine[4292]: 2008/01/11_09:50:22 info: unpack_graph: Unpacked transition 0: 1 actions in 1 synapses
tengine[4292]: 2008/01/11_09:50:22 info: send_rsc_command: Initiating action 3: SharedIP_stop_0 on loneranger.us.big.net
crmd[9543]: 2008/01/11_09:50:22 info: do_lrm_rsc_op: Performing op=SharedIP_stop_0 key=3:0:994066a9-4cae-49a4-abad-37f3e0b84b3e)
IPaddr[4300]: 2008/01/11_09:50:22 INFO: /sbin/ifconfig eth0:0 10.1.2.50 down
lrmd[9540]: 2008/01/11_09:50:22 info: RA output: (SharedIP:stop:stderr) SIOCDELRT: No such process
crmd[9543]: 2008/01/11_09:50:22 info: process_lrm_event: LRM operation SharedIP_stop_0 (call=4, rc=0) complete
cib[9539]: 2008/01/11_09:50:22 info: cib_diff_notify: Update (client: 9543, call:32): 0.30.317 -> 0.30.318 (ok)
cib[4315]: 2008/01/11_09:50:22 info: write_cib_contents: Wrote version 0.30.318 of the CIB to disk (digest: ad7329b3cddc6a9bbd96deb332a3d08f)
tengine[4292]: 2008/01/11_09:50:22 info: te_update_diff: Processing diff (cib_update): 0.30.317 -> 0.30.318
tengine[4292]: 2008/01/11_09:50:22 info: match_graph_event: Action SharedIP_stop_0 (3) confirmed on c8608d41-66b2-4115-9043-4a8423b0d562
tengine[4292]: 2008/01/11_09:50:22 info: run_graph: Transition 0: (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0)
tengine[4292]: 2008/01/11_09:50:22 info: notify_crmd: Transition 0 status: te_complete - <null>
crmd[9543]: 2008/01/11_09:50:22 info: do_state_transition: loneranger.us.big.net: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
heartbeat[9527]: 2008/01/11_09:54:27 ERROR: Cannot write to media pipe 0: Resource temporarily unavailable
heartbeat[9527]: 2008/01/11_09:54:27 ERROR: Shutting down.
heartbeat[9527]: 2008/01/11_09:54:27 ERROR: Cannot write to media pipe 0: Resource temporarily unavailable
heartbeat[9527]: 2008/01/11_09:54:27 ERROR: Shutting down.
heartbeat[9527]: 2008/01/11_09:54:27 ERROR: Cannot write to media pipe 0: Resource temporarily unavailable
heartbeat[9527]: 2008/01/11_09:54:27 ERROR: Shutting down.

The last messages repeat for a very long time, then most daemons eventually stop.


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems



------------------------------

Message: 2
Date: Mon, 14 Jan 2008 19:33:15 +0100
From: Lino Moragon <[EMAIL PROTECTED]>
Subject: [Linux-HA] monitor mysql + prevent splitbrain (2node-cluster)
To: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi list,

I've got 2 questions concerning split-brain prevention and monitoring MySQL Server 5. I'm testing MySQL Server 5 with 3 instances on CentOS 5.1 with Heartbeat and DRBD on a 2-node cluster (active/passive).
At the moment my 2 nodes are running on a VMware Server.

I use the following Versions:
heartbeat v. 2.0.8-1
DRBD v. 8.0.6
For heartbeat style I'm using Release 1.

Each node has 2 NICs configured: one for DRBD sync and heartbeat, and another one for heartbeat.

haresources:
mysql1 drbddisk::r0 Filesystem::/dev/drbd0::/pool/mysql/::ext3 172.16.100.110 mysqld_multi

My Questions:
1. If I unplug both NICs of the active node, I get a split brain after I reconnect them. Is there any way to prevent this using heartbeat R1, or what options would I have with R2?

2. How can I tell heartbeat to fail over automatically to my passive node if any of my MySQL processes hangs or terminates? Can these processes be monitored so that a failure triggers an automatic failover? If yes, which tools would I have to use?

I dug around the linux-ha site and other mailing-list articles, but so far without success.
Has anyone used this combination yet?

I'd be very thankful for any ideas / suggestions.


Lino



------------------------------

Message: 3
Date: Mon, 14 Jan 2008 19:40:36 +0100
From: "Stephan Berlet" <[EMAIL PROTECTED]>
Subject: RE: [Linux-HA] Get resource location by C/C++ program (API)
To: "'General Linux-HA mailing list'" <[email protected]>
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain;charset="iso-8859-1"

Hello again,
I've worked on these things. I'm not finished yet, but now I
understand a couple of things better.

On Jan 10, 2008, at 0:23 PM, Andrew Beekhof wrote:
On Jan 9, 2008, at 7:30 PM, Stephan Berlet wrote:
When I try to compile crm_mon.c, the compiler complains that it can't
find the headers "lha_internal.h" and "lib/crm/pengine/unpack.h"
crm_mon.c can only be built from within the project
in particular, the name of the first header should tell you something about who should be including it :-) you're better off starting from scratch and copying in only what you need

That is what I've done in the meantime.

Neither file exists on my filesystem (I searched for them
using 'locate'). Is it because I installed heartbeat from RPMs?
right, they're both internal files which are not installed. you shouldn't be using them.

I've simply omitted these two files, and I hope it works anyway.


Another problem for me is that there are some conflicts with C++
keywords
someone had a nice solution for this previously on the mailing list.
i forget the details but google should be able to help

That solution works fine; here is my code for it:

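#ifdef __cplusplus
extern "C" {
// Rename C++ keywords that the C headers use as identifiers
# define delete __fake_delete
# define private __fake_private
# define new __fake_new
# define class __fake_class
// Add other defines for any conflicting C++ keyword
#endif
/*** include heartbeat headers here ***/
#ifdef __cplusplus
// Restore the keywords so the rest of the C++ file can use them
# undef delete
# undef private
# undef new
# undef class
}
#endif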


and invalid conversions (e.g. void* to resource_t*)
Is it possible to make the macro "slist_iter" C++ compliant?
probably
but not being a c++ guy i'd not know how. i'm happy to take patches though...

I worked out a solution for this, too. Just modify one line
in the definition of slist_iter
(more precisely, line 196 in /include/crm/crm.h, version 2.1.2-3):
--      child = __crm_iter_head->data;   \
++      child = (child_type *) __crm_iter_head->data;   \

That works for my purposes.
Similar changes apply to the xml_child_iter macro in xml.h.


On Jan 8, 2008, at 3:20 AM, Andrew Beekhof wrote:
On Jan 7, 2008, at 2:54 PM, Stephan Berlet wrote:

Hello,

First of all, I apologize for my bad English!

We use heartbeat 2.1.2-3 in a 2-node cluster, just to manage the
virtual
IP address 172.30.4.170. We have a network service that has to run
on both nodes to make sure they have a synchronized data set.
Therefore both nodes have to know which one holds the virtual IP.
I would like to implement that with the heartbeat API.
If you're using the crm, then the correct API to use is from the
Policy Engine.
For an example, check out the source code for crm_mon.


Maybe I will report my final results on this subject,
or I will ask you many more questions ;)

Best regards and many thanks,
Stephan
HELPING HEADS for Hard- and Software
-------------------------------------------------------------------------
We develop tailor-made solutions for your projects: fast,
flexible, and directly on site. Our well-practiced team of experienced
hardware and software specialists supports you wherever you need us.



--------------------------------------------------------------------------
SysDesign GmbH
Säntisstrasse 25
D-88079 Kressbronn am Bodensee

Managing directors: Franz Kleiner, Achim Solle
Commercial register: Ulm 632138
--------------------------------------------------------------------------



------------------------------

Message: 4
Date: Mon, 14 Jan 2008 12:47:07 -0600
From: Michael Brennen <[EMAIL PROTECTED]>
Subject: Re: [Linux-HA] monitor mysql + prevent splitbrain
        (2node-cluster)
To: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset="iso-8859-1"

On Monday 14 January 2008 12:33, Lino Moragon wrote:
Hi list,

I've got 2 questions concerning split-brain prevention and
monitoring MySQL Server 5.
I'm testing MySQL Server 5 with 3 instances on CentOS 5.1 with
Heartbeat and DRBD on a 2-node cluster (active/passive).
At the moment my 2 Nodes are running on a VMware Server.

I use the following Versions:
heartbeat v. 2.0.8-1
DRBD v. 8.0.6
For heartbeat style I'm using Release 1.

Each node has 2 NICs configured: one for DRBD sync and heartbeat, and
another one for heartbeat.

haresources:
mysql1  drbddisk::r0 Filesystem::/dev/drbd0::/pool/mysql/::ext3
172.16.100.110 mysqld_multi

My Questions:
1. If I unplug both NICs of the active node, I get a split brain after I
reconnect them.
Is there any way to prevent this using heartbeat R1, or what
options would I have with R2?

That sounds normal, as both machines then think they can become primary.

Do you have a fence mechanism in place so the secondary can forcibly take the former primary out of service?

2. How can I tell heartbeat to fail over automatically to my passive
node if any of my MySQL processes hangs or terminates?
Can these processes be monitored so that a failure triggers an
automatic failover? If yes, which tools would I have to use?

That I'm not sure about; I'll be awaiting the answer myself. :)

1. Thanks for your answer. No, I haven't implemented fencing yet. What would make more sense: fencing via DRBD or via heartbeat?
  If heartbeat, is there any way to do it without additional hardware,
such as an external APC switch for STONITH?
  Are there other solutions?

2. I heard about implementing a watchdog timer that could do the monitoring, but
that wouldn't be managed by heartbeat.
  It would cause a machine reset. But if heartbeat already provides anything
similar, that would be great...

Lino
