[Linux-HA] Re: Re: Linux-HA Digest, Vol 50, Issue 44

Lino Moragon Mon, 14 Jan 2008 12:12:16 -0800

[EMAIL PROTECTED] wrote:

Today's Topics:

   1. Re: heartbeat dying (Gary Schlachter)
   2. monitor mysql + prevent splitbrain (2node-cluster) (Lino Moragon)
   3. RE: Get resource location by C/C++ program (API) (Stephan Berlet)
   4. Re: monitor mysql + prevent splitbrain (2node-cluster)
      (Michael Brennen)


----------------------------------------------------------------------

Message: 1
Date: Mon, 14 Jan 2008 11:58:17 -0500
From: Gary Schlachter <[EMAIL PROTECTED]>
Subject: Re: [Linux-HA] heartbeat dying
To: General Linux-HA mailing list <[email protected]>
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Dejan,
I started there. However, the problem I had was that I could not install 2.1.3 on Fedora Core 1 since it needed later versions of other RPMs. I can make 2.1.3 on FC1 but when I try to package heartbeat, I get missing libnet-devel, openhpi-devel, gnutls-devel, OpenIPMI-devel. Is there a way around this?
Gary

Dejan Muhamedagic wrote:
Hi,

On Fri, Jan 11, 2008 at 10:22:48AM -0500, Gary Schlachter wrote:
I have a problem with heartbeat dying. I have a 3 node cluster running HA 2.0.8 on Fedora Core 1. They are providing a single IP address resource. They are using eth0 as the heartbeat mechanism. If I disconnect the eth0 cable from the node which is providing the IP address, one of the other nodes correctly begins providing it. However, shortly after disconnecting the eth0 cable, the heartbeat process (and others) die. The
This was fixed a few months ago. The fix is in the 2.1.3
release. Could you please use the new release?

Thanks,

Dejan
key area in the ha-debug log looks like the following:
pengine[4293]: 2008/01/11_09:50:22 info: determine_online_status: Node loneranger.us.big.net is online
pengine[4293]: 2008/01/11_09:50:22 info: native_print: SharedIP (heartbeat::ocf:IPaddr): Started loneranger.us.big.net
pengine[4293]: 2008/01/11_09:50:22 notice: StopRsc: loneranger.us.big.net Stop SharedIP
crmd[9543]: 2008/01/11_09:50:22 info: do_state_transition: loneranger.us.big.net: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
pengine[4293]: 2008/01/11_09:50:22 info: process_pe_message: Transition 0: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-137.bz2
tengine[4292]: 2008/01/11_09:50:22 info: unpack_graph: Unpacked transition 0: 1 actions in 1 synapses
tengine[4292]: 2008/01/11_09:50:22 info: send_rsc_command: Initiating action 3: SharedIP_stop_0 on loneranger.us.big.net
crmd[9543]: 2008/01/11_09:50:22 info: do_lrm_rsc_op: Performing op=SharedIP_stop_0 key=3:0:994066a9-4cae-49a4-abad-37f3e0b84b3e)
IPaddr[4300]: 2008/01/11_09:50:22 INFO: /sbin/ifconfig eth0:0 10.1.2.50 down
lrmd[9540]: 2008/01/11_09:50:22 info: RA output: (SharedIP:stop:stderr) SIOCDELRT: No such process
crmd[9543]: 2008/01/11_09:50:22 info: process_lrm_event: LRM operation SharedIP_stop_0 (call=4, rc=0) complete
cib[9539]: 2008/01/11_09:50:22 info: cib_diff_notify: Update (client: 9543, call:32): 0.30.317 -> 0.30.318 (ok)
cib[4315]: 2008/01/11_09:50:22 info: write_cib_contents: Wrote version 0.30.318 of the CIB to disk (digest: ad7329b3cddc6a9bbd96deb332a3d08f)
tengine[4292]: 2008/01/11_09:50:22 info: te_update_diff: Processing diff (cib_update): 0.30.317 -> 0.30.318
tengine[4292]: 2008/01/11_09:50:22 info: match_graph_event: Action SharedIP_stop_0 (3) confirmed on c8608d41-66b2-4115-9043-4a8423b0d562
tengine[4292]: 2008/01/11_09:50:22 info: run_graph: Transition 0: (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0)
tengine[4292]: 2008/01/11_09:50:22 info: notify_crmd: Transition 0 status: te_complete - <null>
crmd[9543]: 2008/01/11_09:50:22 info: do_state_transition: loneranger.us.big.net: State transition S_TRANSITION_ENGINE -> S_IDLE [input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
heartbeat[9527]: 2008/01/11_09:54:27 ERROR: Cannot write to media pipe 0: Resource temporarily unavailable
heartbeat[9527]: 2008/01/11_09:54:27 ERROR: Shutting down.
heartbeat[9527]: 2008/01/11_09:54:27 ERROR: Cannot write to media pipe 0: Resource temporarily unavailable
heartbeat[9527]: 2008/01/11_09:54:27 ERROR: Shutting down.
heartbeat[9527]: 2008/01/11_09:54:27 ERROR: Cannot write to media pipe 0: Resource temporarily unavailable
heartbeat[9527]: 2008/01/11_09:54:27 ERROR: Shutting down.
The last messages repeat for a very long time, then most daemons eventually stop.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
------------------------------

Message: 2
Date: Mon, 14 Jan 2008 19:33:15 +0100
From: Lino Moragon <[EMAIL PROTECTED]>
Subject: [Linux-HA] monitor mysql + prevent splitbrain (2node-cluster)
To: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi list,
I've got 2 questions concerning the prevention of splitbrain and monitoring MySQL Server 5.
I'm testing a MySQL Server 5 with 3 instances on a CentOS 5.1 with Heartbeat and DRBD on a 2 Node Cluster (active / passive).
At the moment my 2 Nodes are running on a VMware Server.

I use the following Versions:
heartbeat v. 2.0.8-1
DRBD v. 8.0.6
For heartbeat style I'm using Release 1.
I've configured 2 NICs on each node: 1 for DRBD sync and heartbeat, and another one for heartbeat.

haresources:
mysql1 drbddisk::r0 Filesystem::/dev/drbd0::/pool/mysql/::ext3 172.16.100.110 mysqld_multi

My Questions:
1. If I unplug both NICs of the active Node, I get a Splitbrain after I reconnect them again.
Is there any solution to prevent this using heartbeat R1, or which possibilities would I have with R2?
2. How can I tell heartbeat to make an automatic failover to my passive node if any of my MySQL processes hangs or terminates?
Can you monitor these processes and in case of failure provoke an automatic failover? If yes, which tools would I have to use?

I dug around the linux-ha site and other mailing-list articles, but so far without success.
Has anyone had this combination yet?

I'd be very thankful for any ideas / suggestions.


Lino



------------------------------

Message: 3
Date: Mon, 14 Jan 2008 19:40:36 +0100
From: "Stephan Berlet" <[EMAIL PROTECTED]>
Subject: RE: [Linux-HA] Get resource location by C/C++ program (API)
To: "'General Linux-HA mailing list'" <[email protected]>
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset="iso-8859-1"

Hello again,
I've worked on these things. I'm not finished yet, but now I
understand a couple of things better.

On Jan 10, 2008, at 0:23 PM, Andrew Beekhof wrote:
On Jan 9, 2008, at 7:30 PM, Stephan Berlet wrote:
When I try to compile crm_mon.c, the compiler complains that it can't
find the headers "lha_internal.h" and "lib/crm/pengine/unpack.h"
crm_mon.c can only be built from within the project
in particular, the name of the first header should tell you something about who should be including it :-)
you're better off starting from scratch and copying in only what you need
That is what I've done in the meantime.
Neither file exists on my filesystem. (I searched for them
using 'locate'). Is it because I installed heartbeat with rpms?
right, they're both internal files which are not installed. you shouldn't be using them.
I've simply omitted these two files, and I hope it works anyway.
Another problem for me is that there are some conflicts with C++
keywords
someone had a nice solution for this previously on the mailing list.
i forget the details but google should be able to help
That solution works fine; here is my code for it:

#ifdef __cplusplus
extern "C" {
# define delete __fake_delete
# define private __fake_private
# define new __fake_new
# define class __fake_class
// Add other defines for any conflicting C++ keyword
#endif
/*** include heartbeat headers here ***/
#ifdef __cplusplus
}
// Restore the keywords so the rest of the C++ code can use them
# undef delete
# undef private
# undef new
# undef class
#endif
and invalid transformations (e.g. void* to resource_t*)
Is it possible to make the macro "slist_iter" C++ compliant?
probably
but not being a c++ guy i'd not know how. i'm happy to take patches though...
I worked out a solution for this, too. Just modify one line
in the definition of slist_iter
(more precisely, line 196 in /include/crm/crm.h, version 2.1.2-3):
--      child = __crm_iter_head->data;   \
++      child = (child_type *) __crm_iter_head->data;   \

That works for my purposes.
Similar changes for the xml_child_iter macro in xml.h
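
The one-line change above works because C converts void* to any object pointer implicitly, while C++ requires an explicit cast. A minimal, self-contained sketch of the same pattern follows. Every name in it (demo_node, demo_resource, DEMO_LIST_ITER, total_priority) is hypothetical and only stands in for the heartbeat list type and the slist_iter macro, whose real definitions are not reproduced here:

```cpp
#include <cassert>

// Hypothetical stand-in for a C-style singly linked list with void* payloads.
// This is NOT the real heartbeat/GLib list type.
struct demo_node {
    void *data;
    demo_node *next;
};

// Hypothetical payload type, standing in for something like resource_t.
struct demo_resource {
    int priority;
};

// Hypothetical iteration macro modelled on the pattern described above.
// The explicit (child_type *) cast is what makes the expansion valid C++;
// without it, assigning the void* payload to a typed pointer is an error.
#define DEMO_LIST_ITER(child_type, child, head, body)                 \
    for (demo_node *demo_iter_ = (head); demo_iter_ != nullptr;       \
         demo_iter_ = demo_iter_->next) {                             \
        child_type *child = (child_type *) demo_iter_->data;          \
        body                                                          \
    }

// Sums the priority of every resource reachable from head.
int total_priority(demo_node *head) {
    int total = 0;
    DEMO_LIST_ITER(demo_resource, rsc, head,
        assert(rsc != nullptr);
        total += rsc->priority;
    )
    return total;
}
```

Building a short list by hand and passing its head to total_priority walks every node. Removing the (child_type *) cast from the macro makes a C++ compiler reject the expansion, which is the "invalid transformation" error mentioned earlier in this message.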
On Jan 8, 2008, at 3:20 AM, Andrew Beekhof wrote:
On Jan 7, 2008, at 2:54 PM, Stephan Berlet wrote:
Hello,

First of all, I want to apologize for my bad English!

We use heartbeat 2.1.2-3 in a 2 node cluster, just to manage the
virtual
IP address 172.30.4.170. We have a network service that has to run
on both nodes to make sure they have a synchronous data set.
Therefore both nodes have to know which one holds the virtual IP.
I would like to implement that with the heartbeat API.
If you're using the crm, then the correct API to use is from the
Policy Engine.
For an example, check out the source code for crm_mon.
Maybe I will report my final results with this subject,
or I will ask you many more questions ;)

Best regards and many thanks,
Stephan
HELPING HEADS for Hard- and Software
-------------------------------------------------------------------------
We develop tailor-made solutions for your projects: fast,
flexible, and on site. Our well-coordinated team of experienced hardware
and software specialists supports you wherever you need us.



--------------------------------------------------------------------------
SysDesign GmbH
Säntisstrasse 25
D-88079 Kressbronn am Bodensee

Managing directors: Franz Kleiner, Achim Solle
Commercial register: Ulm 632138
--------------------------------------------------------------------------



------------------------------

Message: 4
Date: Mon, 14 Jan 2008 12:47:07 -0600
From: Michael Brennen <[EMAIL PROTECTED]>
Subject: Re: [Linux-HA] monitor mysql + prevent splitbrain
        (2node-cluster)
To: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset="iso-8859-1"

On Monday 14 January 2008 12:33, Lino Moragon wrote:
Hi list,

I've got 2 questions concerning the prevention of splitbrain and
monitoring MySQL Server 5.
I'm testing a MySQL Server 5 with 3 instances on a CentOS 5.1 with
Heartbeat and DRBD on a 2 Node Cluster (active / passive)
At the moment my 2 Nodes are running on a VMware Server.

I use the following Versions:
heartbeat v. 2.0.8-1
DRBD v. 8.0.6
For heartbeat style I'm using Release 1.

I've configured on each 2 NICs, 1 for DRBD sync and heartbeat and
another one for heartbeat.

haresources:
mysql1  drbddisk::r0 Filesystem::/dev/drbd0::/pool/mysql/::ext3
172.16.100.110 mysqld_multi

My Questions:
1. If I unplug both NICs of the active Node,  I get a Splitbrain after I
reconnect them again.
Is there any solution to prevent this using heartbeat R1 or which
possibilities would I have with R2?
That sounds normal, as both machines then think they can become primary.
Do you have a fence mechanism in place so the secondary can forcibly take the former primary out of service?
2. How can I tell heartbeat to make an automatic failover to my passive
node if any of my MySQL Process has a hangup or terminates?
Can you monitor these processes and in case of failure provoke an
automatic failover? If yes, which tools would I have to use?
I'm not sure about that; I will be awaiting the answer myself. :)

1. Thanks for your answer. No, I haven't implemented fencing yet. What would make more sense: fencing via DRBD or via heartbeat?

  If heartbeat, is there any way to do it without additional hardware, such as an external APC switch for STONITH?
  Are there other solutions?

2. I heard about implementing a watchdog timer that could do the monitoring, but that wouldn't be managed by heartbeat.
  It would cause a machine reset. But if heartbeat already provides anything similar, that would be great...

Lino

