Hi, I'm sure that I replied to this one, but...
On Thu, Dec 06, 2007 at 09:32:20AM +0100, Andrew Beekhof wrote:
>
> On Nov 30, 2007, at 7:39 PM, Art Age Software wrote:
>
>> Hi,
>>
>> I'm setting up my first heartbeat cluster. (I have managed one in the
>> past, but never set one up from scratch before.) It is going well, but
>> I have a few questions:
>>
>> 1) In the log, the following sometimes appears during initial
>> heartbeat startup, and I have no idea what it means:
>>
>> heartbeat: [4502]: ERROR: ha_msg_addraw_ll: illegal field
>> heartbeat: [4502]: ERROR: ha_msg_addraw(): ha_msg_addraw_ll failed
>> heartbeat: [4502]: ERROR: NV failure (string2msg_ll):
>> heartbeat: [4502]: ERROR: Input string: [>>> t=NS_ackmsg >>> t=status
>> st=up dt=7d00 protocol=1 src=db1 (1)srcuuid=+yf5W+NTRWi9QYzh4ZzsPg==
>> seq=5 hg=474f3bee ts=475050f3 ld=0.59 0.15 0.05 2/148 4958 ttl=4
>> auth=1 <<< ]
>> heartbeat: [4502]: ERROR: sp=>>> t=status st=up dt=7d00 protocol=1
>> src=db1 (1)srcuuid=+yf5W+NTRWi9QYzh4ZzsPg== seq=5 hg=474f3bee
>> ts=475050f3 ld=0.59 0.15 0.05 2/148 4958 ttl=4 auth=1 <<<
>> heartbeat: [4502]: ERROR: depth=0
>> heartbeat: [4502]: ERROR: MSG: Dumping message with 1 fields
>> heartbeat: [4502]: ERROR: MSG[0] : [t=NS_ackmsg]
>
>
> that doesn't look good at all
> what version are you running?
> if its a recent one, i'd recommend reporting a bug

Recently somebody had the same problem, which turned out to be a
ha.cf setting (setting message format to netstring). Anyway, it
is a communication problem: channel not clear (serial?) or
similar. Please post ha.cf.

>> 2) In the log, the broadcast port appears to be opened and then
>> immediately closed. Does this mean the port was not initialized
>> successfully?
>>
>> heartbeat: [4502]: info: glib: UDP Broadcast heartbeat started on port
>> 694 (694) interface
>> heartbeat: [4502]: info: glib: UDP Broadcast heartbeat closed on port
>> 694 interface - Status: 1
>>
>
> dont know, sorry

No, this is OK.

>> 3) I have defined a ping_group with 2 ping nodes using ipfail. If the
>> active cluster nodes can only see one of the ping nodes, and the
>> backup cluster node can see both ping nodes, then heartbeat initiates
>> a failover to the backup node. Is this correct behavior? According to
>> the docs, "The ability to communicate with any of the group members
>> means that the group-name member is reachable." I interpreted this to
>> mean that as long as one ping node in the group is active, the cluster
>> would be considered stable. But in fact, heartbeat seems to favor the
>> node with "better connectivity."
>
> i'm not familiar with how ipfail works

ipfail is v1. Further down you mention constraints. Which config
style do you run: v1 or v2?

>> 4) Is there a way to make a resource run on one and only one node (and
>> not failover if the node goes down)? I want to set up constraints such
>> that:
>>
>> (i) Resource "A" favors node "1" but can run on node "2" if necessary.
>> (ii) Resource "B" can only run on node "2"
>> (iii) Resource "A" and "B" may **not** run on the same node, and
>> resource "A" has priority. So, if node "1" goes down, resource "B"
>> will be stopped and resource "A" will migrate to node "2".
>>
>> Any way to accomplish that?
>
> as far as i know, you need version 2 with the crm enabled to do that

Right, v1 won't do. With v2 it's:

- assign higher scores to A compared to B
- colocate A and B with -INFINITY

Thanks,

Dejan

>>
>>
>> Thanks much in advance for any help.
>>
>> Sam
>> _______________________________________________
>> Linux-HA mailing list
>> [email protected]
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
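
To see why the two v2 constraints suggested for question 4 (higher scores for A than for B, plus a -INFINITY colocation between A and B) would give the requested behaviour, here is a toy model in Python. It is only a sketch of the reasoning and not how the CRM actually computes placement: the function place(), the node names and the score values 200/100/50 are all made up for illustration, and a node with no score for a resource is simply treated as forbidden.

```python
NEG_INF = float("-inf")  # stands in for -INFINITY


def place(resources, location, anti_colocated, online):
    """Toy placement, NOT the real CRM algorithm.

    resources:      resource names, highest priority first
    location:       {(resource, node): score}; a missing pair is treated
                    as -INFINITY (assumption made for this sketch)
    anti_colocated: set of frozenset({r1, r2}) that must never share a node
    online:         list of nodes that are currently up
    Returns {resource: node or None}; None means the resource stays stopped.
    """
    placement = {}
    for rsc in resources:
        best_node, best_score = None, NEG_INF
        for node in online:
            score = location.get((rsc, node), NEG_INF)
            # A -INFINITY colocation vetoes any node that already holds
            # the partner resource.
            for other, other_node in placement.items():
                if other_node == node and frozenset({rsc, other}) in anti_colocated:
                    score = NEG_INF
            if score > best_score:
                best_node, best_score = node, score
        placement[rsc] = best_node
    return placement


# Hypothetical scores: A prefers node1 but may use node2; B only has node2.
RESOURCES = ["A", "B"]  # A is placed first, so it wins any conflict
LOCATION = {("A", "node1"): 200, ("A", "node2"): 100, ("B", "node2"): 50}
ANTI = {frozenset({"A", "B"})}

if __name__ == "__main__":
    print(place(RESOURCES, LOCATION, ANTI, ["node1", "node2"]))
    print(place(RESOURCES, LOCATION, ANTI, ["node2"]))
```

With both nodes up, the model puts A on node1 and B on node2. With only node2 up, A moves to node2 and B is left stopped, which is case (iii) from the question.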
