On Wed, Nov 24, 2010 at 06:18:17PM +0200, Alexander Bodnarashik wrote:
> 
> On Nov 24, 2010, at 15:09, Dejan Muhamedagic wrote:
> 
> > Hi,
> > 
> > On Mon, Nov 22, 2010 at 05:48:19PM +0200, Alexander Bodnarashik wrote:
> >> Finally i've managed to build and run heartbeat/pacemaker on FreeBSD 8.1 
> >> host.
> > 
> > I guess that you pulled the yesterday's resource-agents with the
> > autoconf fix to exclude tickle if struct iphdr is not available.
> > Of course, BSD has a corresponding struct, but slightly
> > different.
> > 
> Yes, I noticed that my "tickle-patch" failed to apply. I excluded it
> from the script before sending it to the list.
> 
> >> Attached you'll find csh script (with few comments inside)
> >> which will automate installation on clean system (no error
> >> checking, sorry, just draft script to save some time and
> >> keypresses).
> >> Due to different paths to programs on Linux and FreeBSD i've
> >> made few ugly symlinks and applied few patches to fix Linux
> >> specific code (or FreeBSD specific :) )
> >> I'm pretty sure most of them are not required and it may be
> >> fixed by autotools, but i have zero-knowledge there.
> >> I haven't tested if cluster behaves properly yet.
> > 
> > It would be interesting to hear if it does.
> 
> Issues i've found so far (nothing critical though):
> 1. FreeBSD init scripts are not LSB compatible; they make pacemaker crazy 
> upon failure :) (not a big problem, as one can use the provided OCF resource 
> agents or write one's own)
> 2. It's better to have bash as default shell (really :) )
> 3. corosync based cluster requires /dev/shm ("ln -s /tmp /dev/shm" in 
> /etc/rc.local rocks ) (i guess it's not related to this list, just a note)
> 4. The heartbeat init script (as well as the ldirectord init script) breaks 
> FreeBSD conventions and ignores the heartbeat_enable (ldirectord_enable) 
> option in rc.conf (to fix: update the script, set local_startup="" etc.)
> 5. pacemaker-controlled resources should not be autostarted by the OS; on 
> the other hand, FreeBSD init scripts refuse the "start" command if not 
> enabled in rc.conf (e.g. memcache_enable="yes"). Basically the same issue as (1)
> 6. I get a lot of errors like:
> /var/log/messages:Nov 23 19:45:30 alice heartbeat: [1023]: ERROR: glib: 
> mcast_write: Unable to send mcast packet [-1]: Message too long
> /var/log/messages:Nov 23 19:45:30 alice heartbeat: [1023]: ERROR: 
> write_child: write failure on mcast em0.: Message too long

This is a problem.
Heartbeat can only do UDP for now, and only one UDP packet per message
(no fragmentation of a message over multiple UDP packets yet), and UDP
has a hard message size limit of 64k (minus header overhead).
The cib can quickly grow beyond that, and pacemaker 1.1 will hand down
messages of up to 128k uncompressed.
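To illustrate that limit (a sketch, not heartbeat's actual code): the
kernel flatly refuses a datagram larger than what fits in one IP packet
with EMSGSIZE, which is exactly the "Message too long" error in the log
above. The address and payload here are made up for the demonstration:

```python
import errno
import socket

# UDP payload ceiling: a 65535-byte IP datagram minus the 20-byte IPv4
# header and the 8-byte UDP header.
MAX_UDP_PAYLOAD = 65535 - 20 - 8  # 65507 bytes

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
error = None
try:
    # A message larger than one datagram cannot be sent at all; the
    # kernel rejects it outright instead of splitting it for us.
    sock.sendto(b"x" * (MAX_UDP_PAYLOAD + 1), ("127.0.0.1", 9))
except OSError as e:
    error = e
sock.close()

print(error)  # EMSGSIZE, "Message too long", as in the log
```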

You should consider corosync.

If that does not work for you for whatever reason, your only option with
pacemaker 1.1 on heartbeat and any reasonably sized cib is, for now, to
enable compression in ha.cf (see below) and set it to use "traditional
compression": the non-traditional, per-field compression does not work
for FT_STRING fields, which pacemaker uses most of the time for the
"small" cib string representations < 128k.

Depending on your network setup, the effective UDP message size limit
may be much smaller than 64k. I don't know the BSD tunables for this;
have a look at UDP fragmentation and related settings.

Once a node runs into this, it will likely self-fence or be fenced
sooner or later.  And once it has occurred, it will occur more and more
frequently, so when you first see this, prepare for the worst.

Without fencing, I have seen situations where the membership views of
cib, crm, and ccm diverge, where nodes consider a node the DC which
itself does not agree, and similar byzantine behavior.


ha.cf settings to mitigate this problem a bit
(you can play with the threshold; the unit is kB):
compression bz2
compression_threshold 30
traditional_compression yes
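Why whole-message ("traditional") compression helps so much here: the
cib is XML that repeats the same element and attribute names over and
over, which bz2 shrinks drastically. A rough illustration with a
made-up, cib-like blob (the tag names are invented for the example):

```python
import bz2

# Hypothetical cib-like XML: the real cib is similarly repetitive, with
# resource and status sections reusing the same attribute names.
cib = b"<cib>" + b'<primitive id="r1" class="ocf" type="Dummy"/>' * 2000 + b"</cib>"
compressed = bz2.compress(cib)
print(len(cib), "->", len(compressed), "bytes")
```

With repetitive input like this, the compressed message easily fits
back under the single-datagram limit.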

> and
> /var/log/messages:Nov 23 19:44:52 alice cib: [1083]: WARN: adjust rcvbuf size 
> to 1048576 failed: No buffer space available

That's not a direct problem; it only indicates a failed performance
optimization.  It may or may not become a problem, depending on how
busy the IPC layer gets.
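What that warning corresponds to, roughly sketched: the daemon asks the
kernel for a bigger receive buffer via SO_RCVBUF, and the kernel may cap
or refuse the request (net.core.rmem_max on Linux caps silently; FreeBSD
refuses with "No buffer space available" when the request exceeds
kern.ipc.maxsockbuf). The 1 MiB figure is taken from the log line above:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
requested = 1048576  # 1 MiB, the size the cib asks for in the log
try:
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
except OSError as e:
    # FreeBSD rejects oversized requests outright; Linux silently caps.
    print("setsockopt failed:", e)
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
sock.close()
print("granted receive buffer:", granted, "bytes")
```

Comparing the granted size against the request is how you tell whether
the optimization actually took effect.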

> This may not be an issue, as I run FreeBSD guests on vmware-server and 
> not on real hardware.
>
> As far as I can see, besides what is mentioned above (and some minor
> issues which I can't recall at the moment), everything works fine; in
> short: a failed node is fenced and resources fail over.

Hope that helps a bit.

Cheers,

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
