On Tue, Mar 02, 2010 at 03:00:59AM +0200, Marian Marinov wrote:
> +calculate_preference() {
> +     master_file=''
> +     master_pos=''
> +     mysql --password=$OCF_RESKEY_replication_passwd \
> +             --user=$OCF_RESKEY_replication_user \
> +             -h $master_host \
> +             -O connect_timeout=1 \
> +             -e 'SHOW MASTER STATUS\G'|while read k v; do
> +             if [ "$k" = 'File:' ]; then master_file="$v"; fi
> +             if [ "$k" = 'Position:' ]; then master_pos="$v"; fi
> +     done
> +
> +     if [ -z "$master_file" ] || [ -z "$master_pos" ]; then
> +             ocf_log err "unable to find master file or master position"
> +             return $OCF_ERR_GENERIC;
> +     fi

Great.
So now we have a master_file and a master_pos.

> +     local_log=''
> +     local_pos=''
> +     exec_pos=''


And right _now_, on the master,
the logs may be rotated.

And if the slave was particularly fast,
it probably already caught up,
before you ask it for its status:

> +     mysql --socket=$OCF_RESKEY_socket -O connect_timeout=1 \
> +             -e 'SHOW SLAVE STATUS\G'|grep Master_Log|while read k v; do
> +             if [ "$k" = 'Master_Log_File:' ]; then local_log="$v"; fi
> +             if [ "$k" = 'Read_Master_Log_Pos:' ]; then local_pos="$v"; fi
> +             if [ "$k" = 'Exec_Master_Log_Pos:' ]; then exec_pos="$v"; fi
> +     done
> +
> +     if [ -z "$local_log" ] || [ -z "$local_pos" ] || [ -z "$exec_pos" ]; then
> +             ocf_log err "unable to get slave status values"
> +             return $OCF_ERR_GENERIC
> +     fi
> +
> +     # no preference for very old slaves
> +     if [ "$master_file" != "$local_log" ]; then

In which case this will no longer be true.
And your best slave will suddenly have a master preference of zero.
Bad timing, huh?

> +             do_cmd $CRM_MASTER -v 0
> +             return 1;
> +     fi
> +
> +     # calculate preference points depending on how far behind the master are we
> +     let local_diff=master_pos-local_pos

Same here.  If there is a lot of stuff going on,
local_pos may be larger than master_pos already,
even if they stay in the same file,
because they cannot be sampled "atomically".
You'd get a negative diff, and not
assign proper preferences below.

Just food for thought...

I'd suggest sampling the slave first, then the master.
That way you can be sure your idea of the slave position
won't be ahead of your idea of the master position.
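
A minimal sketch of that ordering, in case it helps. The helper
names (status_field, query_slave_status, query_master_status,
sample_positions) are made up for illustration and are not part of
the patch; the mysql options are the ones the patch already uses.
It captures the output with command substitution, so the variables
are assigned in the current shell rather than inside a pipeline.

```sh
# Hypothetical helper: print the value of one "Key: value" line
# from mysql's \G output read on stdin.
status_field() {
    awk -v key="$1:" '$1 == key { print $2; exit }'
}

# Hypothetical wrappers around the two queries from the patch.
query_slave_status() {
    mysql --socket="$OCF_RESKEY_socket" -O connect_timeout=1 \
        -e 'SHOW SLAVE STATUS\G'
}
query_master_status() {
    mysql --password="$OCF_RESKEY_replication_passwd" \
        --user="$OCF_RESKEY_replication_user" \
        -h "$master_host" -O connect_timeout=1 \
        -e 'SHOW MASTER STATUS\G'
}

sample_positions() {
    # Slave first: whatever the master reports afterwards can only
    # be at or beyond the position the slave had when it was asked.
    slave_status=$(query_slave_status) || return 1
    master_status=$(query_master_status) || return 1

    local_log=$(printf '%s\n' "$slave_status" | status_field Master_Log_File)
    local_pos=$(printf '%s\n' "$slave_status" | status_field Read_Master_Log_Pos)
    exec_pos=$(printf '%s\n' "$slave_status" | status_field Exec_Master_Log_Pos)
    master_file=$(printf '%s\n' "$master_status" | status_field File)
    master_pos=$(printf '%s\n' "$master_status" | status_field Position)

    # Fail if any of the values could not be parsed.
    [ -n "$local_log" ] && [ -n "$local_pos" ] && [ -n "$exec_pos" ] &&
        [ -n "$master_file" ] && [ -n "$master_pos" ]
}
```

The master can still rotate its log between the two queries, so the
file names need to be compared as well.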

Depending on the exact scheme of file names,
you can detect file rotations or other wraparounds as well.
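
For example, assuming the binlog files are named with a base name
plus a numeric extension (mysql-bin.000012 and so on), something
along these lines could tell "one file behind because of a rotation"
apart from "very old slave". The function names are hypothetical,
and this is only a sketch under that naming assumption.

```sh
# Print the numeric extension of a binlog file name,
# e.g. "mysql-bin.000012" -> 12. Fails if there is none.
binlog_seq() {
    n=${1##*.}                      # text after the last dot
    case $n in
        ''|*[!0-9]*) return 1 ;;    # empty or not purely numeric
    esac
    n=${n#"${n%%[!0]*}"}            # strip leading zeros (avoid octal)
    echo "${n:-0}"
}

# Print how many log files the slave is behind the master;
# 0 means both are in the same file.
binlog_files_behind() {
    m=$(binlog_seq "$1") || return 1
    s=$(binlog_seq "$2") || return 1
    echo $(( m - s ))
}
```

A result of 1 right after sampling would then suggest a rotation
rather than a hopelessly stale slave.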

> +     if [ "$local_diff" -ge 1000 ]; then
> +         points=0;

Are you sure that once you are 1000 "behind",
you do not want to be master anymore?
Is that generic, or should that be configurable?

If you made sure local_diff cannot be negative
as per my suggestion above, you could do

: ${m_ceiling:=9876}  # <=- could become an RA parameter
points=$(( local_diff > m_ceiling ? 0 : m_ceiling - local_diff ))

and in case you wanted to dampen the bouncing around
because of various sampling times, you could still
integer divide by 10 or some such.
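
Put together, that could look roughly like the following. The
function name calc_points and its damping argument are made up for
illustration, and whether to divide the points or the diff is a
choice I have not thought through; this sketch divides the points.

```sh
# calc_points DIFF [CEILING] [DAMP]
# Print a master preference: higher when the slave is closer to the
# master, 0 once it is more than CEILING behind.
calc_points() {
    diff=$1
    ceiling=${2:-9876}   # default ceiling, could become an RA parameter
    damp=${3:-10}        # integer divisor to dampen sampling jitter

    # Guard against a negative diff caused by non-atomic sampling.
    if [ "$diff" -lt 0 ]; then
        diff=0
    fi

    if [ "$diff" -gt "$ceiling" ]; then
        echo 0
    else
        echo $(( ($ceiling - $diff) / $damp ))
    fi
}
```

With the defaults, "calc_points 100" prints 977, and anything more
than 9876 behind prints 0.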



-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
