Hi, Horms-san, thank you for your check and reply.
> Hi, All
>
> I add the new function for heartbeat-2.0.8 and attached its patch file.
>
> The function is to apply the new timeout parameters ( keepalive, deadtime,
> deadping, warntime ) without stopping the heartbeat services. Currently
> heartbeat boot scripts supply the 'reload' or 'forcereload' function, but
> it, they are same, does stop the services and the HA services are moved to
> standby node, because its process kills the forked heartbeat processes and
> clients ( crmd etc. ).
> So, we think to without suspending the services make the changing parameters
> to apply to driving nodes. Current feature is following.
> 1. changing ha.cf file for 4 parameters
> 2. send working parent heartbeat process signal SIGRTMAX (e.g. kill -s
> SIGRTMAX `cat /var/run/heartbeat.pid` (Why do I choose SIGRTMAX? I do not
> find the unused good signal.)
>
> As we research the heatbeat, it may be safety. And I want to listen to your
> issues for patch and functions.
>
> Best Regards
> MATSUDA, Daiki
Hi Matsuda-san,
I've taken a look over your patch. Its very long, so I am not sure that
I understand it all. One thing that I am particularly puzzled over is
the purpose of maxdispatchdelay_array. Is this used to notify child
processes when an update occurs?
maxdispatchdelay_array is the table for memorying the behaviour of
G_main_setmaxdispatchdelay() and used to re-write the
information which is added by G_main_setmaxdispatchdelay() with timeout
values on initial booting. Similarly, setall_array and
timeout_add_full_array are for G_main_setall_id() and
Gmain_timeout_add_full().
Especially, timeout_addfull_array is important. Because in
Gmain_timeout_add_full() the loop interval is decided.
I've put a number of comments inline, mainly minor things relating
to formating of the code. Other than that my only queries are:
* Has it been tested?
Yes. But it may be not enough.
* If so, can you re-test it against the dev tree?
* If at all possible could you split the patch up into smaller bits.
Perhaps if you adopt some of my refactoring of existing code ideas
they could be separate patches?
OK, I will try them.
* Do you propose to add a target to this to the init script?
Yes, I will do next time.
As following is too large, I snip.
> +/* from lib/clplumbing/GSource.c */
> +struct GTimeoutAppend {
> + COMMON_STRUCTSTART;
> + longclock_t nexttime;
> + guint interval;
> +};
Should GTimeoutAppend be moved into a header file rather than duplicated?
Yes, its code is duplicated from the original. I think it is better to
include header file. I will modify.
> + for( i = 0; i < maxdispatchdelay_array->len; i++ ){
> + MaxDispatchDelay_Array_t *carray = g_ptr_array_index(
maxdispatchdelay_array, i );
> +
> + if( carray->type == timeout_type ){
> + if( carray->mode == '*' ){
> + G_main_setmaxdispatchdelay( ( GSource *
)carray->addr, value * carray->value );
> + } else if( carray->mode == '/' ){
> + G_main_setmaxdispatchdelay( ( GSource *
)carray->addr, value / carray->value );
> + }
> + }
> + }
> +}
Could you reformat the code above a little bit so that it is <= 80col wide?
As I have already re-write, next time send with other modification.
> +/* care for Gmain_setall_id() function */
> +void
> +modify_setall_id_value( int timeout_type )
> +{
> + int i;
> + long value;
> +
> + switch( timeout_type ){
> + case 0:
> + value = config->heartbeat_ms;
> + break;
> + case 1:
> + value = config->deadtime_ms;
> + break;
> + case 2:
> + value = config->warntime_ms;
> + break;
> + default:
> + return;
> + }
The logic above seems to be a duplicate of the first half of
modify_setmaxdispatchdelay_value(). Is there any value in
breaking it out into its own (static?) function?
It is duplicated code, so I will modify.
for( i = 0; i < setall_array->len; i++ ){
SetAll_Array_t *sarray = g_ptr_array_index( setall_array, i );
if( sarray->type != timeout_type ) {
continue;
}
if( sarray->calc[0] == '\0' ){
G_main_setmaxdispatchdelay_id( sarray->id, value );
continue;
}
for( ; ; ){
...
Also, I'm not a big fan of declaring variables other inside blocks,
but I'm not sure what the linux-ha guidelines on this are.
Sometimes I see the same behaviour in heartbeat source code and I will
keep it till other people will point.
> +int
> +light_parse_config(const char * cfgfile )
The top half of light_parse_config() seems to contain a lot of code from
parse_config() Is it possible to restructure or split parse_config() in
order to reduce the volume of code duplication?
OK. I will modify.
> + struct stat sbuf;
> +
> +#define OW_KEEPALIVE 1
> +#define OW_DEADTIME 2
> +#define OW_WARNTIME 4
> +#define OW_DEADPING 8
I'd rather if these defines were just done at the top of the file.
There doesn't seem to be any pressing need to #undef them at any stage.
OK. I will modify.
> + while (fgets(buf, MAXLINE, f) != NULL) {
> + char * bp = buf;
The line above contains trailing whitespace.
It has already been pointed. Sorry.
> + if( new_deadping_ms < new_keepalive_ms * 2 ){
> + ha_log( LOG_ERR, "deadping value is smaller than keepalive time *
2.\n" );
> + ha_log( LOG_ERR, "keepalive value: %ld\n", new_keepalive_ms );
> + ha_log( LOG_ERR, "deadping value: %ld\n", new_deadping_ms );
> + return 4;
> + }
> +
> +/* Maybe following is safe */
Could you explain on what might be unsafe about it?
Yes. we discussed the safety and this is pointed.
Currently keepalive is 1s and deadtime 3s and keepalive will be
changed to 10s, similarly deadtime to 30s. We think it is dangerous
when keepalive is reflected before deadtime. Because standby heartbeat
server keeps old deadtime value ( 3s ) for judging the dead status for
active server and before sending new deadtime value ( 30s ) by
send_local_status(), its interval is 10s that has already been
re-wrote, standby server will detect active server. So, we think that
it must care for order.
And if you pointed for '* 2', it is from init_config(), deadtime and
deadping are bounded for over twice of keepalive. So, I added the
restriction for re-read configuration file function.
> + }
> + }
> +
> +/* For ccm */
Could you indent the comment on the line above?
OK. I will modify.
> + {
> + struct ha_msg * msg;
> + char keepalive[64];
> +
> + snprintf( keepalive, sizeof( keepalive ), "%lx",
config->heartbeat_ms );
> + msg = ha_msg_new( 4 );
> + ha_msg_add( msg, F_TYPE, "new keepalive" );
> + ha_msg_add( msg, F_ORIG, curnode->nodename );
> + ha_msg_add( msg, F_STATUS, "new keepalive" );
> + ha_msg_add( msg, F_KEEPALIVE, keepalive );
> +
> + heartbeat_monitor( msg, KEEPIT, NULL );
> + }
> +#undef GTIMEOUT
> +}
> +
> +void
> +modify_deadtime_value( long new_deadtime_ms, char *new_deadtime_string )
I haven't investigated this, but how is the deadtime set on startup?
Would it make sense to abstract it a bit so that the same code can be
used for startup and modify? Perhaps a modified modify_deadtime_value()
could be used in both cases. Ditto for modify_keepalive_value()
OK, its a nice idea and I will check them.
> +{
> + int i;
> +
> + cl_log( LOG_INFO, "%s: deadtime %ld time is over-written to new value
%ld.\n", __FUNCTION__, config->deadtime_ms, new_deadtime_ms );
> + SetParameterValue( KEY_DEADTIME, new_deadtime_string );
> +
> +/* this part is taken from init_config() */
> + for( i = 0; i < config->nodecount; i++ ) {
> + // config->nodes[i].has_resources = DoManageResources;
If the line above isn't needed, please just remove it.
OK. I will remove.
> +/* for heartbeat.c hb_reregister_with_apphbd() and hb_init_watchdog() */
> + if ( UseApphbd == TRUE ) {
> + hb_init_register_with_apphbd_dummy();
> + }
> +/* To HERE */
> +}
> +
Are the "To HERE" commends just debugging/development stuff? If so,
could you remove them. Also, could you please indent the other comments
in this function.
OK. I will remove and modify.
> @@ -478,6 +523,7 @@
> , {SIGALRM, hb_signal_false_alarm_handler, 1}
> , {SIGUSR1, hb_signal_debug_usr1_handler, 1}
> , {SIGUSR2, hb_signal_debug_usr2_handler, 1}
> + , {SIGRTMAX, hb_signal_light_reread_config_handler, 1}
Is using SIGRTMAX in this way safe? Is SIGHUP already used for something else?
As I wrote, SIGHUP is already used. Because I did not find good other
signal, temporarily adopt SIGRTMAX. If you, Linux-HA developer team,
have a good idea, could you suggest?
> +void
> +maxdispatchdelay_array_add( void *addr, char mode, int value, pid_t pid, int
type )
> +{
> + MaxDispatchDelay_Array_t *carray;
> +
> + carray = malloc( sizeof( MaxDispatchDelay_Array_t ) );
Should this (and others) be cl_malloc()?
OK. I will modify.
> int
> ccm_control_process(ccm_info_t *info, ll_cluster_t * hb)
> {
> @@ -149,6 +152,13 @@
> }
> ha_msg_del(msg);
> msg = newmsg;
> + } else if(strcasecmp(type, "new keepalive") == 0){
> + hb_fd_saved->llc_ops->signoff( hb_fd_saved, FALSE );
> + hb_fd_saved->llc_ops->signon( hb_fd_saved, "ccm" );
> + sleep( 3 );
What is the purpose of sleep(3). It seems strange that it is hard-coded.
I have test many times without sleep(). Sometimes the value is not
reflected. It is temporarily implementation, I know it is poor way, and I am
tracing to the cause.
Best Regards
MATSUDA, Daiki
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/