> DAIKI MATSUDA wrote:
> >>>>> Hi, All
> >>>>>
> >>>>> I add the new function for heartbeat-2.0.8 and attached its patch file.
> >>>>>
> >>>>> The function is to apply the new timeout parameters ( keepalive,
> >>>>> deadtime, deadping, warntime ) without stopping the heartbeat services.
> >>>>> Currently heartbeat boot scripts supply the 'reload' or 'forcereload'
> >>>>> function, but it, they are same, does stop the services and the HA
> >>>>> services are moved to standby node, because its process kills the forked
> >>>>> heartbeat processes and clients ( crmd etc. ).
> >>>>> So, we think to without suspending the services make the changing
> >>>>> parameters to apply to driving nodes. Current feature is following.
> >>>>> 1. changing ha.cf file for 4 parameters
> >>>>> 2. send working parent heartbeat process signal SIGRTMAX ( e.g. kill -s
> >>>>> SIGRTMAX `cat /var/run/heartbeat.pid` ) (Why do I choose SIGRTMAX? I do
> >>>>> not find the unused good signal.)
> >>>>>
> >>>>> As we research the heartbeat, it may be safety. And I want to listen to
> >>>>> your issues for patch and functions.
> >>>> Sorry to be coming in so late on this, but I was working on the release
> >>>> for many weeks now. I really like the idea of dynamically modifying the
> >>>> heartbeat configuration - but if you're going to go to the trouble to do
> >>>> it, I'd like to see it done more generally.
> >>>>
> >>>> In other words, I'd like to be able to change nearly any parameter in
> >>>> ha.cf at run time without restarting heartbeat.
> >>>>
> >>>> This would require reworking (and improving) the way heartbeat starts
> >>>> up. This would be probably about twice or three times as much work as
> >>>> what you've done, but it would be much more useful, and much more
> >>>> general.
> >>>>
> >>>> In the end, if done right, it could be groundwork to letting us
> >>>> eventually be able to receive config updates from the CIB. [I know there's
> >>>> a bootstrapping issue, but we can deal with that when we get to deciding
> >>>> to do that work].
> >>>>
> >>>> I have thought about this and have some specific ideas on what kinds of
> >>>> things need to be done to make this happen.
> >>> Hi, Alan.
> >>>
> >>> I understood what you say and think it is very good idea to treat all
> >>> parameters in ha.cf. I thought my implementation is for testing and it
> >>> is better that you, ha-dev team, make its feature.
> >> I don't know quite what you meant by "it is better that you, ha-dev
> >> team, make its feature".
> >
> > I am sorry for poor English. It means that the feature you think to
> > make is better than what I made.
> > If possible, could you show the schedule
>
> Not a problem. This will all work out.
>
> I don't have a particular schedule in mind. I'm also not sure how long
> it will take, and this kind of thing depends a lot on how well the
> person doing the change knows the code.
>
>
> Here is a suggested approach. At each stage, please test the patch
> some, submit the patch for review and then test it extensively, and
> submit it for re-review if you found more bugs. I would suggest in this
> order - to keep you from spending too much time testing a patch we ask
> you to do over. In fact, on the first stage maybe review your data
> structures first, because that will determine the code in the end.
>
> Step 1 - Further categorize and modularize the configuration.
>     There are at least 4 kinds of statements in the configuration
>     and there may be more:
>      1. media statements - like ucast, bcast, etc. Things
>         which load plugins and start read/write processes
>      2. global statements - which affect some or all of the
>         media statements - things like port number, serial
>         baud rate, etc. Knowing which global statements
>         affect which media statements, may eventually be
>         important.
>      3. Respawn statements - things which start child processes
>         this includes the implied respawn statements in things
>         like 'crm on'.
>      4. Other statements. For each of these, figure out which
>         class of processes are affected by each change.
>
>     Make it so that each media statement is processed by a single
>     function call. Right now, the processing for any given media
>     statement is embedded in a loop. This is just restructuring.
>
>     If you store all the ha.cf statements in an array, then you can
>     make a minor improvement even in this stage. Make a pass
>     through the array looking for global statements and execute them
>     first. This will fix some known annoying behaviors where these
>     need to occur before they're used.
>
>     For media and respawn statements, you need to add an association
>     between the statements and the child processes they created.
>     That way, when we finally get around to processing changes, we
>     can kill them when they go away or change. We already have
>     a special way to track processes. Use that code, but create
>     new associations.
>
>     Note that this doesn't implement the feature we are talking
>     about, it just lays the groundwork for it. At this point
>     the code won't be able to do anything new. That happens
>     in step 2. Test this code in CTS, and test it manually.
>     Have it reviewed, and repeat until people are happy.
>     Then I'll commit it for you.
>
> Step 2 - add the code to deal with changes in the configuration, and
> figure out when to kill things, when to start new ones.
>
> Step 3 - Create CTS tests which change the configuration, then change it
> back, watching for the correct behavior in each case. Run 1000
> instances of this test alone in a CTS run. After you have had the code
> reviewed, and have run these tests, and everyone is happy, then we'll
> commit this stage of the changes.
>
> Suggested Enhancement - after doing this:
> Since you now know how to restart anything in heartbeat, you should also
> be able to restart a pair of read and write children if either should
> die. So, we should be able to then recover from them dying. Add the
> code to do this, and fix up the CTS test which is supposed to kill
> random processes, to know how to kill any process in the system. Turn
> the test back on, and run 1000 instances of this test in CTS. Similarly
> for this stage, submit it for review, and when everyone is happy, we'll
> commit it.
>
> And, in the end this will be a great improvement, and the system will
> also be more robust (better able to recover from errors) than it has
> ever been.
>
> How does that sound for an outline of a plan?
>
>
> --
> Alan Robertson <[EMAIL PROTECTED]>
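
For readers following the plan quoted above, here is a rough sketch of what Steps 1 and 2 might look like, written in Python for brevity rather than in heartbeat's own C. Everything in it is an assumption made for illustration: the `Statement` type, the `parse`, `execution_order` and `children_to_stop` helpers, and the three small directive sets, which are far from the full list of ha.cf directives. It is not heartbeat's actual code.

```python
from dataclasses import dataclass

# Illustrative, incomplete classification (assumption, not heartbeat's tables).
MEDIA = {"ucast", "bcast", "mcast", "serial"}
GLOBAL = {"udpport", "baud", "keepalive", "deadtime", "warntime"}
RESPAWN = {"respawn", "crm"}  # 'crm on' implies respawned children


@dataclass(frozen=True)
class Statement:
    lineno: int
    directive: str
    args: tuple

    @property
    def kind(self):
        """Return 'media', 'global', 'respawn' or 'other'."""
        if self.directive in MEDIA:
            return "media"
        if self.directive in GLOBAL:
            return "global"
        if self.directive in RESPAWN:
            return "respawn"
        return "other"


def parse(text):
    """Store every non-empty, non-comment statement in a list, in file order."""
    statements = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        directive, *args = line.split()
        statements.append(Statement(lineno, directive, tuple(args)))
    return statements


def execution_order(statements):
    """Global statements first; file order is kept within each group."""
    return sorted(statements, key=lambda s: s.kind != "global")


def children_to_stop(children, new_statements):
    """Given a mapping (directive, args) -> child PIDs, return the entries
    whose media/respawn statement no longer appears in the new config."""
    wanted = {(s.directive, s.args) for s in new_statements
              if s.kind in ("media", "respawn")}
    return {key: pids for key, pids in children.items() if key not in wanted}
```

The ordering relies on Python's `sorted` being stable, so a `baud` line placed after the `serial` line that depends on it would still be executed first. A changed media statement shows up in `children_to_stop` as a removed key, and the new form would then be started as a fresh statement.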
Hi, Alan-san.

I am sorry for the delay. We asked our sponsor, and he permitted us to
research what you suggested.

I went through the parameters for ha.cf. There are over 50 of them, and I
think that most of them do not need to be modified dynamically, e.g. crm,
use_logd, baud, etc. So your proposal is ideal, but implementing it would
be very costly, and it is not practical.

Regards
MATSUDA, Daiki
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
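
The narrower approach argued for in this reply, where only a few parameters are applied at run time, could be sketched as below. This is a hypothetical illustration in Python, not the attached patch: the `RELOADABLE` whitelist simply reuses the four timing parameters named in the original post, and `read_settings` and `plan_reload` are invented helper names. It also assumes each directive appears at most once, which is not true of directives such as `node` or `ucast`.

```python
# Hypothetical whitelist: the four timing parameters from the original post.
RELOADABLE = {"keepalive", "deadtime", "deadping", "warntime"}


def read_settings(text):
    """Map directive -> value string (simplification: last occurrence wins)."""
    settings = {}
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2:
            settings[fields[0]] = " ".join(fields[1:])
    return settings


def plan_reload(old_text, new_text):
    """Split changed directives into those applicable live and those that
    would still need a restart."""
    old, new = read_settings(old_text), read_settings(new_text)
    changed = {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}
    live = {k: new[k] for k in changed if k in RELOADABLE and k in new}
    needs_restart = sorted(changed - live.keys())
    return live, needs_restart
```

A signal handler like the SIGRTMAX one described in the original post could call something like `plan_reload` on the old and new ha.cf contents, apply the `live` values, and log the `needs_restart` names instead of silently ignoring them.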
