Hi,

Why has this script still not been committed to your development tree
since my first post in February? I looked at
http://hg.linux-ha.org/agents/file/e13565f0ea8a/heartbeat

Or did I check in the wrong place?

Achim

Achim Stumpf wrote:
> Hi,
>
> Lars Ellenberg wrote:
>> On Fri, Feb 06, 2009 at 03:18:55PM +0100, Achim Stumpf wrote:
>>> Hi,
>>>
>>> I have written an OCF sshd RA script. It is based on the proftpd
>>> script. Feel free to use it and commit it please.
>>>
>>> I have written this script with the special option
>>> "OCF_RESKEY_killallchilds":
>>
>>> We have some badly written cron-like jobs here, which access our
>>> cluster via ssh. Most of them run in loops and open ssh sessions to
>>> the cluster again and again, and through those access the drbd device.
>>> Or they start a loop on the cluster through ssh, and the children
>>> access the drbd device.
>>>
>>> With the function get_and_stop_pids I am able to get all children of
>>> a process. But if the option is set to 0, sshd will simply terminate
>>> without doing any of the above.
>>>
>>> The workaround with fuser in the Filesystem RA does not solve this
>>> issue, because the parent process starts new children, which will,
>>> for example, access the drbd device again.
>>
>>
>> the workaround solves it fine.
>> if you make your "applications" "cluster aware" in the following sense:
>>
>> iiuc, what you do now is basically
>> ssh cluster "while true; do some_job_which_uses_the_drbd ; done"
>>
>>
>> change that to
>> ssh cluster "cd /your/drbd/mount/point ;
>> while true; do ( some_job_which_uses_the_drbd ) ; done"
>>
>
> I am working for a company in the financial industry, and these jobs
> access the clusters via ssh, often in loops, as you and I have shown
> above.
>
>>
>> as the process (shell) the loop spawning new processes runs in
>> now has its cwd on DRBD, the "fuser -k" will find and kill it.
>>
>> I think that would be much easier than modifying the ssh RA.
>>
>
> If you have only a couple of scripts, which you could modify yourself,
> yes.
> But I am talking here about hundreds of jobs belonging to people in my
> company and in other companies, and if I tell them to change their
> jobs, this would never come to an end.
> I think it is not such a good idea to rely on how the code that
> accesses the drbd device through sshd is written. Someone makes a
> mistake, and a failover would fail.
>
> Cheers,
>
> Achim
>
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
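
For readers following the thread, the idea of collecting all children of a process, as described for get_and_stop_pids in the quoted mail, could be sketched roughly as below. This is not the code from the posted RA: the function name list_descendants and the approach of walking "PID PPID" pairs with awk are assumptions made purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper, NOT the get_and_stop_pids function from the posted RA.
# Reads "PID PPID" pairs on stdin and prints every descendant of the
# PID given as $1, one PID per line.
list_descendants() {
    awk -v root="$1" '
        { parent[$1] = $2; order[NR] = $1 }
        END {
            wanted[root] = 1
            # Keep scanning until a full pass finds no new descendant.
            do {
                found = 0
                for (i = 1; i <= NR; i++) {
                    pid = order[i]
                    if (!(pid in wanted) && (parent[pid] in wanted)) {
                        wanted[pid] = 1
                        print pid
                        found = 1
                    }
                }
            } while (found)
        }'
}

# Possible use on a live system (sshd_pid is an assumed variable name):
#   ps -e -o pid= -o ppid= | list_descendants "$sshd_pid"
```

The sketch only lists the PIDs; whether and how to signal them is left to the caller. Unlike the "fuser -k" approach discussed above, it does not depend on a process having its cwd or open files on the DRBD mount point.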