Hi,

Why has this script, first posted in February, still not been committed to
your development tree at

http://hg.linux-ha.org/agents/file/e13565f0ea8a/heartbeat

Or did I look in the wrong place?



Achim



Achim Stumpf wrote:
> Hi,
> 
> Lars Ellenberg wrote:
>> On Fri, Feb 06, 2009 at 03:18:55PM +0100, Achim Stumpf wrote:
>>> Hi,
>>>
>>> I have written an OCF sshd RA script. It is based on the proftpd
>>> script. Please feel free to use it and commit it.
>>>
>>> I have written this script with the special option
>>> "OCF_RESKEY_killallchilds":
>>
>>> We have some badly written cron-like jobs here, which access our
>>> cluster via ssh. Most of them run in loops, opening ssh sessions to
>>> the cluster again and again and, through them, accessing the drbd
>>> device. Or they start a loop on the cluster through ssh, and the
>>> children access the drbd device.
>>>
>>> With the function get_and_stop_pids I am able to get all children of
>>> a process. But if the option is set to 0, sshd is simply terminated,
>>> without any of the handling described above.
>>>
>>> The workaround with fuser in the Filesystem RA does not solve this
>>> issue because, for example, the parent process starts new children,
>>> which will access the drbd device again.
>>
>>
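
The idea behind get_and_stop_pids, as described above, is to find every
process that descends from sshd. The posted RA itself is not quoted in this
thread, so the following is only a minimal sketch of that idea, not the
script's actual code. The function name descendants_of is hypothetical. It
assumes a table of "PID PPID" pairs on stdin, such as the output of the POSIX
command "ps -A -o pid= -o ppid=".

```shell
#!/bin/sh
# Hypothetical helper (not taken from the posted sshd RA).
# Reads "PID PPID" pairs on stdin and prints every descendant of the
# process given as $1, one PID per line, in no particular order.
descendants_of() {
    awk -v root="$1" '
        { parent[$1] = $2 }              # remember each process and its parent
        END {
            want[root] = 1               # start from the root process
            do {                         # repeat until no new descendants turn up
                grew = 0
                for (pid in parent)
                    if (!(pid in want) && (parent[pid] in want)) {
                        want[pid] = 1
                        grew = 1
                    }
            } while (grew)
            for (pid in want)
                if (pid != root) print pid
        }'
}

# Possible use (assumption: $sshd_pid holds the PID of the master sshd):
#   ps -A -o pid= -o ppid= | descendants_of "$sshd_pid"
```

Working from a single ps snapshot keeps the traversal consistent, but a job
that forks after the snapshot is missed, so a stop action built on this
would probably have to repeat the scan until no descendants remain.
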
>> the workaround solves it fine.
>> if you make your "applications" "cluster aware" in the following sense:
>>
>>  iiuc, what you do now is basically
>>    ssh cluster "while true; do some_job_which_uses_the_drbd ; done"
>>
>>
>>  change that to
>>    ssh cluster "cd /your/drbd/mount/point ;
>>     while true; do ( some_job_which_uses_the_drbd ) ; done"
>>
> 
> I am working for a company in the financial industry, and these jobs 
> access the clusters via ssh, often in loops, as you and I have shown 
> above.
> 
>>
>> as the shell process that runs the loop and spawns the new processes
>> now has its cwd on DRBD, the "fuser -k" will find and kill it.
>>
>> I think that would be much easier than modifying the ssh RA.
>>
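
Whether a given looping shell would be caught this way can be checked by
looking at its working directory. The following is a minimal, Linux-only
sketch that reads /proc/PID/cwd. The function name cwd_is_under is
hypothetical, and the check compares path prefixes only, whereas fuser
matches by file system, so it is an approximation of what "fuser -k" on the
mount point would find.

```shell
#!/bin/sh
# Hypothetical helper, Linux-only: succeeds if the working directory of
# process $1 is directory $2 or lies somewhere below it.
cwd_is_under() {
    dir=${2%/}                                  # drop one trailing slash, so "/" also works
    cwd=$(readlink "/proc/$1/cwd") || return 1  # fails if the process is gone or unreadable
    case "$cwd/" in
        "$dir"/*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Possible use (assumption: $pid is the shell running the loop):
#   cwd_is_under "$pid" /your/drbd/mount/point && echo "would be found"
```
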
> 
> If you have only a couple of scripts, which you could modify yourself, 
> yes. But I am talking here about hundreds of jobs belonging to people in 
> my company and other companies, and if I told them to change their jobs, 
> it would never come to an end.
> I think it is not such a good idea to rely on the code that is written 
> to access the drbd device through sshd. If someone makes a mistake, a 
> failover fails.
> 
> Cheers,
> 
> Achim
> 
> 
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
