Hi Ashley
> Any insight as to why this is happening? Is this a deliberate implementation decision? Is there a better way around this?
I haven't had a chance to look at your scenario in detail, but this sounds very similar to a common problem related to a nohup/ssh race condition.
Basically, what can happen in many remote automation scenarios is as follows:
* You open an SSH connection to a box and invoke something like a "service start" script
* The "service start" script returns as soon as it has run its final command, which is something like "nohup /my/service/run.sh &"
Now there is a race between nohup doing its job and SSH closing the connection. More specifically, there is a short window before nohup has made the service process immune to the hangup signal (SIGHUP) that is delivered when the session ends. If the sshd process terminates before that has happened, it kills the service, which is (still) a child process of sshd.
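To make that concrete, a start script with this problem might look something like the sketch below (the paths and service name are made up for illustration):

```sh
#!/bin/sh
# start.sh: hypothetical "service start" script exhibiting the race.
# It backgrounds the service under nohup and returns immediately,
# before nohup has necessarily shielded the process from SIGHUP.
nohup /my/service/run.sh > /var/log/myservice.log 2>&1 &
```

Invoked as something like "ssh host /my/service/start.sh", the script returns at once, sshd starts tearing down the session, and the race is on.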
The solution here is to ensure that nohup has had a chance to kick in before the sshd process terminates. Probably the best way to do that is to make your service start script wait until the service is actually up before returning.
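A sketch of what that could look like, assuming the service writes a PID file (the file path and the 10-second timeout are assumptions, not details from your setup):

```sh
#!/bin/sh
# start.sh: hypothetical start script that only returns once the
# service is actually up, closing the race window described above.
nohup /my/service/run.sh > /var/log/myservice.log 2>&1 &

# Wait up to 10 seconds for the service to come up. Here "up" means a
# live process behind a PID file; polling a TCP port or a health
# endpoint would work just as well.
i=0
while [ "$i" -lt 10 ]; do
    if [ -f /var/run/myservice.pid ] \
       && kill -0 "$(cat /var/run/myservice.pid)" 2>/dev/null; then
        exit 0
    fi
    sleep 1
    i=$((i + 1))
done

echo "service did not come up within 10 seconds" >&2
exit 1
```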
Another common way is to simply change the command you're running via SSH from "service start" to "service start && sleep 2", although all that's really doing is giving nohup two more seconds to do its job.
Without knowing more about what your service script is doing, I can't say whether this is actually what you're seeing, but the symptoms certainly sound comparable.
Regards,
ap