> On 14 Mar 2015, at 14:39, Krishna Kishore Bonagiri <[email protected]> > wrote: > > This is what I see in the AM's log since the STOP command is issued. Even > though it indicates that STOP command SUCCEEDED, I see that the stop > function in my python script is not getting executed. Does the exception at > the end of this log indicate something?
OK. AM is stopping here, so that bit is working: CLI -> AM -> AM shuts down. The .py stop script is not being invoked ... which is what Sumit stated: that bit isn't wired up. Why not? we've not got round to it yet. To run robustly in a cluster with unreliable hosts, your components need to be designed to be killed without warning. This is particularly the case in a queue with pre-emption, as the container is destroyed without even telling the AM until afterwards. We're currently assuming that the components do handle unannounced container destruction, — so the agent isn't forwarding the command
