Hi,

We have a framework that launches Spark jobs on our Mesos cluster. We are 
currently hitting an issue where Spark jobs get stuck due to a timeout. Our 
cancel functionality sends a task_kill message to the master. When a job gets 
stuck, the Spark driver task is not killed, even though the agent on the node 
where the driver is running receives the kill request. Is there a timeout I can 
set so that the Mesos agent force-kills the task in this scenario? I'd really 
appreciate your help.
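
For context, here is a minimal sketch of what our cancel path does, using the 
Mesos Java scheduler bindings. The class and method names are illustrative, 
not our actual code:

    import org.apache.mesos.Protos.TaskID;
    import org.apache.mesos.SchedulerDriver;

    public final class CancelJob {
        // 'driver' is the framework's live SchedulerDriver instance.
        static void cancel(SchedulerDriver driver, String taskId) {
            TaskID id = TaskID.newBuilder().setValue(taskId).build();
            // Asks the master to kill the task; the master forwards the
            // kill to the agent running the Spark driver, but nothing
            // here forces the task to actually exit if it is stuck.
            driver.killTask(id);
        }
    }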

Thanks,
Venkat


Relevant entry from the agent logs:

I0404 03:44:47.367276 55066 slave.cpp:2035] Asked to kill task 79668.0.0 of framework 35e600c2-6f43-402c-856f-9084c0040187-002
