Hi, I am facing an issue with jobs launched on my Mesos agents. I am trying to launch a job through the Marathon framework, but the job stays in the staged state and never runs. I see the following log messages on the agent console:
Scheduling '/var/lib/mesos-8082/meta/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-S8/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000' for gc 6.99999884239407days in the future
I0828 16:20:36.053483 28512 slave.cpp:1361] Got assigned task test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c for framework c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
I0828 16:20:36.056224 28510 gc.cpp:83] Unscheduling '/var/lib/mesos-8082/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-S8/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000' from gc
I0828 16:20:36.056715 28510 gc.cpp:83] Unscheduling '/var/lib/mesos-8082/meta/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-S8/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000' from gc
I0828 16:20:36.057231 28509 slave.cpp:1480] Launching task test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c for framework c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
I0828 16:20:36.058661 28509 paths.cpp:528] Trying to chown '/var/lib/mesos-8082/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-S8/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000/executors/test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c/runs/99620406-87b5-406c-a88b-13adb145c12d' to user 'root'
I0828 16:20:36.067807 28509 slave.cpp:5352] Launching executor test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c of framework c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000 with resources cpus(*):0.1; mem(*):32 in work directory '/var/lib/mesos-8082/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-S8/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000/executors/test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c/runs/99620406-87b5-406c-a88b-13adb145c12d'
I0828 16:20:36.069314 28509 slave.cpp:1698] Queuing task 'test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c' for executor 'test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c' of framework c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
I0828 16:20:36.069902 28509 containerizer.cpp:666] Starting container '99620406-87b5-406c-a88b-13adb145c12d' for executor 'test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c' of framework 'c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000'
I0828 16:20:36.080713 28509 linux_launcher.cpp:304] Cloning child process with flags =
I0828 16:20:36.084738 28509 containerizer.cpp:1179] Checkpointing executor's forked pid 29629 to '/var/lib/mesos-8082/meta/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-S8/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000/executors/test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c/runs/99620406-87b5-406c-a88b-13adb145c12d/pids/forked.pid'

After that, however, the job is restarted and a new container is created with a new process ID. This repeats indefinitely, which keeps the job in the staged state as seen by the Mesos master. The job is nothing more than a simple echo "hello world" shell command.

Can anyone please point out where it is failing or what I am doing wrong?

Thanks,
Pankaj
