How do I capture the stderr of the slave? I set the work_dir and log_dir in the configuration.
I know that /bin/sh exists on the hosts that I'm trying to run the command on.

On Thu, May 16, 2013 at 2:25 PM, Vinod Kone <[email protected]> wrote:

> From the log, it looks like the command executor was Aborted!
>
> I0515 20:38:49.283588 60936 slave.cpp:1065] Executor 'Task ct:foo:1368650328332:1 (touch /u/dgr...)' of framework chronos has terminated with signal Aborted
>
> Not sure why though. Command executor is just a shell wrapper. Also, looks
> like you are not capturing the stderr of slave? This might also provide
> some useful info, regarding whether the executor died during fork or after
> exec (though the lack of stderr/stdout in executor sandbox suggests the
> former).
>
> On Thu, May 16, 2013 at 10:20 AM, David Greenberg <[email protected]> wrote:
>
> > When I try to run a job with chronos, if I provide no executor, the task
> > fails with the "failed" state. When I check the executor logs
> > (stdout/stderr) in Mesos, there's nothing there. I'm just trying simple
> > things, like "echo hello world". I've included a short log of the
> > TASK_FAILED message below. Do you have any ideas on where I could look for
> > debugging? I'm using mesos 0.12.
> >
> > I0515 20:38:49.093731 60967 slave.cpp:487] Got assigned task ct:foo:1368650328332:1 for framework chronos
> > I0515 20:38:49.095643 60967 paths.hpp:235] Created executor directory '/data/scratch/local/mesos/work/slaves/201305152037-3261142444-5050-57326-0/frameworks/chronos/executors/Task ct:foo:1368650328332:1 (touch /u/dgr...)/runs/45f66450-3e3f-4a41-af40-4cb33aa33960'
> > I0515 20:38:49.095824 60948 process_based_isolation_module.cpp:108] Launching Task ct:foo:1368650328332:1 (touch /u/dgr...) (/net/hsdgrnbrg.aoa.twosigma.com/userhome/dgrnbrg/black-mesos/chronos-chosen-mesos/install/libexec/mesos/mesos-executor) in /data/scratch/local/mesos/work/slaves/201305152037-3261142444-5050-57326-0/frameworks/chronos/executors/Task ct:foo:1368650328332:1 (touch /u/dgr...)/runs/45f66450-3e3f-4a41-af40-4cb33aa33960 with resources ' for framework chronos
> > I0515 20:38:49.096753 60948 process_based_isolation_module.cpp:153] Forked executor at 61215
> > I0515 20:38:49.097357 60956 slave.cpp:361] Successfully attached file '/data/scratch/local/mesos/work/slaves/201305152037-3261142444-5050-57326-0/frameworks/chronos/executors/Task ct:foo:1368650328332:1 (touch /u/dgr...)/runs/45f66450-3e3f-4a41-af40-4cb33aa33960'
> > I0515 20:38:49.283064 60963 process_based_isolation_module.cpp:344] Telling slave of lost executor Task ct:foo:1368650328332:1 (touch /u/dgr...) of framework chronos
> > I0515 20:38:49.283588 60936 slave.cpp:1065] Executor 'Task ct:foo:1368650328332:1 (touch /u/dgr...)' of framework chronos has terminated with signal Aborted
> > I0515 20:38:49.284939 60936 slave.cpp:842] Status update: task ct:foo:1368650328332:1 of framework chronos is now in state TASK_FAILED
> > I0515 20:38:49.285099 60936 gc.cpp:97] Scheduling /data/scratch/local/mesos/work/slaves/201305152037-3261142444-5050-57326-0/frameworks/chronos/executors/Task ct:foo:1368650328332:1 (touch /u/dgr...)/runs/45f66450-3e3f-4a41-af40-4cb33aa33960 for removal
> > I0515 20:38:49.283115 60963 process_utils.hpp:64] Stopping ... 61215
> > Sent signal to 61215
> > I0515 20:38:49.432054 60936 slave.cpp:739] Got acknowledgement of status update for task ct:foo:1368650328332:1 of framework chronos
