[ 
https://issues.apache.org/jira/browse/MESOS-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638339#comment-13638339
 ] 

Benjamin Mahler commented on MESOS-430:
---------------------------------------

Linked the related issue filed by Matei.
                
> send better messages on executor failures
> -----------------------------------------
>
>                 Key: MESOS-430
>                 URL: https://issues.apache.org/jira/browse/MESOS-430
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Jonathan Boulle
>            Priority: Minor
>
> When an executor fails during launch, the slave marks the task as LOST but 
> doesn't return any useful information. It would be a lot more helpful if the 
> slave could include, in the corresponding status update, an indication that 
> the executor failed (e.g. like the "has terminated" message in the log).
> In this specific case, libprocess is failing to bind (presumably because of 
> port exhaustion), which causes the executor to abort before the driver is 
> initialised (this is in the executor's stderr):
> ---
> F0409 20:53:27.887141 41188 process.cpp:1315] Failed to initialize, bind: Address already in use [98]
> ---
> and in the slave log:
> ---
> I0409 20:53:26.558866 21107 slave.cpp:475] Got assigned task 1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02 for framework 201104070004-0000002563-0000
> I0409 20:53:26.560374 21107 paths.hpp:235] Created executor directory '/var/lib/mesos/slaves/201303271650-1944527370-5050-24955-2587/frameworks/201104070004-0000002563-0000/executors/thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02/runs/cca3c221-999b-444e-b628-4b6354754ad1'
> I0409 20:53:26.561908 21089 cgroups_isolation_module.cpp:440] Launching thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02 (./thermos_executor) in /var/lib/mesos/slaves/201303271650-1944527370-5050-24955-2587/frameworks/201104070004-0000002563-0000/executors/thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02/runs/cca3c221-999b-444e-b628-4b6354754ad1 with resources cpus=0.25; mem=128 for framework 201104070004-0000002563-0000 in cgroup mesos/framework_201104070004-0000002563-0000_executor_thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02_tag_b7d846d0-eabc-4f02-8614-601bbd18ef5f
> I0409 20:53:26.562155 21091 slave.cpp:361] Successfully attached file '/var/lib/mesos/slaves/201303271650-1944527370-5050-24955-2587/frameworks/201104070004-0000002563-0000/executors/thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02/runs/cca3c221-999b-444e-b628-4b6354754ad1'
> I0409 20:53:26.563926 21089 cgroups_isolation_module.cpp:571] Changing cgroup controls for executor thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02 of framework 201104070004-0000002563-0000 with resources cpus=0.25; mem=128
> I0409 20:53:26.564266 21089 cgroups_isolation_module.cpp:676] Updated 'cpu.shares' to 256 for executor thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02 of framework 201104070004-0000002563-0000
> I0409 20:53:26.564595 21089 cgroups_isolation_module.cpp:774] Updated 'memory.limit_in_bytes' to 134217728 for executor thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02 of framework 201104070004-0000002563-0000
> I0409 20:53:26.565055 21089 cgroups_isolation_module.cpp:800] Started 
> listening for OOM events for executor 
> thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02
>  of framework 201104070004-0000002563-0000
> I0409 20:53:26.567622 21089 cgroups_isolation_module.cpp:469] Forked executor at = 41188
> Fetching resources into 
> /var/lib/mesos/slaves/201303271650-1944527370-5050-24955-2587/frameworks/201104070004-0000002563-0000/executors/thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02/runs/cca3c221-999b-444e-b628-4b6354754ad1
> Fetching resource /usr/local/bin/thermos_executor
> Copying resource from /usr/local/bin/thermos_executor to .2013-04-09 
> 20:53:26,696:21086(0x4d6b6940):ZOO_DEBUG@zookeeper_process@1983: Got ping 
> response in 0 ms
> I0409 20:53:28.246511 21104 cgroups_isolation_module.cpp:633] Telling slave 
> of terminated executor 
> thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02
>  of framework 201104070004-0000002563-0000
> I0409 20:53:28.246727 21104 cgroups_isolation_module.cpp:534] Killing 
> executor 
> thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02
>  of framework 201104070004-0000002563-0000
> I0409 20:53:28.246750 21108 slave.cpp:1053] Executor 
> 'thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02'
>  of framework 201104070004-0000002563-0000 has terminated with signal Aborted
> I0409 20:53:28.255456 21104 cgroups_isolation_module.cpp:819] OOM notifier is 
> triggered for executor 
> thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02
>  of framework 201104070004-0000002563-0000 with tag 
> b7d846d0-eabc-4f02-8614-601bbd18ef5f
> I0409 20:53:28.255533 21104 cgroups_isolation_module.cpp:824] Discarded OOM 
> notifier for executor 
> thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02
>  of framework 201104070004-0000002563-0000 with tag 
> b7d846d0-eabc-4f02-8614-601bbd18ef5f
> I0409 20:53:28.256477 21101 cgroups.cpp:1146] Trying to freeze cgroup 
> /cgroup/mesos/framework_201104070004-0000002563-0000_executor_thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02_tag_b7d846d0-eabc-4f02-8614-601bbd18ef5f
> I0409 20:53:28.256595 21101 cgroups.cpp:1185] Successfully froze cgroup 
> /cgroup/mesos/framework_201104070004-0000002563-0000_executor_thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02_tag_b7d846d0-eabc-4f02-8614-601bbd18ef5f
>  after 1 attempts
> I0409 20:53:28.261756 21097 cgroups.cpp:1161] Trying to thaw cgroup 
> /cgroup/mesos/framework_201104070004-0000002563-0000_executor_thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02_tag_b7d846d0-eabc-4f02-8614-601bbd18ef5f
> I0409 20:53:28.261864 21097 cgroups.cpp:1268] Successfully thawed 
> /cgroup/mesos/framework_201104070004-0000002563-0000_executor_thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02_tag_b7d846d0-eabc-4f02-8614-601bbd18ef5f
> I0409 20:53:28.285230 21108 slave.cpp:830] Status update: task 
> 1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02 of 
> framework 201104070004-0000002563-0000 is now in state TASK_LOST
> I0409 20:53:28.285598 21104 cgroups_isolation_module.cpp:567] Asked to update 
> resources for an unknown/killed executor
> I0409 20:53:28.285810 21096 gc.cpp:97] Scheduling 
> /var/lib/mesos/slaves/201303271650-1944527370-5050-24955-2587/frameworks/201104070004-0000002563-0000/executors/thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02/runs/cca3c221-999b-444e-b628-4b6354754ad1
>  for removal
> I0409 20:53:28.296897 21094 cgroups_isolation_module.cpp:903] Successfully 
> destroyed the cgroup 
> mesos/framework_201104070004-0000002563-0000_executor_thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02_tag_b7d846d0-eabc-4f02-8614-601bbd18ef5f
> I0409 20:53:28.298924 21091 slave.cpp:727] Got acknowledgement of status 
> update for task 
> 1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02 of 
> framework 201104070004-0000002563-0000
> ---
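
The request above amounts to turning the executor's termination status into text that can travel with the status update, rather than leaving it only in the slave log. A minimal sketch of that idea follows. It is not Mesos code: the function name `describe_termination` and the message wording are hypothetical, and it assumes the raw POSIX wait status of the executor process is available.

```python
import os
import signal


def describe_termination(executor_id: str, wait_status: int) -> str:
    """Build a human-readable message from a raw POSIX wait status.

    Hypothetical helper for illustration; the wording is an assumption,
    loosely modelled on the "has terminated" line in the slave log.
    """
    if os.WIFSIGNALED(wait_status):
        signum = os.WTERMSIG(wait_status)
        try:
            # Symbolic name such as "SIGABRT" when the number is known.
            name = signal.Signals(signum).name
        except ValueError:
            name = str(signum)
        return f"Executor '{executor_id}' terminated with signal {name}"
    if os.WIFEXITED(wait_status):
        code = os.WEXITSTATUS(wait_status)
        return f"Executor '{executor_id}' exited with status {code}"
    return f"Executor '{executor_id}' terminated (raw wait status {wait_status})"
```

A string like this could be attached to the TASK_LOST update so the framework sees why the task was lost without having to read the slave log.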

