[
https://issues.apache.org/jira/browse/MESOS-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204669#comment-14204669
]
Tom Arnfeld commented on MESOS-1812:
------------------------------------
So I managed to get around this issue by finding a way for the Executor to
shut itself down. That being said, I still think a new
{{shutdownExecutor}} API is worth discussing in another thread.
I agree the issue here is a bug that should be fixed. I'd rather Mesos were
responsible for maintaining these timing and ordering constraints, instead of
frameworks having to build around them, given that Mesos is the friendly
cluster manager for distributed systems, designed to reduce the code in
frameworks. :-)
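
To make the ordering constraint concrete, here is a minimal Python sketch. It is
not Mesos code: {{prepare}}, the delays, and both launch functions are
hypothetical stand-ins, and it assumes the reordering comes from per-task
asynchronous steps that finish at different times. The first function queues
each task as soon as its own step completes; the second still runs the steps
concurrently but queues tasks strictly in the order they were assigned.

```python
import asyncio


async def prepare(task_id, delay):
    # Hypothetical stand-in for per-task asynchronous work before queuing.
    await asyncio.sleep(delay)
    return task_id


async def launch_unordered(tasks):
    # Each task is queued whenever its own future completes, so a slow
    # first task can be overtaken by a fast second one.
    queued = []

    async def run(task_id, delay):
        queued.append(await prepare(task_id, delay))

    await asyncio.gather(*(run(t, d) for t, d in tasks))
    return queued


async def launch_ordered(tasks):
    # Preparation still overlaps, but results are awaited (and queued)
    # in assignment order, which preserves causality.
    pending = [asyncio.create_task(prepare(t, d)) for t, d in tasks]
    queued = []
    for fut in pending:
        queued.append(await fut)
    return queued


# Assumed delays: the first task's preparation is slower than the second's.
TASKS = [("Task_Tracker_10", 0.05), ("slots_Task_Tracker_10", 0.0)]


def demo():
    unordered = asyncio.run(launch_unordered(TASKS))
    ordered = asyncio.run(launch_ordered(TASKS))
    return unordered, ordered
```

Calling {{demo()}} returns both queue orders, so the two strategies can be
compared side by side. The point of the second variant is that the component
owning the queue enforces the order, so callers do not have to.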
> Queued tasks are not actually launched in the order they were queued
> --------------------------------------------------------------------
>
> Key: MESOS-1812
> URL: https://issues.apache.org/jira/browse/MESOS-1812
> Project: Mesos
> Issue Type: Bug
> Components: slave
> Reporter: Tom Arnfeld
>
> Even though tasks are assigned and queued in the order in which they are
> launched (e.g. multiple tasks in reply to one offer), timing issues with the
> futures can sometimes break causality, so the tasks end up not being
> launched in order.
> Example trace from a slave. In this example, the Task_Tracker_10 task should
> be launched before slots_Task_Tracker_10.
> {code}
> I0918 02:10:50.371445 17072 slave.cpp:933] Got assigned task Task_Tracker_10
> for framework 20140916-233111-3171422218-5050-14295-0015
> I0918 02:10:50.372110 17072 slave.cpp:933] Got assigned task
> slots_Task_Tracker_10 for framework 20140916-233111-3171422218-5050-14295-0015
> I0918 02:10:50.372172 17073 gc.cpp:84] Unscheduling
> '/mnt/mesos-slave/slaves/20140915-112519-3171422218-5050-5016-6/frameworks/20140916-233111-3171422218-5050-14295-0015'
> from gc
> I0918 02:10:50.375018 17072 slave.cpp:1043] Launching task
> slots_Task_Tracker_10 for framework 20140916-233111-3171422218-5050-14295-0015
> I0918 02:10:50.386282 17072 slave.cpp:1153] Queuing task
> 'slots_Task_Tracker_10' for executor executor_Task_Tracker_10 of framework
> '20140916-233111-3171422218-5050-14295-0015
> I0918 02:10:50.386312 17070 mesos_containerizer.cpp:537] Starting container
> '5f507f09-b48e-44ea-b74e-740b0e8bba4d' for executor
> 'executor_Task_Tracker_10' of framework
> '20140916-233111-3171422218-5050-14295-0015'
> I0918 02:10:50.388942 17072 slave.cpp:1043] Launching task Task_Tracker_10
> for framework 20140916-233111-3171422218-5050-14295-0015
> I0918 02:10:50.406277 17070 launcher.cpp:117] Forked child with pid '817' for
> container '5f507f09-b48e-44ea-b74e-740b0e8bba4d'
> I0918 02:10:50.406563 17072 slave.cpp:1153] Queuing task 'Task_Tracker_10'
> for executor executor_Task_Tracker_10 of framework
> '20140916-233111-3171422218-5050-14295-0015
> I0918 02:10:50.408499 17069 mesos_containerizer.cpp:647] Fetching URIs for
> container '5f507f09-b48e-44ea-b74e-740b0e8bba4d' using command
> '/usr/local/libexec/mesos/mesos-fetcher'
> I0918 02:11:11.650687 17071 slave.cpp:2873] Current usage 17.34%. Max allowed
> age: 5.086371210668750days
> I0918 02:11:16.590270 17075 slave.cpp:2355] Monitoring executor
> 'executor_Task_Tracker_10' of framework
> '20140916-233111-3171422218-5050-14295-0015' in container
> '5f507f09-b48e-44ea-b74e-740b0e8bba4d'
> I0918 02:11:17.701015 17070 slave.cpp:1664] Got registration for executor
> 'executor_Task_Tracker_10' of framework
> 20140916-233111-3171422218-5050-14295-0015
> I0918 02:11:17.701897 17070 slave.cpp:1783] Flushing queued task
> slots_Task_Tracker_10 for executor 'executor_Task_Tracker_10' of framework
> 20140916-233111-3171422218-5050-14295-0015
> I0918 02:11:17.702350 17070 slave.cpp:1783] Flushing queued task
> Task_Tracker_10 for executor 'executor_Task_Tracker_10' of framework
> 20140916-233111-3171422218-5050-14295-0015
> I0918 02:11:18.588388 17070 mesos_containerizer.cpp:1112] Executor for
> container '5f507f09-b48e-44ea-b74e-740b0e8bba4d' has exited
> I0918 02:11:18.588665 17070 mesos_containerizer.cpp:996] Destroying container
> '5f507f09-b48e-44ea-b74e-740b0e8bba4d'
> I0918 02:11:18.599234 17072 slave.cpp:2413] Executor
> 'executor_Task_Tracker_10' of framework
> 20140916-233111-3171422218-5050-14295-0015 has exited with status 1
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)