[
https://issues.apache.org/jira/browse/AURORA-201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897131#comment-13897131
]
brian wickman commented on AURORA-201:
--------------------------------------
{noformat}
I0209 08:10:03.100 THREAD28403
org.apache.aurora.scheduler.state.CronJobManager$4.get: Initiating delayed
launch of cron JobKey(role:balexandrescu, environment:devel, name:skyfall)
I0209 08:10:03.101 THREAD28403
com.twitter.common.util.StateMachine$Builder$1.execute:
1391933403101-balexandrescu-devel-skyfall-0-31e566b6-caff-49db-84f9-cdd85ae5a0de
state machine transition INIT -> PENDING
I0209 08:10:03.101 THREAD28403
org.apache.aurora.scheduler.state.TaskStateMachine.addFollowup: Adding work
command SAVE_STATE for
1391933403101-balexandrescu-devel-skyfall-0-31e566b6-caff-49db-84f9-cdd85ae5a0de
I0209 08:10:03.102 THREAD28403
org.apache.aurora.scheduler.async.TaskGroups$2.load: Evaluating group
balexandrescu/devel/skyfall in 1000 ms
I0209 08:10:10.107 THREAD23
com.twitter.common.util.StateMachine$Builder$1.execute:
1391933403101-balexandrescu-devel-skyfall-0-31e566b6-caff-49db-84f9-cdd85ae5a0de
state machine transition PENDING -> ASSIGNED
I0209 08:10:10.107 THREAD23
org.apache.aurora.scheduler.state.TaskStateMachine.addFollowup: Adding work
command SAVE_STATE for
1391933403101-balexandrescu-devel-skyfall-0-31e566b6-caff-49db-84f9-cdd85ae5a0de
) is being assigned task for
1391933403101-balexandrescu-devel-skyfall-0-31e566b6-caff-49db-84f9-cdd85ae5a0de.
I0209 08:10:13.200 THREAD28584
org.apache.aurora.scheduler.MesosSchedulerImpl.statusUpdate: Received status
update for task
1391933403101-balexandrescu-devel-skyfall-0-31e566b6-caff-49db-84f9-cdd85ae5a0de
in state TASK_STARTING with core message Initializing sandbox.
I0209 08:10:13.201 THREAD28584
com.twitter.common.util.StateMachine$Builder$1.execute:
1391933403101-balexandrescu-devel-skyfall-0-31e566b6-caff-49db-84f9-cdd85ae5a0de
state machine transition ASSIGNED -> STARTING
I0209 08:10:13.201 THREAD28584
org.apache.aurora.scheduler.state.TaskStateMachine.addFollowup: Adding work
command SAVE_STATE for
1391933403101-balexandrescu-devel-skyfall-0-31e566b6-caff-49db-84f9-cdd85ae5a0de
I0209 08:11:22.576 THREAD12036
org.apache.aurora.scheduler.thrift.aop.LoggingInterceptor.invoke:
forceTaskState(1391933403101-balexandrescu-devel-skyfall-0-31e566b6-caff-49db-84f9-cdd85ae5a0de,
KILLING, SessionKey(user:wickman; elevated:ElevatedPrivilege(requested:true,
justification:user balexandrescu no longer exists)))
I0209 08:11:22.643 THREAD12036
org.apache.aurora.scheduler.thrift.aop.UserCapabilityInterceptor.invoke:
Permitting SessionKey(user:wickman; elevated:ElevatedPrivilege(requested:true,
justification:user balexandrescu no longer exists)) to act as ROOT and perform
action forceTaskState
I0209 08:11:22.701 THREAD12036
com.twitter.common.util.StateMachine$Builder$1.execute:
1391933403101-balexandrescu-devel-skyfall-0-31e566b6-caff-49db-84f9-cdd85ae5a0de
state machine transition STARTING -> KILLING
I0209 08:11:22.701 THREAD12036
org.apache.aurora.scheduler.state.TaskStateMachine.addFollowup: Adding work
command KILL for
1391933403101-balexandrescu-devel-skyfall-0-31e566b6-caff-49db-84f9-cdd85ae5a0de
I0209 08:11:22.701 THREAD12036
org.apache.aurora.scheduler.state.TaskStateMachine.addFollowup: Adding work
command SAVE_STATE for
1391933403101-balexandrescu-devel-skyfall-0-31e566b6-caff-49db-84f9-cdd85ae5a0de
I0209 08:16:22.702 THREAD24
org.apache.aurora.scheduler.async.TaskTimeout$TimedOutTaskHandler.run: Timeout
reached for task
1391933403101-balexandrescu-devel-skyfall-0-31e566b6-caff-49db-84f9-cdd85ae5a0de:KILLING
I0209 08:16:22.703 THREAD24
com.twitter.common.util.StateMachine$Builder$1.execute:
1391933403101-balexandrescu-devel-skyfall-0-31e566b6-caff-49db-84f9-cdd85ae5a0de
state machine transition KILLING -> LOST
I0209 08:16:22.703 THREAD24
org.apache.aurora.scheduler.state.TaskStateMachine.addFollowup: Adding work
command SAVE_STATE for
1391933403101-balexandrescu-devel-skyfall-0-31e566b6-caff-49db-84f9-cdd85ae5a0de
I0209 08:27:56.453 THREAD29172
org.apache.aurora.scheduler.MesosSchedulerImpl.statusUpdate: Received status
update for task
1391919003030-balexandrescu-devel-skyfall-0-ecc2930a-d279-4811-aca6-5e57d9cdb5d8
in state TASK_LOST with core message
I0209 08:27:56.454 THREAD29172
com.twitter.common.util.StateMachine$Builder$1.execute:
1391919003030-balexandrescu-devel-skyfall-0-ecc2930a-d279-4811-aca6-5e57d9cdb5d8
state machine transition UNKNOWN -> LOST (not allowed)
I0209 08:27:56.553 THREAD29173
org.apache.aurora.scheduler.MesosSchedulerImpl.statusUpdate: Received status
update for task
1391898603031-balexandrescu-devel-skyfall-0-64f3cee5-49c0-40f4-a990-e88cce7f1435
in state TASK_LOST with core message
I0209 08:27:56.554 THREAD29173
com.twitter.common.util.StateMachine$Builder$1.execute:
1391898603031-balexandrescu-devel-skyfall-0-64f3cee5-49c0-40f4-a990-e88cce7f1435
state machine transition UNKNOWN -> LOST (not allowed)
{noformat}
> aurora needs a "really, really kill this task" command
> ------------------------------------------------------
>
> Key: AURORA-201
> URL: https://issues.apache.org/jira/browse/AURORA-201
> Project: Aurora
> Issue Type: Story
> Components: Client, Scheduler
> Reporter: brian wickman
>
> If an executor bug causes the executor to die while the executor driver stays
> alive, killTask messages can be swallowed forever. The admin client will
> happily force the task into the KILLING state, but when KILLING times out the
> task goes to LOST and is automatically restarted. This means there is no way
> to kill a task that is running under a buggy executor. Ideally there would be
> a truly terminal state, so that a task that times out in KILLING transitions
> to something like DEAD instead of LOST.
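
The proposal above can be illustrated with a minimal Python sketch. This is not Aurora's actual state machine: the DEAD state, the `forced` flag, and the function names are hypothetical and exist only to show the difference between the current timeout behavior (KILLING -> LOST, then restarted) and the requested one (KILLING -> a terminal state).

```python
from enum import Enum


class State(Enum):
    KILLING = "KILLING"
    LOST = "LOST"
    DEAD = "DEAD"  # hypothetical terminal state proposed in the issue


# Assumption taken from the issue description: tasks that reach LOST
# are automatically restarted by the scheduler.
RESTARTED_STATES = {State.LOST}


def on_killing_timeout(forced: bool) -> State:
    """Return the state a task moves to when it times out in KILLING.

    `forced` is a hypothetical flag marking a "really, really kill" request.
    """
    return State.DEAD if forced else State.LOST


def will_be_restarted(state: State) -> bool:
    """Return True if a task in this state would be restarted."""
    return state in RESTARTED_STATES


if __name__ == "__main__":
    for forced in (False, True):
        final = on_killing_timeout(forced)
        print(forced, final.value, will_be_restarted(final))
```

With an ordinary kill, the timeout lands in LOST and the task comes back; with the hypothetical forced kill, it lands in DEAD and stays down, even if the executor never acknowledges killTask.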