[ 
https://issues.apache.org/jira/browse/AURORA-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187895#comment-14187895
 ] 

Jay Buffington commented on AURORA-898:
---------------------------------------

Yeah, I think this is related to the issue I describe in a comment in 
AURORA-633 where I said:

    Since the mesos-slave will be the process that does the docker pull before
    starting the container the job instance will be in state ASSIGNED while the
    bits are being downloaded. Downloading happens between the time the scheduler
    sends taskLaunch and when the mesos master sends the first statusUpdate. If
    downloading takes longer than --transient_task_state_timeout command line flag
    the scheduler will assume TASK_LOST, kill it and reschedule. Updates will also
    be affected, in that the task won't be in RUNNING until the download is
    complete, so users should expect a much longer restart_threshold in their
    update config
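
To make the timing concrete, here is a rough sketch of the check described above. It is not Aurora's actual code: the names TransientTask, TRANSIENT_STATES and is_presumed_lost are made up for illustration, and the timeout values in the usage note are arbitrary, not the flag's real default.

```python
from dataclasses import dataclass

# Hypothetical model of a task sitting in a transient state.
# None of these names come from the Aurora code base.
@dataclass
class TransientTask:
    state: str               # e.g. "ASSIGNED" while the docker pull runs
    entered_state_at: float  # seconds, on any monotonic clock

# Assumption: ASSIGNED is the only transient state modelled here.
TRANSIENT_STATES = {"ASSIGNED"}

def is_presumed_lost(task: TransientTask, now: float,
                     transient_timeout_secs: float) -> bool:
    """Return True if the task has stayed in a transient state longer
    than the timeout, i.e. the point at which the scheduler would give
    up waiting for the first status update and treat it as TASK_LOST."""
    if task.state not in TRANSIENT_STATES:
        return False
    return (now - task.entered_state_at) > transient_timeout_secs
```

With an illustrative timeout of 300 seconds, a task that entered ASSIGNED at t=0 and is still pulling its image at t=301 would be presumed lost, while the same task checked at t=120 would not. A pull that is expected to take longer than the timeout therefore needs a larger --transient_task_state_timeout.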



> unable to kill a job that is in ASSIGNED state
> ----------------------------------------------
>
>                 Key: AURORA-898
>                 URL: https://issues.apache.org/jira/browse/AURORA-898
>             Project: Aurora
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 0.5.0
>            Reporter: Bhuvan Arumugam
>
> We are unable to kill a job that's in ASSIGNED state. It's always reproducible, 
> even with a hello world job.
> The {{aurora killall}} command gives up after 5 minutes with this message:
> {code}
> .
> .
> DEBUG "POST /api HTTP/1.1" 200 None
> DEBUG] "POST /api HTTP/1.1" 200 None
> DEBUG] handle_response(): returning <Response [200]>
> DEBUG] Response from scheduler: OK (message: None)
> FATAL] Tasks were not killed in time.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
