[jira] [Commented] (MESOS-5294) Status updates after a health check are incomplete or invalid

Travis Hegner (JIRA) Thu, 28 Apr 2016 06:15:24 -0700

    [ 
https://issues.apache.org/jira/browse/MESOS-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262112#comment-15262112
 ]


Travis Hegner commented on MESOS-5294:
--------------------------------------

[~gilbert] It's hard to know exactly when this started, as I was also affected 
by MESOS-4370. I was running my own patch to fix that issue on 0.27, and I know 
that health checks were causing the IP addresses to not be resolved with that 
version and docker 1.10, and perhaps 1.9 as well.

[~kaysoky] I understand that it's not dependent directly. Something about 
command-based health checks that run inside the docker container is causing the 
`/state` endpoint to lose the IP address, but only after the task is marked 
healthy. The IP address is present during the startup process and is removed 
after the "healthy" status update. I'll try to get the requested info posted soon.

My original assumption in the description of this issue is slightly off too. 
Looking closer, I realized that `taskHealthUpdated()` has a local variable 
called `taskID` of type `TaskID`, while the other similar lines of code use 
the class-scoped `taskId` variable of type `Option<TaskID>`. That is why those 
lines call `.get()` and the one in `taskHealthUpdated()` does not.
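
To make the distinction concrete, here is a minimal sketch of the two patterns. It is not the Mesos source: `TaskID`, `TaskStatus`, and `Executor` are simplified hypothetical stand-ins, `std::optional` (C++17) stands in for `Option<T>`, and `launch()` and `statusFromMember()` are made-up names used only for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the TaskID protobuf message.
struct TaskID {
  std::string value;
  void CopyFrom(const TaskID& other) { value = other.value; }
};

// Hypothetical stand-in for the TaskStatus protobuf message.
struct TaskStatus {
  TaskID id;
  TaskID* mutable_task_id() { return &id; }
};

class Executor {
public:
  // Made-up helper: remembers the task in the class-scoped member.
  void launch(const TaskID& id) { taskId = id; }

  // Pattern used with the class-scoped member: it is an optional wrapper,
  // so it must be unwrapped before copying (`.get()` on Option<TaskID>,
  // `*` or `.value()` on std::optional).
  TaskStatus statusFromMember() const {
    assert(taskId.has_value());
    TaskStatus status;
    status.mutable_task_id()->CopyFrom(*taskId);
    return status;
  }

  // Pattern used in taskHealthUpdated(): the local `taskID` is a plain
  // TaskID, so it is copied directly. Calling an unwrap method on it
  // would not compile, because TaskID has no such method.
  TaskStatus taskHealthUpdated(const TaskID& taskID) const {
    TaskStatus status;
    status.mutable_task_id()->CopyFrom(taskID);
    return status;
  }

private:
  std::optional<TaskID> taskId;  // Note the lowercase "d".
};
```

Under these assumptions both patterns produce a status with the same task ID, which is consistent with the two lines differing only because of the variable types.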

Obviously my original fix wouldn't compile, so I'm kind of back to the 
beginning of my troubleshooting.

> Status updates after a health check are incomplete or invalid
> -------------------------------------------------------------
>
>                 Key: MESOS-5294
>                 URL: https://issues.apache.org/jira/browse/MESOS-5294
>             Project: Mesos
>          Issue Type: Bug
>         Environment: mesos 0.28.0, docker 1.11, marathon 0.15.3, mesos-dns, 
> ubuntu 14.04
>            Reporter: Travis Hegner
>            Assignee: Travis Hegner
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> With command health checks enabled via marathon, mesos-dns will resolve the 
> task correctly until the task is reported as "healthy". At that point, 
> mesos-dns stops resolving the task correctly.
> Digging through src/docker/executor.cpp, I found that the 
> "taskHealthUpdated()" function attempts to copy the taskID to the new 
> status instance with "status.mutable_task_id()->CopyFrom(taskID);", but other 
> status updates use a similar line, 
> "status.mutable_task_id()->CopyFrom(taskID.get());".
> My assumption is that this difference is causing the status update after a 
> health check to not have a proper taskID, which in turn is causing an 
> incorrect state.json output.
> I'll try to get a patch together soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
