[jira] [Commented] (MESOS-9868) NetworkInfo from the agent /state endpoint is not correct.

Qian Zhang (JIRA) Thu, 25 Jul 2019 00:51:23 -0700


    [ 
https://issues.apache.org/jira/browse/MESOS-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892507#comment-16892507
 ]


Qian Zhang commented on MESOS-9868:
-----------------------------------

I found a way to reproduce this issue:

1. Use `mesos-execute` to launch a task group that joins the CNI network `net1`, 
with checkpointing enabled.
{code:java}
$ mesos-execute --master=<masterIP>:5050 \
    --task_group=file:///tmp/task_group.json --networks=net1 --checkpoint
$ cat /tmp/task_group.json
{
  "tasks":[
    {
      "name" : "test",
      "task_id" : {"value" : "test"},
      "agent_id": {"value" : ""},
      "resources": [
        {"name": "cpus", "type": "SCALAR", "scalar": {"value": 0.1}},
        {"name": "mem", "type": "SCALAR", "scalar": {"value": 32}}
      ],
      "command": {
        "value": "ip a && sleep 3600"
      },
      "health_check": {
        "type": "COMMAND",
        "command": {
          "value": "if test -f file; then rm -rf file && exit 1; else touch file && exit 0; fi"
        }
      }
    }
  ]
}
{code}
Note that a health check is enabled for this task. It succeeds the first 
time, fails the second time, succeeds the third time, and so on. The health 
check is written this way so that the task keeps generating status updates 
(I will explain why this is needed later).
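
The alternating behaviour of that health check command can be illustrated 
with a small Python sketch. It only mimics the shell logic (`test -f file`, 
`rm`, `touch`) in a temporary directory; the function name `health_check` and 
the use of a temporary directory as the sandbox are assumptions made for 
illustration and are not part of Mesos.

```python
import os
import tempfile


def health_check(workdir):
    """Mimic the shell health check and return the exit code it would produce."""
    marker = os.path.join(workdir, "file")
    if os.path.exists(marker):
        # Marker exists: remove it and report failure (exit 1).
        os.remove(marker)
        return 1
    # Marker missing: create it and report success (exit 0).
    open(marker, "w").close()
    return 0


# Hypothetical stand-in for the task sandbox.
with tempfile.TemporaryDirectory() as sandbox:
    exit_codes = [health_check(sandbox) for _ in range(4)]

print(exit_codes)  # [0, 1, 0, 1]
```

Each run flips the marker file, so consecutive checks alternate between 
success and failure.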

Now when you access the agent's /state endpoint, you will find that the IP 
address of the task is correct: it is an IP in the CNI network `net1`.

2. Restart the Mesos agent and access the agent's /state endpoint again. The 
IP address of the task has now changed to the IP of the agent host.
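
The check in steps 1 and 2 can be scripted roughly as follows. This is a 
minimal sketch, not a verified tool: it assumes the /state response nests 
tasks as `frameworks` -> `executors` -> `tasks` -> `statuses`, that each task 
carries an `id` field, and that each status looks like the `container_status` 
snippet in the issue description. The helper names `task_ips` and 
`outside_subnet` are hypothetical, and `sample_state` is hand-written sample 
data (IPs and subnet borrowed from the issue description), not real agent 
output.

```python
import ipaddress


def task_ips(state):
    """Map each task ID to the IPs reported in its most recent status."""
    result = {}
    for framework in state.get("frameworks", []):
        for executor in framework.get("executors", []):
            for task in executor.get("tasks", []):
                statuses = task.get("statuses", [])
                if not statuses:
                    continue
                # Pick the status with the largest timestamp.
                latest = max(statuses, key=lambda s: s.get("timestamp", 0))
                infos = latest.get("container_status", {}).get("network_infos", [])
                result[task["id"]] = [
                    addr["ip_address"]
                    for info in infos
                    for addr in info.get("ip_addresses", [])
                    if "ip_address" in addr
                ]
    return result


def outside_subnet(ips, subnet):
    """Return the IPs that do not belong to the given CNI subnet."""
    network = ipaddress.ip_network(subnet)
    return [ip for ip in ips if ipaddress.ip_address(ip) not in network]


def _status(timestamp, ip):
    # Build one hand-written status entry for the sample data below.
    return {
        "state": "TASK_RUNNING",
        "timestamp": timestamp,
        "container_status": {
            "network_infos": [
                {"ip_addresses": [{"protocol": "IPv4", "ip_address": ip}]}
            ]
        },
    }


# Hand-written sample: the older status has a CNI IP, the newer one the host IP.
sample_state = {
    "frameworks": [
        {
            "executors": [
                {
                    "tasks": [
                        {
                            "id": "test",
                            "statuses": [
                                _status(1.0, "9.0.73.65"),
                                _status(2.0, "172.31.10.35"),
                            ],
                        }
                    ]
                }
            ]
        }
    ]
}

ips = task_ips(sample_state)
suspicious = {
    task_id: bad
    for task_id, addrs in ips.items()
    if (bad := outside_subnet(addrs, "9.0.73.0/25"))
}
print(suspicious)
```

In practice the input would be the JSON document returned by the agent's 
/state endpoint, and the subnet would be the one configured for the CNI 
network the task joined.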

> NetworkInfo from the agent /state endpoint is not correct.
> ----------------------------------------------------------
>
>                 Key: MESOS-9868
>                 URL: https://issues.apache.org/jira/browse/MESOS-9868
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Gilbert Song
>            Assignee: Qian Zhang
>            Priority: Blocker
>              Labels: containerization
>
> NetworkInfo from the agent /state endpoint is not correct, and it also 
> differs from the NetworkInfo of the /containers endpoint. Some frameworks rely 
> on the /state endpoint to get the IP address for other containers to run.
> agent's /state endpoint
> {noformat}
> {
> "state": "TASK_RUNNING",
> "timestamp": 1561574343.1521769,
> "container_status": {
> "container_id": {
> "value": "9a2633be-d2e5-4636-9ad4-7b2fc669da99",
> "parent": {
> "value": "45ebab16-9b4b-416e-a7f2-4833fd4ed8ff"
> }
> },
> "network_infos": [
> {
> "ip_addresses": [
> {
> "protocol": "IPv4",
> "ip_address": "172.31.10.35"
> }
> ]
> }
> ]
> },
> "healthy": true
> }
> {noformat}
> agent's /containers endpoint
> {noformat}
> "status": {
> "container_id": {
> "value": "45ebab16-9b4b-416e-a7f2-4833fd4ed8ff"
> },
> "executor_pid": 1723,
> "network_infos": [
> {
> "ip_addresses": [
> {
> "ip_address": "9.0.73.65",
> "protocol": "IPv4"
> }
> ],
> "name": "dcos"
> }
> ]
> }
> {noformat}
> The IP addresses are different.
> The container is in the RUNNING state and is running correctly; only the 
> /state endpoint is incorrect. Note that the /state endpoint used to show 
> the correct IP. After an agent restart and a master leader re-election, 
> the IP address in the /state endpoint changed.
> Here is the checkpointed CNI network information
> {noformat}
> OK-23:37:48-root@int-mountvolumeagent2-soak113s:/var/lib/mesos/slave/meta/slaves/60c42ab7-eb1a-4cec-b03d-ea06bff00c3f-S4/frameworks/26ffb84c-81ba-4b3b-989b-9c6560e51fa1-0171/executors/k8s-clusters.kc02__etcd__b50dc403-30d1-4b54-a367-332fb3621030/runs/latest/tasks/k8s-clusters.kc02__etcd-2-peer__5b6aa5fc-e113-4021-9db8-b63e0c8d1f6c
>  # cat 
> /var/run/mesos/isolators/network/cni/45ebab16-9b4b-416e-a7f2-4833fd4ed8ff/dcos/network.conf
>  
> {"args":{"org.apache.mesos":{"network_info":{"name":"dcos"}}},"chain":"M-DCOS","delegate":{"bridge":"m-dcos","hairpinMode":true,"ipMasq":false,"ipam":{"dataDir":"/var/run/dcos/cni/networks","routes":[{"dst":"0.0.0.0/0"}],"subnet":"9.0.73.0/25","type":"host-local"},"isGateway":true,"mtu":1420,"type":"bridge"},"excludeDevices":["m-dcos"],"name":"dcos","type":"mesos-cni-port-mapper"}
> {noformat}
> {noformat}
> OK-01:30:05-root@int-mountvolumeagent2-soak113s:/var/lib/mesos/slave/meta/slaves/60c42ab7-eb1a-4cec-b03d-ea06bff00c3f-S4/frameworks/26ffb84c-81ba-4b3b-989b-9c6560e51fa1-0171/executors/k8s-clusters.kc02__etcd__b50dc403-30d1-4b54-a367-332fb3621030/runs/latest/tasks/k8s-clusters.kc02__etcd-2-peer__5b6aa5fc-e113-4021-9db8-b63e0c8d1f6c
>  # cat 
> /var/run/mesos/isolators/network/cni/45ebab16-9b4b-416e-a7f2-4833fd4ed8ff/dcos/eth0/network.info
> {"dns":{},"ip4":{"gateway":"9.0.73.1","ip":"9.0.73.65/25","routes":[{"dst":"0.0.0.0/0","gw":"9.0.73.1"}]}}
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
