Dan Adkins created MESOS-9335:
---------------------------------

             Summary: LIBPROCESS_ADVERTISE_IP is not passed to mesos-docker-executor
                 Key: MESOS-9335
                 URL: https://issues.apache.org/jira/browse/MESOS-9335
             Project: Mesos
          Issue Type: Bug
          Components: executor
    Affects Versions: 1.7.0
         Environment: Linux ip-10-33-15-130 4.4.0-1069-aws #79-Ubuntu SMP Mon Sep 24 15:01:41 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Mesos 1.7.0
            Reporter: Dan Adkins


I noticed that when I set both LIBPROCESS_IP and LIBPROCESS_ADVERTISE_IP for my mesos-slave, only LIBPROCESS_IP gets propagated to mesos-docker-executor.

I noticed this because I have to set them both to avoid a hostname lookup, which doesn't work in my environment. LIBPROCESS_IP is set to 0.0.0.0, so that the slave will bind to any IP address (and still be reachable locally at port 5051 for metrics gathering), while LIBPROCESS_ADVERTISE_IP is set to my externally reachable IP address so the rest of the cluster can talk to it.

Lo and behold, with this setup, my slave's executor processes were failing on the dreaded hostname lookup.

I notice there is code to inject LIBPROCESS_IP into the executor environment, but no mention of LIBPROCESS_ADVERTISE_IP.

[https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L9974-L9983]

Here's the command line and environment for my slave:

LIBPROCESS_IP=0.0.0.0
MASTER=zk://10.33.13.250:2181,10.33.9.108:2181,10.33.7.6:2181/mesos
LC_ALL=en_US.UTF-8
LOGS=/var/log/mesos
LIBPROCESS_ADVERTISE_IP=10.33.15.130
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
SHLVL=0
ULIMIT=-n 8192
/usr/sbin/mesos-slave --master=zk://10.33.13.250:2181,10.33.9.108:2181,10.33.7.6:2181/mesos --log_dir=/var/log/mesos --containerizers=docker,mesos --executor_registration_timeout=5mins --work_dir=/mesos

And here's the command line and environment for the executor process it attempted to run:

LIBPROCESS_IP=0.0.0.0
LIBPROCESS_PORT=0
MESOS_AGENT_ENDPOINT=10.33.15.130:5051
MESOS_CHECKPOINT=0
MESOS_DIRECTORY=/mesos/slaves/7c587a36-c4ed-48ce-bfa2-2b0d6e8274b2-S3864/frameworks/dummy_sleep-func-dadkins-d84e56b1a9/executors/dummy_sleep-func-dadkins-d84e56b1a9-func_0/runs/6b5adff6-c745-49ce-93c3-682bf7a23aca
MESOS_EXECUTOR_ID=dummy_sleep-func-dadkins-d84e56b1a9-func_0
MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs
MESOS_FRAMEWORK_ID=dummy_sleep-func-dadkins-d84e56b1a9
MESOS_HTTP_COMMAND_EXECUTOR=0
MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos-1.7.0.so
MESOS_NATIVE_LIBRARY=/usr/lib/libmesos-1.7.0.so
MESOS_SLAVE_ID=7c587a36-c4ed-48ce-bfa2-2b0d6e8274b2-S3864
MESOS_SLAVE_PID=slave(1)@10.33.15.130:5051
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
mesos-docker-executor --cgroups_enable_cfs=false --container=mesos-6b5adff6-c745-49ce-93c3-682bf7a23aca --docker=docker --docker_socket=/var/run/docker.sock --help=false --initialize_driver_logging=true --launcher_dir=/usr/libexec/mesos --logbufsecs=0 --logging_level=INFO --mapped_directory=/mnt/mesos/sandbox --quiet=false --sandbox_directory=/mesos/slaves/7c587a36-c4ed-48ce-bfa2-2b0d6e8274b2-S3864/frameworks/dummy_sleep-func-dadkins-d84e56b1a9/executors/dummy_sleep-func-dadkins-d84e56b1a9



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
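
Editorial sketch: on Linux, environment listings like the ones in this report can be reproduced by reading /proc/<pid>/environ, where entries are NUL-separated. The snippet below is a minimal sketch and not part of Mesos. The helper names libprocess_env and has_env_var are hypothetical, and both take the path to an environment file rather than a PID.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Mesos) for inspecting a process environment.

# Print the LIBPROCESS_* entries found in a NUL-separated environment file,
# such as /proc/<pid>/environ on Linux, one per line in sorted order.
libprocess_env() {
    tr '\0' '\n' < "$1" | grep '^LIBPROCESS_' | sort
}

# Succeed (exit status 0) if the variable named $2 is set in the
# NUL-separated environment file $1; fail otherwise.
has_env_var() {
    tr '\0' '\n' < "$1" | grep -q "^$2="
}
```

For example, assuming the executor's PID is known, `has_env_var /proc/<pid>/environ LIBPROCESS_ADVERTISE_IP` would be expected to fail for the executor described in this report, while the same check against the mesos-slave process would succeed.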