Jan-Philip Gehrcke created MESOS-6951:
-----------------------------------------

             Summary: Docker containerizer: mangled environment when env value 
contains LF byte
                 Key: MESOS-6951
                 URL: https://issues.apache.org/jira/browse/MESOS-6951
             Project: Mesos
          Issue Type: Bug
          Components: containerization
            Reporter: Jan-Philip Gehrcke


Consider this Marathon app definition:

{code}
{
  "id": "/testapp",
  "cmd": "env && tail -f /dev/null",
  "env":{
    "TESTVAR":"line1\nline2"
  },
  "cpus": 0.1,
  "mem": 10,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "alpine"
    }
  }
}
{code}

The JSON-encoded newline in the value of the {{TESTVAR}} environment variable 
leads to a corrupted task environment. What follows is a subset of the 
resulting task environment (as printed via {{env}}, i.e. in key=value notation):

{code}
line2=
TESTVAR=line1
{code}

That is, the trailing part of the intended value ended up being interpreted as 
a variable name, and only the leading part of the intended value was used as 
the actual value for {{TESTVAR}}.

Common application scenarios that this would badly break include passing 
pretty-printed JSON documents or YAML documents along via the environment.

Following the code and information flow led to the conclusion that Docker's 
{{--env-file}} command line interface is the weak point in the flow. It is 
currently used in Mesos' Docker containerizer for passing the environment to 
the container:

{code}
  argv.push_back("--env-file");
  argv.push_back(environmentFile);
{code}

(Ref: 
[code|https://github.com/apache/mesos/blob/c0aee8cc10b1d1f4b2db5ff12b771372fdd5b1f3/src/docker/docker.cpp#L584])


Docker's {{--env-file}} argument behavior is documented as follows:

{quote}
The --env-file flag takes a filename as an argument
and expects each line to be in the VAR=VAL format,
{quote}
(Ref: https://docs.docker.com/engine/reference/commandline/run/)

That is, Docker identifies individual environment variable key/value pair 
definitions based on newline bytes in that file, which explains the observed 
fragmentation of the environment variable value. Notably, Docker does not 
provide a mechanism for escaping newline bytes in the values specified in this 
environment file.
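
The fragmentation can be reproduced with a small, simplified model of a 
line-oriented env file. This is only a sketch and not Docker's actual parser: 
{{writeEnvFile}}, {{readEnvFile}} and {{demonstrateFragmentation}} are 
hypothetical names, and mapping a line without {{=}} to an empty value is an 
assumption chosen to match the {{line2=}} output shown above.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: serializes variables the way a line-oriented env
// file expects them, one NAME=VALUE entry per line, without any escaping.
std::string writeEnvFile(const std::map<std::string, std::string>& env)
{
  std::ostringstream out;
  for (const auto& entry : env) {
    out << entry.first << "=" << entry.second << "\n";
  }
  return out.str();
}

// Hypothetical helper: a simplified line-based reader. Every line is
// treated as one definition. A line without '=' becomes a name with an
// empty value (assumption, chosen to match the output in this report).
std::map<std::string, std::string> readEnvFile(const std::string& content)
{
  std::map<std::string, std::string> env;
  std::istringstream in(content);
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty()) {
      continue;
    }
    const std::string::size_type pos = line.find('=');
    if (pos == std::string::npos) {
      env[line] = "";
    } else {
      env[line.substr(0, pos)] = line.substr(pos + 1);
    }
  }
  return env;
}

// Round-trips a value containing an LF byte and checks that the reader
// sees two variables instead of one.
void demonstrateFragmentation()
{
  std::map<std::string, std::string> original;
  original["TESTVAR"] = "line1\nline2";

  const std::map<std::string, std::string> parsed =
    readEnvFile(writeEnvFile(original));

  assert(parsed.size() == 2);
  assert(parsed.at("TESTVAR") == "line1");
  assert(parsed.at("line2") == "");
}
```

Because the writer has no way to mark the LF byte as part of the value, no 
line-based reader can recover the original single variable.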

I think it is important to understand that Docker's {{--env-file}} mechanism is 
ill-posed in the sense that it is not capable of transmitting the whole range 
of environment variable values allowed by POSIX. This is what the Single UNIX 
Specification, Version 3 has to say about environment variable values:

{quote}
the value shall be composed of characters from the
portable character set (except NUL and as indicated below). 
{quote}
(Ref: http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html)

About "The portable character set": 
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap06.html#tagtcjh_3

It includes (among others) the LF byte. Understandably, the current Docker 
{{--env-file}} behavior will not change, so this is not an issue that can be 
deferred to Docker: https://github.com/docker/docker/issues/12997

Notably, the {{--env-file}} method for communicating environment variables to 
Docker containers was only recently introduced to Mesos, as of 
https://issues.apache.org/jira/browse/MESOS-6566, to avoid leaking secrets 
through the process listing. Previously, we specified env key/value pairs on 
the command line, which leaked secrets to the process list and probably also 
did not support the full range of valid environment variable values.

We need a solution that
1) does not leak sensitive values (i.e. is compliant with MESOS-6566).
2) allows for passing arbitrary environment variable values.

It seems that Docker's {{--env}} method can be used for that. It can be used to 
define _just the names of the environment variables_ to be passed along, in 
which case the docker binary will read the corresponding values from its own 
environment, which we can prepare appropriately when we invoke the 
corresponding child process. This method would still leak environment variable 
_names_ to the process listing, but (especially if documented) this should be 
fine.
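
A minimal sketch of that idea, assuming the containerizer can hand a separate 
environment to the {{docker}} child process. {{DockerInvocation}} and 
{{buildRunInvocation}} are hypothetical names for illustration and are not 
part of the Mesos code base. Only the variable names end up in {{argv}}, while 
the values travel through the child's environment.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical container for everything needed to launch the docker CLI.
struct DockerInvocation
{
  // Command line; visible to other users through the process listing.
  std::vector<std::string> argv;

  // Environment to set for the docker child process itself.
  std::map<std::string, std::string> environment;
};

// Hypothetical helper: emits `--env NAME` (name only) for every task
// variable and places the value into the child's environment instead.
DockerInvocation buildRunInvocation(
    const std::string& image,
    const std::map<std::string, std::string>& taskEnvironment)
{
  DockerInvocation invocation;
  invocation.argv.push_back("docker");
  invocation.argv.push_back("run");

  for (const auto& entry : taskEnvironment) {
    // A name containing '=' would be read as NAME=VALUE by `--env`.
    assert(entry.first.find('=') == std::string::npos);

    invocation.argv.push_back("--env");
    invocation.argv.push_back(entry.first);
    invocation.environment[entry.first] = entry.second;
  }

  invocation.argv.push_back(image);
  return invocation;
}
```

With {{TESTVAR}} set to {{"line1\nline2"}}, the resulting command line is 
{{docker run --env TESTVAR alpine}}, and the LF byte appears only in the 
environment map. A real implementation would also have to merge in the 
variables the docker client itself depends on (for example {{PATH}}) and 
decide how to handle task variables that collide with them.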



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
