[ 
https://issues.apache.org/jira/browse/MESOS-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15729409#comment-15729409
 ] 

Till Toenshoff commented on MESOS-6747:
---------------------------------------

Here is some information on how I got to the root of this problem.

The agent is setup for using SSL via the {{LIBPROCESS_SSL...}} variables.

The output within the agent log whenever a task is about to get run:
{noformat}
[...]
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: I1207 18:58:31.491056  2455 
container_state_cache_impl.cpp:134] Writing container 
file[/var/run/mesos/isolators/com_mesosphere_MetricsIsolatorModule/containers/b8f97301-c477-49bc-87ed-1e7ea49366bf]
 with endpoint[198.51.100.1:37476]
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: I1207 18:58:31.491147  2455 sync_util.hpp:136] Result for 
ticket 1297 complete, returning value.
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: I1207 18:58:31.491202  2421 sync_util.hpp:83] Dispatch 
result obtained for ticket 1297 after waiting <=5s: register_and_update_cache
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: I1207 18:58:31.534430  2424 systemd.cpp:96] Assigned child 
process '16285' to 'mesos_executors.slice'
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: I1207 18:58:31.541784  2424 systemd.cpp:96] Assigned child 
process '16286' to 'mesos_executors.slice'
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: I1207 18:58:31.551643  2425 linux_launcher.cpp:429] 
Launching container b8f97301-c477-49bc-87ed-1e7ea49366bf and cloning with 
namespaces CLONE_NEWNS
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: I1207 18:58:31.555986  2425 systemd.cpp:96] Assigned child 
process '16288' to 'mesos_executors.slice'
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: I1207 18:58:31.559031  2420 containerizer.cpp:1577] 
Checkpointing container's forked pid 16288 to 
'/var/lib/mesos/slave/meta/slaves/b79c72c2-5566-4a2f-86da-668611ee4e78-S0/frameworks/b79c72c2-5566-4a2f-86da-668611ee4e78-0001/executors/aoaoaoaoaoao.c3a39772-bca6-11e6-9988-70b3d5800001/runs/b8f97301-c477-49bc-87ed-1e7ea49366bf/pids/forked.pid'
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: WARNING: Logging before InitGoogleLogging() is written to 
STDERR
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: I1207 18:58:31.597594 16286 openssl.cpp:424] CA directory 
path unspecified! NOTE: Set CA directory path with 
LIBPROCESS_SSL_CA_DIR=<dirpath>
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: I1207 18:58:31.597721 16286 openssl.cpp:429] Will not verify 
peer certificate!
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: NOTE: Set LIBPROCESS_SSL_VERIFY_CERT=1 to enable peer 
certificate verification
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: I1207 18:58:31.597728 16286 openssl.cpp:435] Will only 
verify peer certificate if presented!
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: NOTE: Set LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer 
certificate verification
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: WARNING: Logging before InitGoogleLogging() is written to 
STDERR
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: I1207 18:58:31.597862 16285 openssl.cpp:424] CA directory 
path unspecified! NOTE: Set CA directory path with 
LIBPROCESS_SSL_CA_DIR=<dirpath>
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: I1207 18:58:31.597965 16285 openssl.cpp:429] Will not verify 
peer certificate!
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: NOTE: Set LIBPROCESS_SSL_VERIFY_CERT=1 to enable peer 
certificate verification
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: I1207 18:58:31.597975 16285 openssl.cpp:435] Will only 
verify peer certificate if presented!
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: NOTE: Set LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer 
certificate verification
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain 
mesos-agent[2411]: Could not load key file 
'/run/dcos/pki/tls/private/mesos-slave.key' (OpenSSL error #33558541): 
error:0200100D:system library:fopen:Permission denied
[...]
{noformat}

Get the agent pid:
{noformat}
$ ps aux |grep mesos-agent
root      2412  3.0  0.9 1169092 142968 ?      Sl   17:35   1:41 
/opt/mesosphere/packages/mesos--53caba2e50e5df71d610af4b125c0976b78b4050/bin/mesos-agent
{noformat}

Attach strace towards the agent:
{noformat}
$ sudo strace -p2412 -f -o log.log
{noformat}

Let it run for a while after creating a new service....
Check the log for details on a key open failure:
{noformat}
$ cat log.log |grep 'mesos-slave.key'
[...]
3467  open("/run/dcos/pki/tls/private/mesos-slave.key", O_RDONLY) = -1 EACCES 
(Permission denied)
3467  write(2, "Could not load key file '/run/dc"..., 148) = 148
{noformat}

Now check the log again to find out the name of the process with pid 3467.

{noformat}
$ cat  log.log | grep '^3467' |less
[...]
3467  
execve("/opt/mesosphere/active/mesos/libexec/mesos/mesos-logrotate-logger", 
["mesos-logrotate-logger", "--help=false", 
"--log_filename=/var/lib/mesos/sl"..., "--logrotate_options=rotate 9", 
"--logrotate_path=logrotate", "--max_size=2MB", "--user=nobody"], [/* 58 vars 
*/] <unfinished ...>
{noformat}

=> "mesos-logrotate-logger" it is!



> ContainerLogger runnable must not inherit the slave environment.
> ----------------------------------------------------------------
>
>                 Key: MESOS-6747
>                 URL: https://issues.apache.org/jira/browse/MESOS-6747
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Till Toenshoff
>            Priority: Blocker
>              Labels: libprocess, logger
>
> The ContainerLogger module which forks a child process named 
> "mesos-logrotate-logger" does inherit the slave's environment. Specifically 
> things like {{LIBPROCESS_SSL_....}} variables are not meant to be picked up 
> by that runnable and cause issues as soon as the owning user is not the same 
> as the one owning the agent process.
> So if the agent has an SSL key setup via {{LIBPROCESS_SSL_KEY_FILE}} and if 
> that key-file is readable by the agent user (root) only, then the 
> {{mesos-logrotate-logger}} will try to read that file as well even though it 
> is being run as nobody - that action will then fail the runnable and hence 
> fail the entire task.
> {noformat}
> Could not load key file '/my/funky/key/path/key.key' (OpenSSL error 
> #33558541): error:0200100D:system library:fopen:Permission denied
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to