[
https://issues.apache.org/jira/browse/MESOS-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15729409#comment-15729409
]
Till Toenshoff commented on MESOS-6747:
---------------------------------------
Here is some information on how I got to the root of this problem.
The agent is setup for using SSL via the {{LIBPROCESS_SSL...}} variables.
The output within the agent log whenever a task is about to get run:
{noformat}
[...]
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: I1207 18:58:31.491056 2455
container_state_cache_impl.cpp:134] Writing container
file[/var/run/mesos/isolators/com_mesosphere_MetricsIsolatorModule/containers/b8f97301-c477-49bc-87ed-1e7ea49366bf]
with endpoint[198.51.100.1:37476]
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: I1207 18:58:31.491147 2455 sync_util.hpp:136] Result for
ticket 1297 complete, returning value.
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: I1207 18:58:31.491202 2421 sync_util.hpp:83] Dispatch
result obtained for ticket 1297 after waiting <=5s: register_and_update_cache
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: I1207 18:58:31.534430 2424 systemd.cpp:96] Assigned child
process '16285' to 'mesos_executors.slice'
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: I1207 18:58:31.541784 2424 systemd.cpp:96] Assigned child
process '16286' to 'mesos_executors.slice'
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: I1207 18:58:31.551643 2425 linux_launcher.cpp:429]
Launching container b8f97301-c477-49bc-87ed-1e7ea49366bf and cloning with
namespaces CLONE_NEWNS
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: I1207 18:58:31.555986 2425 systemd.cpp:96] Assigned child
process '16288' to 'mesos_executors.slice'
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: I1207 18:58:31.559031 2420 containerizer.cpp:1577]
Checkpointing container's forked pid 16288 to
'/var/lib/mesos/slave/meta/slaves/b79c72c2-5566-4a2f-86da-668611ee4e78-S0/frameworks/b79c72c2-5566-4a2f-86da-668611ee4e78-0001/executors/aoaoaoaoaoao.c3a39772-bca6-11e6-9988-70b3d5800001/runs/b8f97301-c477-49bc-87ed-1e7ea49366bf/pids/forked.pid'
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: WARNING: Logging before InitGoogleLogging() is written to
STDERR
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: I1207 18:58:31.597594 16286 openssl.cpp:424] CA directory
path unspecified! NOTE: Set CA directory path with
LIBPROCESS_SSL_CA_DIR=<dirpath>
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: I1207 18:58:31.597721 16286 openssl.cpp:429] Will not verify
peer certificate!
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: NOTE: Set LIBPROCESS_SSL_VERIFY_CERT=1 to enable peer
certificate verification
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: I1207 18:58:31.597728 16286 openssl.cpp:435] Will only
verify peer certificate if presented!
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: NOTE: Set LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer
certificate verification
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: WARNING: Logging before InitGoogleLogging() is written to
STDERR
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: I1207 18:58:31.597862 16285 openssl.cpp:424] CA directory
path unspecified! NOTE: Set CA directory path with
LIBPROCESS_SSL_CA_DIR=<dirpath>
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: I1207 18:58:31.597965 16285 openssl.cpp:429] Will not verify
peer certificate!
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: NOTE: Set LIBPROCESS_SSL_VERIFY_CERT=1 to enable peer
certificate verification
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: I1207 18:58:31.597975 16285 openssl.cpp:435] Will only
verify peer certificate if presented!
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: NOTE: Set LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer
certificate verification
Dec 07 18:58:31 test-c0239131-89be-41fa-8193-17d9e67761d3.localdomain
mesos-agent[2411]: Could not load key file
'/run/dcos/pki/tls/private/mesos-slave.key' (OpenSSL error #33558541):
error:0200100D:system library:fopen:Permission denied
[...]
{noformat}
Get the agent pid:
{noformat}
$ ps aux |grep mesos-agent
root 2412 3.0 0.9 1169092 142968 ? Sl 17:35 1:41
/opt/mesosphere/packages/mesos--53caba2e50e5df71d610af4b125c0976b78b4050/bin/mesos-agent
{noformat}
Attach strace towards the agent:
{noformat}
$ sudo strace -p2412 -f -o log.log
{noformat}
Let it run for a while after creating a new service....
Check the log for details on a key open failure:
{noformat}
$ cat log.log |grep 'mesos-slave.key'
[...]
3467 open("/run/dcos/pki/tls/private/mesos-slave.key", O_RDONLY) = -1 EACCES
(Permission denied)
3467 write(2, "Could not load key file '/run/dc"..., 148) = 148
{noformat}
Now check the log again to find out the name of the process with pid 3467.
{noformat}
$ cat log.log | grep '^3467' |less
[...]
3467
execve("/opt/mesosphere/active/mesos/libexec/mesos/mesos-logrotate-logger",
["mesos-logrotate-logger", "--help=false",
"--log_filename=/var/lib/mesos/sl"..., "--logrotate_options=rotate 9",
"--logrotate_path=logrotate", "--max_size=2MB", "--user=nobody"], [/* 58 vars
*/] <unfinished ...>
{noformat}
=> "mesos-logrotate-logger" it is!
> ContainerLogger runnable must not inherit the slave environment.
> ----------------------------------------------------------------
>
> Key: MESOS-6747
> URL: https://issues.apache.org/jira/browse/MESOS-6747
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Till Toenshoff
> Priority: Blocker
> Labels: libprocess, logger
>
> The ContainerLogger module which forks a child process named
> "mesos-logrotate-logger" does inherit the slave's environment. Specifically
> things like {{LIBPROCESS_SSL_....}} variables are not meant to be picked up
> by that runnable and cause issues as soon as the owning user is not the same
> as the one owning the agent process.
> So if the agent has an SSL key setup via {{LIBPROCESS_SSL_KEY_FILE}} and if
> that key-file is readable by the agent user (root) only, then the
> {{mesos-logrotate-logger}} will try to read that file as well even though it
> is being run as nobody - that action will then fail the runnable and hence
> fail the entire task.
> {noformat}
> Could not load key file '/my/funky/key/path/key.key' (OpenSSL error
> #33558541): error:0200100D:system library:fopen:Permission denied
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)