[
https://issues.apache.org/jira/browse/MESOS-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839408#comment-16839408
]
Joseph Wu commented on MESOS-9749:
----------------------------------
By default, Mesos logs to stdout/stderr. When the agent is launched via
systemd, that output goes to journald. If journald is restarted, the pipe
between the agent and journald breaks. A broken pipe like this would usually
terminate the agent, but the behavior seems to be different in systemd's case.
See also: [https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=771122]
There are several ways to work around this, all of which involve writing logs
to some other location:
---
h2. Built-in solutions
Mesos lets you write logs to disk instead. If you specify the {{--log_dir}}
flag, Mesos will use glog's file logging, which has some form of log rotation
built in. Unfortunately, this does not seem to bound the total size of logs on
disk, so you would end up writing a script or similar to clean up old logs.
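A minimal sketch of such a cleanup script is below. It assumes glog's usual
file naming ({{<program>.<host>.<user>.log.<severity>.<timestamp>.<pid>}}) and
a {{find}} that supports {{-maxdepth}} and {{-delete}} (GNU find does). The
function name, the {{mesos-*}} prefix, and the default age are illustrative
choices, not anything Mesos ships.

```shell
# Hypothetical helper: delete glog-written Mesos log files that have not been
# modified for more than N days. The first argument must be the directory
# passed to --log_dir; the second (optional) is the age threshold in days.
prune_mesos_logs() {
    log_dir=$1
    max_age_days=${2:-7}

    # -type f skips the "current log" symlinks that glog maintains.
    # -mtime +N matches files whose age, in whole days, is greater than N.
    find "$log_dir" -maxdepth 1 -type f -name 'mesos-*.log.*' \
        -mtime "+$max_age_days" -delete
}
```

This could be run from cron or a systemd timer, for example as
{{prune_mesos_logs /var/log/mesos 14}} (the path is an assumption). It only
bounds the age of the logs, not their total size.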
Besides that, you may modify your service file to write to something other
than journald, such as syslog or a file.
https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Logging%20and%20Standard%20Input/Output
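As a rough sketch of the "file" variant, the helper below writes a systemd
drop-in that points the service's stdout/stderr at a file. The unit name, the
drop-in file name, and the log path are all assumptions. As far as I know, the
{{file:}} form of {{StandardOutput=}} only exists in newer systemd releases
(236 and later), so it would not work as-is on systemd 219.

```shell
# Hypothetical helper: write a drop-in that redirects a service's
# stdout/stderr to a file instead of journald.
#   $1: drop-in directory, e.g. /etc/systemd/system/mesos-slave.service.d
#       (the unit name is an assumption)
#   $2: absolute path of the file to log to
write_logging_dropin() {
    dropin_dir=$1
    log_file=$2

    mkdir -p "$dropin_dir" || return 1

    # The "file:" syntax needs a recent systemd (see the note above).
    cat > "$dropin_dir/10-logging.conf" <<EOF
[Service]
StandardOutput=file:$log_file
StandardError=file:$log_file
EOF
}
```

After writing the drop-in, {{systemctl daemon-reload}} followed by a restart
of the agent service would be needed for it to take effect.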
h2. Other solutions
By the looks of your agent configuration, you are not averse to deploying
modules ({{--modules='file:///etc/mesos-chef/slave-modules.json'}}). In this
case, you have some other options.
DC/OS uses a {{LogSink}} module (a Mesos Anonymous module that implements
glog's {{LogSink}} interface) to write logs to a file, which is then rotated
by a separate timer.
https://github.com/dcos/dcos-mesos-modules/tree/master/logsink
If the goal is to get logs into journald, across journald restarts, this is
also possible with a {{LogSink}}. This would entail using the journald C API,
like {{sd_journal_send}}. I believe this is capable of reconnecting after
journald restarts.
https://www.freedesktop.org/software/systemd/man/sd_journal_print.html
> mesos agent logging hangs upon systemd-journald restart
> -------------------------------------------------------
>
> Key: MESOS-9749
> URL: https://issues.apache.org/jira/browse/MESOS-9749
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.7.2
> Environment: Running on centos 7.4.1708, systemd 219 (probably
> heavily patched by centos)
> mesos-agent command:
> {code}
> /usr/sbin/mesos-slave \
> --attributes='canary:canary-false;maintenance_group:group-6;network:10g;platform:centos;platform_major_version:7;rack_name:22.05;type:base;version:v2018-q-1' \
> --cgroups_enable_cfs \
> --cgroups_hierarchy='/sys/fs/cgroup' \
> --cgroups_net_cls_primary_handle='0xC370' \
> --container_logger='org_apache_mesos_LogrotateContainerLogger' \
> --containerizers='mesos' \
> --credential='file:///etc/mesos-chef/slave-credential' \
> --default_container_info='{"type":"MESOS","volumes":[{"host_path":"tmp","container_path":"/tmp","mode":"RW"},{"host_path":"var_tmp","container_path":"/var/tmp","mode":"RW"},{"host_path":".","container_path":"/mnt/mesos/sandbox","mode":"RW"},{"host_path":"/usr/share/mesos/geoip","container_path":"/mnt/mesos/geoip","mode":"RO"}]}' \
> --docker_registry='https://filer-docker-registry.prod.crto.in/' \
> --docker_store_dir='/var/opt/mesos/store/docker' \
> --enforce_container_disk_quota \
> --executor_environment_variables='{"PATH":"/bin:/usr/bin","CRITEO_DC":"par","CRITEO_ENV":"prod","CRITEO_GEOIP_PATH":"/mnt/mesos/geoip"}' \
> --executor_registration_timeout='5mins' \
> --fetcher_cache_dir='/var/opt/mesos/cache' \
> --fetcher_cache_size='2GB' \
> --hooks='com_criteo_mesos_CommandHook' \
> --image_providers='docker' \
> --image_provisioner_backend='copy' \
> --isolation='linux/capabilities,cgroups/cpu,cgroups/mem,cgroups/net_cls,namespaces/pid,filesystem/linux,docker/runtime,network/cni,disk/xfs,com_criteo_mesos_CommandIsolator' \
> --logging_level='INFO' \
> --master='zk://mesos:[email protected]:2181,mesos-master02-par.central.criteo.prod:2181,mesos-master03-par.central.criteo.prod:2181/mesos' \
> --modules='file:///etc/mesos-chef/slave-modules.json' \
> --port=5051 \
> --recover='reconnect' \
> --resources='file:///etc/mesos-chef/custom_resources.json' \
> --strict \
> --work_dir='/var/opt/mesos' \
> --xfs_kill_containers \
> --xfs_project_range='[5000-500000]'
> {code}
> Reporter: Gregoire Seux
> Priority: Minor
> Labels: foundations
>
> When the Mesos agent is launched through systemd, restarting the
> systemd-journald service makes the agent's logging hang (no more output). The
> process itself seems to work fine (we can query state via HTTP, for instance).
> Restarting mesos-agent corrects the issue.
>
>