[ https://issues.apache.org/jira/browse/MESOS-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839419#comment-16839419 ]

Joseph Wu commented on MESOS-9749:
----------------------------------

The agent ends up in a bad state because its stdout/stderr pipe fills up, and
any thread that then writes to it blocks. This can lead to unpredictable
results, since we don't know which threads are blocked on that IO.
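A minimal Python sketch of that failure mode, assuming a Linux/POSIX host. It uses an `os.pipe()` as a stand-in for the agent's stdout/stderr stream and never reads from it (as when journald is not draining it), then counts how many bytes fit before the kernel buffer is full. The write end is switched to non-blocking mode so the demo reports `BlockingIOError` instead of hanging. The helper name `fill_pipe_without_reader` is hypothetical and not part of Mesos.

```python
import os


def fill_pipe_without_reader(chunk_size: int = 4096) -> int:
    """Write into a pipe that nobody reads and return how many bytes fit.

    The write end is non-blocking, so once the kernel buffer is full
    os.write raises BlockingIOError (EAGAIN). A normal blocking writer,
    such as a logging thread, would instead hang at that point until a
    reader drained the pipe.
    """
    read_fd, write_fd = os.pipe()
    try:
        os.set_blocking(write_fd, False)
        chunk = b"x" * chunk_size
        written = 0
        while True:
            try:
                written += os.write(write_fd, chunk)
            except BlockingIOError:
                # Buffer is full: a blocking write would stall here.
                return written
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(f"pipe buffer filled after {fill_pipe_without_reader()} bytes")
```

On Linux the default pipe capacity is 64 KiB, so the count is typically 65536. Any thread doing an ordinary blocking write past that point stalls until something reads from the other end.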

If the logs are not written directly to journald, you won't need to restart
the agent: it should remain functional while journald is down.
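For contrast, a minimal sketch assuming the logs go to a regular file rather than to a stream that journald must drain. A file has no reader whose pace limits the writer, so ordinary blocking writes complete as long as the disk has space. The helper `write_to_file` is hypothetical and only illustrates the difference.

```python
import os
import tempfile


def write_to_file(total_bytes: int = 1 << 20, chunk_size: int = 4096) -> int:
    """Write total_bytes to a temporary regular file with blocking writes.

    Unlike the pipe case, no reader has to keep up, so this does not
    stall even when far more than a pipe buffer's worth is written
    (it can still fail, e.g. if the disk is full).
    """
    chunk = b"x" * chunk_size
    written = 0
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        while written < total_bytes:
            # Write a full chunk, or only the remainder on the last pass.
            written += os.write(fd, chunk[: total_bytes - written])
    return written


if __name__ == "__main__":
    print(f"wrote {write_to_file()} bytes to a regular file")
```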

Of course, restarting the agent is still an option.

> mesos agent logging hangs upon systemd-journald restart
> -------------------------------------------------------
>
>                 Key: MESOS-9749
>                 URL: https://issues.apache.org/jira/browse/MESOS-9749
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.7.2
>         Environment: Running on CentOS 7.4.1708, systemd 219 (probably heavily patched by CentOS)
> mesos-agent command:
> {code}
> /usr/sbin/mesos-slave \
>  --attributes='canary:canary-false;maintenance_group:group-6;network:10g;platform:centos;platform_major_version:7;rack_name:22.05;type:base;version:v2018-q-1' \
>  --cgroups_enable_cfs \
>  --cgroups_hierarchy='/sys/fs/cgroup' \
>  --cgroups_net_cls_primary_handle='0xC370' \
>  --container_logger='org_apache_mesos_LogrotateContainerLogger' \
>  --containerizers='mesos' \
>  --credential='file:///etc/mesos-chef/slave-credential' \
>  --default_container_info='{"type":"MESOS","volumes":[{"host_path":"tmp","container_path":"/tmp","mode":"RW"},{"host_path":"var_tmp","container_path":"/var/tmp","mode":"RW"},{"host_path":".","container_path":"/mnt/mesos/sandbox","mode":"RW"},{"host_path":"/usr/share/mesos/geoip","container_path":"/mnt/mesos/geoip","mode":"RO"}]}' \
>  --docker_registry='https://filer-docker-registry.prod.crto.in/' \
>  --docker_store_dir='/var/opt/mesos/store/docker' \
>  --enforce_container_disk_quota \
>  --executor_environment_variables='{"PATH":"/bin:/usr/bin","CRITEO_DC":"par","CRITEO_ENV":"prod","CRITEO_GEOIP_PATH":"/mnt/mesos/geoip"}' \
>  --executor_registration_timeout='5mins' \
>  --fetcher_cache_dir='/var/opt/mesos/cache' \
>  --fetcher_cache_size='2GB' \
>  --hooks='com_criteo_mesos_CommandHook' \
>  --image_providers='docker' \
>  --image_provisioner_backend='copy' \
>  --isolation='linux/capabilities,cgroups/cpu,cgroups/mem,cgroups/net_cls,namespaces/pid,filesystem/linux,docker/runtime,network/cni,disk/xfs,com_criteo_mesos_CommandIsolator' \
>  --logging_level='INFO' \
>  --master='zk://mesos:xx...@mesos-master01-par.central.criteo.prod:2181,mesos-master02-par.central.criteo.prod:2181,mesos-master03-par.central.criteo.prod:2181/mesos' \
>  --modules='file:///etc/mesos-chef/slave-modules.json' \
>  --port=5051 \
>  --recover='reconnect' \
>  --resources='file:///etc/mesos-chef/custom_resources.json' \
>  --strict \
>  --work_dir='/var/opt/mesos' \
>  --xfs_kill_containers \
>  --xfs_project_range='[5000-500000]'
> {code}
>            Reporter: Gregoire Seux
>            Priority: Minor
>              Labels: foundations
>
> When the Mesos agent is launched through systemd, restarting the
> systemd-journald service makes the agent's logging hang (no more output).
> The process itself seems to work fine (we can still query its state via
> HTTP, for instance). Restarting mesos-agent corrects the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
