Need agent and executor logs to diagnose. Can you share them? Thanks, Vinod
> On Aug 1, 2019, at 6:05 AM, Jorge Machado <jom...@me.com.invalid> wrote: > > Hi Guys, > > I was reading about agent restarts on > http://mesos.apache.org/documentation/latest/agent-recovery/ > <http://mesos.apache.org/documentation/latest/agent-recovery/> > From what I understood, If I had a task running and we restart the > mesos-agent I should not loose any task running. > This is not the case for systemctl (or with service command) from ubuntu > 18.04. Our Framework has checkpointing active... > > My config: > > [Unit] > Description=Mesos Agent > After=network.target > Wants=network.target > > [Service] > Environment=LIBPROCESS_SSL_ENABLED=true > Environment=LIBPROCESS_SSL_SUPPORT_DOWNGRADE=false > Environment=LIBPROCESS_SSL_CIPHERS=AES128-SHA:AES256-SHA:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA:DHE-RSA-AES256-SHA:DHE-DSS-AES256-SHA > Environment=LIBPROCESS_SSL_KEY_FILE=/etc/ssl/private/server_2048.key > Environment=LIBPROCESS_SSL_CERT_FILE=/etc/ssl/server.crt > Environment=LIBPROCESS_SSL_CA_FILE=/etc/pki/trust/anchors/it4ad.pem > > ExecStart=/usr/local/sbin/mesos-agent \ > --master=<zookeeper> \ > --work_dir=/data/mesos/work \ > --log_dir=/var/log/mesos \ > --executor_registration_timeout=20mins \ > --executor_environment_variables=file:///etc/mesos/executor_envs.json \ > --resources=file:///etc/mesos/resources.txt \ > --image_gc_config=file:///etc/mesos/image-gc-config.json \ > > --isolation=cgroups/cpu,cgroups/mem,cgroups/devices,filesystem/linux,gpu/nvidia,docker/runtime,namespaces/pid,namespaces/ipc > \ > --image_providers=docker \ > --docker_store_dir=/data/mesos/store/docker \ > --gc_delay=3weeks \ > --attributes=<attr> > > KillMode=control-cgroup > Restart=always > RestartSec=20 > LimitNOFILE=infinity > CPUAccounting=true > MemoryAccounting=true > TasksMax=infinity > > [Install] > WantedBy=multi-user.target > > > > Any tipp ? thx > > > > Jorge Machado > www.jmachado.me > > > > >