[ https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jamie Briant updated MESOS-6118:
--------------------------------
    Description: 
I have a framework which schedules thousands of short-running tasks (a few
seconds to a few minutes each) over a period of several minutes. In 1.0.1, the
slave process crashes every few minutes (with systemd restarting it).

Crash is:

Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: F0901 20:52:23.905678  1232 
fs.cpp:140] Check failed: !visitedParents.contains(parentId)
Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: *** Check failure stack trace: ***


  was:I have a framework which schedules thousands of short-running tasks (a
few seconds to a few minutes each) over a period of several minutes. In 1.0.1,
the slave process crashes every few minutes (with systemd restarting it).


> Agent crashes under load
> ------------------------
>
>                 Key: MESOS-6118
>                 URL: https://issues.apache.org/jira/browse/MESOS-6118
>             Project: Mesos
>          Issue Type: Bug
>          Components: slave
>    Affects Versions: 1.0.1
>         Environment: Sep 01 20:49:10 ip-10-254-192-99 mesos-slave[1227]: 
> I0901 20:49:10.522084  1204 main.cpp:243] Build: 2016-08-26 23:06:27 by centos
> Sep 01 20:49:10 ip-10-254-192-99 mesos-slave[1227]: I0901 20:49:10.522236  
> 1204 main.cpp:244] Version: 1.0.1
> Sep 01 20:49:10 ip-10-254-192-99 mesos-slave[1227]: I0901 20:49:10.522251  
> 1204 main.cpp:247] Git tag: 1.0.1
> Sep 01 20:49:10 ip-10-254-192-99 mesos-slave[1227]: I0901 20:49:10.522258  
> 1204 main.cpp:251] Git SHA: 3611eb0b7eea8d144e9b2e840e0ba16f2f659ee3
> Sep 01 20:49:10 ip-10-254-192-99 mesos-slave[1227]: I0901 20:49:10.526270  
> 1204 logging.cpp:194] INFO level logging started!
> Sep 01 20:49:10 ip-10-254-192-99 mesos-slave[1227]: I0901 20:49:10.532732  
> 1204 systemd.cpp:237] systemd version `219` detected
> Sep 01 20:49:10 ip-10-254-192-99 mesos-slave[1227]: I0901 20:49:10.532768  
> 1204 main.cpp:342] Inializing systemd state
> Sep 01 20:49:10 ip-10-254-192-99 mesos-slave[1227]: I0901 20:49:10.533128  
> 1204 systemd.cpp:304] Created systemd slice: 
> `/run/systemd/system/mesos_executors.slice`
> Sep 01 20:49:10 ip-10-254-192-99 mesos-slave[1227]: I0901 20:49:10.583580  
> 1204 systemd.cpp:325] Started systemd slice `mesos_executors.slice`
> Sep 01 20:49:10 ip-10-254-192-99 mesos-slave[1227]: I0901 20:49:10.877017  
> 1204 containerizer.cpp:196] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix,network/cni
> Sep 01 20:49:10 ip-10-254-192-99 mesos-slave[1227]: I0901 20:49:10.883193  
> 1204 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer 
> hierarchy for the Linux launcher
>            Reporter: Jamie Briant
>         Attachments: slave-crash.log
>
>
> I have a framework which schedules thousands of short-running tasks (a few
> seconds to a few minutes each) over a period of several minutes. In 1.0.1, the
> slave process crashes every few minutes (with systemd restarting it).
> Crash is:
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: F0901 20:52:23.905678  1232 
> fs.cpp:140] Check failed: !visitedParents.contains(parentId)
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: *** Check failure stack trace: 
> ***



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
