Hi Venkat,

You're seeing that files with a modification time older than your gc_delay of 2 hours are *not* getting deleted? Can you show a full listing of /var/lib/mesos/slave/slaves/? Is there more than one entry there?
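For reference, the max-allowed-age formula Tomek quoted earlier in the thread can be sketched in Python (a minimal sketch; the function name and the 0.1 default headroom are my assumptions, the formula itself is the one from the thread):

```python
def max_allowed_age(gc_delay, gc_disk_headroom, disk_usage):
    """Sandbox GC threshold, in the same time unit as gc_delay.

    Mirrors the formula quoted in the thread:
    gc_delay * max(0.0, (1.0 - gc_disk_headroom - disk_usage))
    """
    return gc_delay * max(0.0, 1.0 - gc_disk_headroom - disk_usage)

# Tomek's worked example: gc_delay = 7 days, headroom = 0.1, usage = 0.8
print(max_allowed_age(7, 0.1, 0.8))       # ≈ 0.7 days (16 h 48 min)

# Venkat's second log line: gc_delay = 2 hrs, disk usage 21.60%;
# assuming the headroom is still at its 0.1 default, this gives
# ≈ 1.368 hrs, which matches "Max allowed age: 1.368035520611667hrs".
print(max_allowed_age(2, 0.1, 0.2160))
```

Note that the result only hits 0 when disk usage reaches (1 - gc_disk_headroom); below that, it is simply a fraction of gc_delay.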
On Fri, Oct 27, 2017 at 8:43 AM, Venkat Morampudi <venkatmoramp...@gmail.com> wrote:

> Hi Tomek,
>
> After changing the GC delay to 2hrs, the existing sandbox folders that are
> older than the "Max allowed age" are not deleted. Here are the logs.
>
> Log entries before and after the change:
>
> I1027 15:00:48.055465 12861 slave.cpp:4615] Current disk usage 21.63%. Max allowed age: 1.367499658088657days
> I1027 15:02:37.720451 30693 slave.cpp:4615] Current disk usage 21.60%. Max allowed age: 1.368035520611667hrs
>
> Executor info from the node:
>
> [techops@kaiju-dcos-privateslave27 ~]$ date
> Fri Oct 27 15:41:59 UTC 2017
> [techops@kaiju-dcos-privateslave27 ~]$ ls -l /var/lib/mesos/slave/slaves/3fd7ffe9-945b-4ae1-968e-2c397585960c-S201/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/
> total 452
> drwxr-xr-x. 3 root root 4096 Oct 26 09:43 74861.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 09:56 74861.10.0
> drwxr-xr-x. 3 root root 4096 Oct 26 10:31 74867.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 11:07 74871.3.0
> drwxr-xr-x. 3 root root 4096 Oct 26 11:45 74875.10.0
> drwxr-xr-x. 3 root root 4096 Oct 26 13:43 74886.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 13:45 74886.2.0
> drwxr-xr-x. 3 root root 4096 Oct 26 13:51 74886.8.0
> drwxr-xr-x. 3 root root 4096 Oct 26 13:56 74886.9.0
> drwxr-xr-x. 3 root root 4096 Oct 26 13:59 74887.1.0
> drwxr-xr-x. 3 root root 4096 Oct 26 14:42 74895.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 14:58 74899.7.0
> drwxr-xr-x. 3 root root 4096 Oct 26 14:59 74900.8.0
> drwxr-xr-x. 3 root root 4096 Oct 26 15:12 74902.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 17:05 74938.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 17:10 74940.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 17:11 74942.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 17:30 74944.3.0
> drwxr-xr-x. 3 root root 4096 Oct 26 17:45 74951.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 17:45 74951.1.0
> drwxr-xr-x. 3 root root 4096 Oct 26 18:07 74959.7.0
> drwxr-xr-x. 3 root root 4096 Oct 26 18:47 74971.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 19:18 74981.3.0
> drwxr-xr-x. 3 root root 4096 Oct 26 20:14 74992.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 20:28 74995.3.0
> drwxr-xr-x. 3 root root 4096 Oct 26 20:59 75001.1.0
>
> Thanks,
> Venkat
>
>
> On Oct 27, 2017, at 6:42 AM, Tomek Janiszewski <jani...@gmail.com> wrote:
> >
> > A low GC delay means files will be deleted more often. I don't think there
> > will be any performance problem, but a low GC delay means you will lose your
> > sandboxes earlier, and they are useful for debugging purposes.
> >
> > On Fri, 27 Oct 2017 at 04:40, Venkat Morampudi <venkatmoramp...@gmail.com> wrote:
> >
> >> Hi Tomek,
> >>
> >> Thanks for the quick reply. After digging a bit into the Mesos code, we were
> >> able to understand that "age" actually means a threshold age: anything older
> >> than the "age" will be GCed. We are going to try different settings, starting
> >> with "--gc_disk_headroom=.2 --gc_delay=2hrs". Is there any downside to
> >> going with a very low GC delay?
> >>
> >> Thanks,
> >> Venkat
> >>
> >>
> >>> On Oct 26, 2017, at 4:28 PM, Tomek Janiszewski <jani...@gmail.com> wrote:
> >>>
> >>>> gc_delay * max(0.0, (1.0 - gc_disk_headroom - disk usage))
> >>>
> >>> *Example:*
> >>> gc_delay = 7days
> >>> gc_disk_headroom = 0.1
> >>> disk_usage = 0.8
> >>> 7 * max(0.0, 1 - 0.1 - 0.8) = 7 * max(0.0, 0.1) = 0.7 days = 16 h 48 min
> >>>
> >>> Can you show some logs containing information about GC?
> >>>
> >>> On Fri, 27 Oct 2017 at 00:43, Venkat Morampudi <venkatmoramp...@gmail.com> wrote:
> >>>
> >>>> Hello,
> >>>> In our production env, we noticed that our disk filled up because one
> >>>> framework had a lot of failed/completed executor folders lying around.
> >>>> The folders eventually filled up the disk.
> >>>>
> >>>> 228M  /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52334.1.0
> >>>> 228M  /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52334.2.0
> >>>> 228M  /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52335.1.0
> >>>> 228M  /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52335.2.0
> >>>> 228M  /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52336.1.0
> >>>>
> >>>> http://mesos.apache.org/documentation/latest/sandbox/#sandbox-lifecycle
> >>>>
> >>>> We have our lifecycle cleanup set to the default, which is 7 days, I
> >>>> believe.
> >>>>
> >>>> We wanted to know: is this the proper way to clean up the
> >>>> failed/completed executor folders for a running framework?
> >>>> OR does the framework need to be Inactive or Completed for the garbage
> >>>> collection to work?
> >>>> OR does the framework itself need to deal with cleaning up its own
> >>>> executors?
> >>>>
> >>>> Bonus question: How does "gc_disk_headroom" actually work? It seems like
> >>>> this equation will always return 0:
> >>>> gc_delay * max(0.0, (1.0 - gc_disk_headroom - disk usage))
> >>>>
> >>>> Thanks,
> >>>> Venkat
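As a quick way to double-check which executor sandboxes on the node have actually aged past the 2-hour gc_delay, something like the following could be run on the agent (a hedged sketch: the SLAVES_DIR default and the glob pattern are assumptions based on the directory layout shown in the listings above):

```shell
# Default to the agent work dir layout from the listing; override via SLAVES_DIR.
SLAVES_DIR="${SLAVES_DIR:-/var/lib/mesos/slave/slaves}"

# List executor sandbox directories with a modification time older than
# 120 minutes (the 2-hour gc_delay being tested in the thread).
find "$SLAVES_DIR"/*/frameworks/*/executors \
     -mindepth 1 -maxdepth 1 -type d -mmin +120
```

If directories older than the computed "Max allowed age" keep showing up here across GC cycles, that narrows the question to whether the agent ever scheduled them for GC in the first place.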