Hi Benjamin,

Apologies for the delay. GC seems to be working fine: folders older than 2 hours are being deleted. After the config change and restart, the Mesos agent took some time to delete the old folders that were around before the restart. I may have jumped the gun.
Thanks,
Venkat

> On Oct 30, 2017, at 1:01 PM, Benjamin Mahler <bmah...@apache.org> wrote:
>
> Hi Venkat,
>
> You're seeing that files with a modification time greater than your gc
> delay of 2 hours are *not* getting deleted? Can you show a full listing
> of /var/lib/mesos/slave/slaves/? Is there more than 1 entry there?
>
> On Fri, Oct 27, 2017 at 8:43 AM, Venkat Morampudi <venkatmoramp...@gmail.com> wrote:
>
>> Hi Tomek,
>>
>> After changing the GC delay to 2 hrs, the existing sandbox folders that
>> are older than the "Max allowed age" are not deleted. Here are the logs.
>>
>> Log entries before and after the change:
>>
>> I1027 15:00:48.055465 12861 slave.cpp:4615] Current disk usage 21.63%. Max allowed age: 1.367499658088657days
>> I1027 15:02:37.720451 30693 slave.cpp:4615] Current disk usage 21.60%. Max allowed age: 1.368035520611667hrs
>>
>> Executor info from the node:
>>
>> [techops@kaiju-dcos-privateslave27 ~]$ date
>> Fri Oct 27 15:41:59 UTC 2017
>> [techops@kaiju-dcos-privateslave27 ~]$ ls -l /var/lib/mesos/slave/slaves/3fd7ffe9-945b-4ae1-968e-2c397585960c-S201/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/
>> total 452
>> drwxr-xr-x. 3 root root 4096 Oct 26 09:43 74861.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 09:56 74861.10.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 10:31 74867.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 11:07 74871.3.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 11:45 74875.10.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 13:43 74886.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 13:45 74886.2.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 13:51 74886.8.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 13:56 74886.9.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 13:59 74887.1.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 14:42 74895.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 14:58 74899.7.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 14:59 74900.8.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 15:12 74902.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 17:05 74938.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 17:10 74940.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 17:11 74942.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 17:30 74944.3.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 17:45 74951.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 17:45 74951.1.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 18:07 74959.7.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 18:47 74971.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 19:18 74981.3.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 20:14 74992.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 20:28 74995.3.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 20:59 75001.1.0
>>
>> Thanks,
>> Venkat
>>
>>> On Oct 27, 2017, at 6:42 AM, Tomek Janiszewski <jani...@gmail.com> wrote:
>>>
>>> A low GC delay means files will be deleted more often. I don't think
>>> there will be any performance problem, but a low GC delay means you will
>>> lose your sandboxes earlier, and they are useful for debugging purposes.
>>>
>>> On Fri, Oct 27, 2017 at 04:40, Venkat Morampudi <venkatmoramp...@gmail.com> wrote:
>>>
>>>> Hi Tomek,
>>>>
>>>> Thanks for the quick reply. After digging a bit into the Mesos code, we
>>>> were able to understand that "age" actually means a threshold age:
>>>> anything older than the "age" will be GCed. We are going to try
>>>> different settings, starting with "--gc_disk_headroom=.2 --gc_delay=2hrs".
>>>> Is there any downside to going with a very low GC delay?
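[Editor's note: the two "Max allowed age" log lines quoted above appear consistent with the GC formula discussed later in this thread. A rough check, assuming gc_disk_headroom = 0.1 (the Mesos default); the gc_delay values of 2 days before the change and 2 hours after are our inference from the log units, not taken from the agent's actual flags:]

```python
# Rough check of the agent's logged "Max allowed age" values, using the
# formula quoted in this thread:
#   max_allowed_age = gc_delay * max(0.0, 1.0 - gc_disk_headroom - disk_usage)
# Assumptions (not confirmed in the thread): gc_disk_headroom = 0.1 (the
# Mesos default), gc_delay = 2 days before the change and 2 hours after.

def max_allowed_age(gc_delay, headroom, usage):
    """Max allowed age in the same unit as gc_delay."""
    return gc_delay * max(0.0, 1.0 - headroom - usage)

# Before the change: disk usage 21.63%, gc_delay = 2 days
print(max_allowed_age(2.0, 0.1, 0.2163))  # ~1.3674 days (log: 1.3674...days)

# After the change: disk usage 21.60%, gc_delay = 2 hours
print(max_allowed_age(2.0, 0.1, 0.2160))  # ~1.3680 hrs (log: 1.3680...hrs)
```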
>>>>
>>>> Thanks,
>>>> Venkat
>>>>
>>>>> On Oct 26, 2017, at 4:28 PM, Tomek Janiszewski <jani...@gmail.com> wrote:
>>>>>
>>>>>> gc_delay * max(0.0, (1.0 - gc_disk_headroom - disk usage))
>>>>>
>>>>> *Example:*
>>>>> gc_delay = 7days
>>>>> gc_disk_headroom = 0.1
>>>>> disk_usage = 0.8
>>>>> 7 * max(0.0, 1 - 0.1 - 0.8) = 7 * max(0.0, 0.1) = 0.7 days = 16 h 48 min
>>>>>
>>>>> Can you show some logs containing information about GC?
>>>>>
>>>>> On Fri, Oct 27, 2017 at 00:43, Venkat Morampudi <venkatmoramp...@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> In our production env, we noticed that our disk filled up because one
>>>>>> framework had a lot of failed/completed executor folders lying around.
>>>>>> The folders eventually filled up the disk.
>>>>>>
>>>>>> 228M /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52334.1.0
>>>>>> 228M /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52334.2.0
>>>>>> 228M /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52335.1.0
>>>>>> 228M /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52335.2.0
>>>>>> 228M /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52336.1.0
>>>>>>
>>>>>> http://mesos.apache.org/documentation/latest/sandbox/#sandbox-lifecycle
>>>>>>
>>>>>> We have our lifecycle cleanup set to the default, which is 7 days, I
>>>>>> believe.
>>>>>>
>>>>>> We wanted to know: is this the proper way to clean up the
>>>>>> failed/completed executor folders for a running framework?
>>>>>> Or does the framework need to be inactive or completed for garbage
>>>>>> collection to work?
>>>>>> Or does the framework itself need to deal with cleaning up its own
>>>>>> executors?
>>>>>>
>>>>>> Bonus question: How does "gc_disk_headroom" actually work? It seems
>>>>>> this equation will always return 0:
>>>>>> gc_delay * max(0.0, (1.0 - gc_disk_headroom - disk usage))
>>>>>>
>>>>>> Thanks,
>>>>>> Venkat
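[Editor's note: the formula Tomek quotes above can be sketched in Python to answer the bonus question; the function name here is ours, not a Mesos API. The expression only returns 0 once disk usage reaches 1 - gc_disk_headroom, i.e. when the disk is nearly full and sandboxes should be collected immediately:]

```python
# Sketch of the GC max-allowed-age formula as quoted in this thread:
#   gc_delay * max(0.0, 1.0 - gc_disk_headroom - disk_usage)
# The helper name is hypothetical; it is not part of Mesos itself.

def max_allowed_age(gc_delay_days, gc_disk_headroom, disk_usage):
    """Return the sandbox max allowed age, in days."""
    return gc_delay_days * max(0.0, 1.0 - gc_disk_headroom - disk_usage)

# Tomek's worked example: 7 * max(0.0, 1 - 0.1 - 0.8) = 0.7 days
age_days = max_allowed_age(7, 0.1, 0.8)
print(age_days)            # ~0.7 days
print(age_days * 24 * 60)  # ~1008 minutes, i.e. 16 h 48 min

# The formula hits 0 only when usage >= 1 - headroom (disk nearly full):
print(max_allowed_age(7, 0.1, 0.95))  # 0.0
```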