Hi Venkat,

You're seeing that sandbox directories older than your gc_delay of 2 hours
are *not* getting deleted? Can you show a full listing of
/var/lib/mesos/slave/slaves/? Is there more than one entry there?
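
If it helps, here is a minimal sketch (plain Python; the work_dir path is
taken from your listing and the 2-hour threshold from the --gc_delay you
mentioned, everything else is just illustration) that prints the executor
sandbox directories older than that threshold:

import os
import time

# Minimal sketch: print executor sandbox directories whose mtime is older
# than a threshold. The path and the 2-hour value come from this thread;
# nothing here is part of Mesos itself.
SLAVES_DIR = "/var/lib/mesos/slave/slaves"
MAX_AGE_SECONDS = 2 * 60 * 60  # matches --gc_delay=2hrs

now = time.time()
for root, dirs, files in os.walk(SLAVES_DIR):
    # Executor sandboxes live at .../frameworks/<id>/executors/<executor-id>
    if os.path.basename(os.path.dirname(root)) == "executors":
        age = now - os.path.getmtime(root)
        if age > MAX_AGE_SECONDS:
            print("%.1f hours old: %s" % (age / 3600.0, root))
        dirs[:] = []  # no need to descend into the individual run directories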

On Fri, Oct 27, 2017 at 8:43 AM, Venkat Morampudi <venkatmoramp...@gmail.com> wrote:

> Hi Tomek,
>
> After changing GC delay to 2hrs, the existing sandbox folders that are
> older than the “Max allowed age” are not deleted. Here are the logs
>
> Log entries from before and after the change:
>
> I1027 15:00:48.055465 12861 slave.cpp:4615] Current disk usage 21.63%. Max allowed age: 1.367499658088657days
> I1027 15:02:37.720451 30693 slave.cpp:4615] Current disk usage 21.60%. Max allowed age: 1.368035520611667hrs
>
> Executor info from the node:
>
>
> [techops@kaiju-dcos-privateslave27 ~]$ date
> Fri Oct 27 15:41:59 UTC 2017
> [techops@kaiju-dcos-privateslave27 ~]$ ls -l /var/lib/mesos/slave/slaves/3fd7ffe9-945b-4ae1-968e-2c397585960c-S201/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/
> total 452
> drwxr-xr-x. 3 root root 4096 Oct 26 09:43 74861.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 09:56 74861.10.0
> drwxr-xr-x. 3 root root 4096 Oct 26 10:31 74867.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 11:07 74871.3.0
> drwxr-xr-x. 3 root root 4096 Oct 26 11:45 74875.10.0
> drwxr-xr-x. 3 root root 4096 Oct 26 13:43 74886.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 13:45 74886.2.0
> drwxr-xr-x. 3 root root 4096 Oct 26 13:51 74886.8.0
> drwxr-xr-x. 3 root root 4096 Oct 26 13:56 74886.9.0
> drwxr-xr-x. 3 root root 4096 Oct 26 13:59 74887.1.0
> drwxr-xr-x. 3 root root 4096 Oct 26 14:42 74895.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 14:58 74899.7.0
> drwxr-xr-x. 3 root root 4096 Oct 26 14:59 74900.8.0
> drwxr-xr-x. 3 root root 4096 Oct 26 15:12 74902.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 17:05 74938.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 17:10 74940.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 17:11 74942.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 17:30 74944.3.0
> drwxr-xr-x. 3 root root 4096 Oct 26 17:45 74951.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 17:45 74951.1.0
> drwxr-xr-x. 3 root root 4096 Oct 26 18:07 74959.7.0
> drwxr-xr-x. 3 root root 4096 Oct 26 18:47 74971.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 19:18 74981.3.0
> drwxr-xr-x. 3 root root 4096 Oct 26 20:14 74992.0.0
> drwxr-xr-x. 3 root root 4096 Oct 26 20:28 74995.3.0
> drwxr-xr-x. 3 root root 4096 Oct 26 20:59 75001.1.0
>
>
> Thanks,
> Venkat
>
> > On Oct 27, 2017, at 6:42 AM, Tomek Janiszewski <jani...@gmail.com> wrote:
> >
> > A low GC delay means files will be deleted more often. I don't think there
> > will be any performance problem, but a low GC delay means you will lose
> > your sandboxes earlier, and they are useful for debugging purposes.
> >
> > On Fri, Oct 27, 2017 at 04:40, Venkat Morampudi <venkatmoramp...@gmail.com> wrote:
> >
> >> Hi Tomek,
> >>
> >> Thanks for the quick reply. After digging a bit into the Mesos code, we
> >> were able to understand that the age is actually a threshold age: anything
> >> older than the "age" would be GCed. We are going to try different settings,
> >> starting with "--gc_disk_headroom=.2 --gc_delay=2hrs". Is there any
> >> downside to going with a very low GC delay?
> >>
> >> Thanks,
> >> Venkat
> >>
> >>
> >>> On Oct 26, 2017, at 4:28 PM, Tomek Janiszewski <jani...@gmail.com> wrote:
> >>>
> >>>> gc_delay * max(0.0, (1.0 - gc_disk_headroom - disk usage))
> >>>
> >>> *Example:*
> >>> gc_delay = 7days
> >>> gc_disk_headroom = 0.1
> >>> disk_usage = 0.8
> >>> 7 * max(0.0, 1 - 0.1 - 0.8) = 7 * max(0.0, 0.1) = 0.7 days = 16 h 48 min
> >>>
> >>> Can you show some logs containing information about GC?
> >>>
> >>> On Fri, Oct 27, 2017 at 00:43, Venkat Morampudi <venkatmoramp...@gmail.com> wrote:
> >>>
> >>>> Hello,
> >>>> In our production environment, we noticed that our disk filled up
> >>>> because one framework had a lot of failed/completed executor folders
> >>>> lying around, and those folders eventually consumed the disk.
> >>>>
> >>>>
> >>>> 228M  /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52334.1.0
> >>>> 228M  /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52334.2.0
> >>>> 228M  /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52335.1.0
> >>>> 228M  /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52335.2.0
> >>>> 228M  /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52336.1.0
> >>>>
> >>>> http://mesos.apache.org/documentation/latest/sandbox/#sandbox-lifecycle
> >>>>
> >>>> We have our lifecycle cleanup set to the default, which I believe is 7 days.
> >>>>
> >>>> We wanted to know: is this the proper way to clean up the failed/completed
> >>>> executor folders for a running framework? Or does the framework need to be
> >>>> Inactive or Completed for garbage collection to work? Or does the framework
> >>>> itself need to deal with cleaning up its own executors?
> >>>>
> >>>> Bonus question: how does "gc_disk_headroom" actually work? It seems this
> >>>> equation will always return 0: gc_delay * max(0.0, (1.0 - gc_disk_headroom
> >>>> - disk usage))
> >>>>
> >>>> Thanks,
> >>>> Venkat
>
>
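
For reference, the quoted max-allowed-age formula can be sanity-checked with
a quick sketch like the one below (plain Python; the helper name is mine, the
formula and the example numbers are the ones from Tomek's message above):

# A quick sketch of gc_delay * max(0.0, 1.0 - gc_disk_headroom - disk_usage).
# The formula and example values come from the quoted thread; the helper
# name is just for illustration.
def max_allowed_age(gc_delay, gc_disk_headroom, disk_usage):
    return gc_delay * max(0.0, 1.0 - gc_disk_headroom - disk_usage)

# Tomek's example: gc_delay = 7 days, headroom = 0.1, disk usage = 0.8
age_days = max_allowed_age(7.0, 0.1, 0.8)
hours = age_days * 24
print("%.2f days = %d h %d min" % (age_days, int(hours), round((hours % 1) * 60)))
# -> 0.70 days = 16 h 48 min
#
# The result only drops to 0 once disk usage reaches (1.0 - gc_disk_headroom),
# i.e. 90% usage with the 0.1 headroom in this example.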
