I commented on the JIRA.

On Thu, Apr 30, 2020 at 3:02 PM Charles-François Natali <cf.nat...@gmail.com> wrote:
> Thanks Vinod.
>
> Yes, I understand that Mesos assumes it's the only process managing
> resources, makes sense.
> Looking at the code and testing shows that the agent reports as available
> memory the total memory of the host, minus 1GB (or half the total
> memory if the total memory is below 2GB)
> (https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L152).
> So basically it means that it assumes that the OS doesn't use more
> than 1GB. I guess if that's not the case one can just specify the memory
> manually to the agent, so that's fine.
>
> Actually the reason I was wondering about this is that we recently
> had a problem where containers couldn't be destroyed because of tasks
> stuck in uninterruptible (D) state, which caused the memory to be
> basically leaked, i.e. the agent was advertising the memory as free while
> it was still being used by the stuck processes. We ran into a similar
> issue with GPUs - it's a known issue
> https://issues.apache.org/jira/browse/MESOS-8038 - I posted an
> analysis and potential fix, it'd be great if someone could have a look
> :).
>
> Cheers,
>
> Charles
>
> On Thu, Apr 30, 2020 at 15:36, Vinod Kone <vinodk...@apache.org> wrote:
> >
> > Mesos assumes that it is the only process managing resources of a box
> > (cpu, mem, disk). So if you have out of band processes using up resources it
> > won't be reflected in the resource offers and the box can be overcommitted.
> > There is no runtime periodic check of available resources, it's only
> > calculated once at startup.
> >
> > Resource detection logic is here:
> > https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L65
> >
> > On Thu, Apr 30, 2020 at 8:17 AM Charles-François Natali <cf.nat...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Could someone point me to some code/documentation explaining how the
> > > agent available memory is computed, and when it is refreshed?
> > >
> > > For example, if I have an agent started, with some outstanding offers,
> > > and I then start a process - not as a task managed by Mesos, but as an
> > > external process which just allocates a lot of memory - and touches
> > > it, not just committed - I can see the machine available memory go
> > > down (as reported by free, and MemAvailable in /proc/meminfo), but the
> > > agent doesn't rescind any offer, and never seems to actually refresh
> > > it - even after starting/stopping tasks.
> > >
> > > Cheers,
> > >
> > > Charles
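
For reference, the default described in this thread (total host memory minus 1GB, or half the total when the host has less than 2GB) can be sketched roughly as below. This is only an illustration of the rule as stated in the messages above, not the Mesos code itself: the names `default_agent_memory` and `mem_available_bytes` are hypothetical, and 1GB is assumed to mean 1024^3 bytes. The second helper reads MemAvailable from /proc/meminfo-style text, so the one-time startup figure can be compared with what the kernel currently reports.

```python
GIB = 1024 ** 3  # assumption: "1GB" in the thread means 1024^3 bytes


def default_agent_memory(total_bytes: int) -> int:
    """Memory advertised by default, per the rule described in the thread.

    Hypothetical helper: total minus 1GB, or half of the total when the
    host has less than 2GB.
    """
    if total_bytes < 2 * GIB:
        return total_bytes // 2
    return total_bytes - GIB


def mem_available_bytes(meminfo_text: str) -> int:
    """Extract MemAvailable (reported in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line looks like: "MemAvailable:   12345678 kB"
            return int(line.split()[1]) * 1024
    raise ValueError("MemAvailable not found")


if __name__ == "__main__":
    # Only works on Linux, where /proc/meminfo exists.
    with open("/proc/meminfo") as f:
        text = f.read()
    total_kb = int(next(l for l in text.splitlines()
                        if l.startswith("MemTotal:")).split()[1])
    print("default advertised:", default_agent_memory(total_kb * 1024))
    print("currently available:", mem_available_bytes(text))
```

Since the agent computes its figure once at startup, the first number stays fixed while the second moves with out-of-band memory usage, which is the gap discussed above.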