On 08/05/2015 11:34 AM, Helmut Wollmersdorfer wrote:
> Hi,
> in the past (3 years ago AFAIR) I experienced oom-killer problems when too
> little memory was configured for the Xen Dom0. This can leave a node
> unreachable if ssh is oom-killed.
These are two separate problems. One is running out of memory, which can
happen for various reasons; the other is that the system is
misconfigured to allow random processes to be killed when it runs out
of memory.
Linux's default configuration overcommits memory, which means that it
grants application requests for memory even when all the memory is
already reserved for other processes, on the assumption that most
processes will never actually use all the memory they could theoretically
claim (e.g., if every process got its own copy of all its copy-on-write
memory pages).
Sometimes this assumption works out, sometimes it doesn't, and that's
when the oom-killer starts to kill processes that, from the
administrator's point of view, are chosen more or less at random.
Quite obviously, from an availability point of view, it would be better
to reconfigure the Linux kernel so that it denies application requests
for memory as soon as it can no longer guarantee that there is enough
free memory even if all processes actually use as much memory as they
could theoretically claim.
You can do that by setting
vm.overcommit_memory=2
vm.overcommit_ratio=n
...where n is some rather high percentage of the physical RAM that
will be made available to applications; probably something in the range
of 90 to 99, depending on the hardware and software configuration.
This WILL use more memory, but it will also improve the system's
robustness regarding memory shortage situations. For this feature to be
useful, swap space must be configured (so that Linux can still grant
more than the RAM's size to applications, simply by reserving the
required amount of swap space).
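To get a feeling for what a given value of n means, the resulting commit limit can be estimated as swap plus n percent of RAM. The following is only a rough sketch: the function name and the example sizes are made up for illustration, and it ignores reserved huge pages, which the kernel subtracts from RAM before applying the ratio.

```python
def commit_limit_kib(ram_kib: int, swap_kib: int, overcommit_ratio: int) -> int:
    """Estimate the commit limit (in KiB) under vm.overcommit_memory=2.

    Hypothetical helper: assumes no huge pages are reserved.
    """
    if overcommit_ratio < 0:
        raise ValueError("overcommit_ratio must not be negative")
    # The kernel grants at most: swap + RAM * ratio / 100
    return swap_kib + ram_kib * overcommit_ratio // 100


if __name__ == "__main__":
    ram_kib = 4 * 1024 * 1024   # example only: 4 GiB of RAM
    swap_kib = 2 * 1024 * 1024  # example only: 2 GiB of swap
    for ratio in (50, 90, 99):
        limit = commit_limit_kib(ram_kib, swap_kib, ratio)
        print(f"ratio={ratio}: about {limit // 1024} MiB can be committed")
```

The value the kernel actually uses can be read from the CommitLimit line in /proc/meminfo, next to Committed_AS, which shows how much is currently committed.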
> While migrating drbd-devices and VMs to the new cluster, it now seems to be
> reaching its limits, as it is beginning to swap a little.
> I still need to migrate 3 VMs (6 drbd-devices) to the new cluster.
> Will it work without problems, without downtime for repairing the
> misconfiguration?
I guess you will have to restart the dom0 after assigning more memory to
it. I am not sure whether you can add memory to it on-the-fly.
Regarding the memory management settings mentioned above, those can be
changed on the fly, provided that enough memory is free at the time of
the change, and provided the change is made in the right order.
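As a sketch of what such an on-the-fly change could look like: the helper below reads the contents of /proc/meminfo, checks that the memory already committed fits under the new limit, and only then lists the sysctl calls. The function names are hypothetical, and the order shown (ratio first, then the mode) is my assumption of a safe order, chosen so that strict accounting never runs with the default ratio of 50.

```python
import re
from typing import Dict, List


def parse_meminfo(text: str) -> Dict[str, int]:
    """Return the numeric fields of /proc/meminfo as {name: value}."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"(\w+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def plan_strict_overcommit(meminfo_text: str, ratio: int) -> List[str]:
    """Return the sysctl commands in the assumed safe order.

    Raises RuntimeError if the currently committed memory would already
    exceed the new limit (huge pages are ignored in this estimate).
    """
    info = parse_meminfo(meminfo_text)
    limit_kib = info["SwapTotal"] + info["MemTotal"] * ratio // 100
    if info["Committed_AS"] >= limit_kib:
        raise RuntimeError(
            f"Committed_AS ({info['Committed_AS']} kB) is not below "
            f"the new limit ({limit_kib} kB); free memory or add swap first"
        )
    return [
        f"sysctl -w vm.overcommit_ratio={ratio}",  # raise the ratio first
        "sysctl -w vm.overcommit_memory=2",        # then enable strict mode
    ]


if __name__ == "__main__":
    with open("/proc/meminfo") as meminfo:
        for command in plan_strict_overcommit(meminfo.read(), 95):
            print(command)
```

The script only prints the commands; to make the settings survive a reboot they would also have to go into /etc/sysctl.conf.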
Re: [DRBD-user] How much memory does a drbd-device need?
Since the bitmap is always kept in memory while the resource is online,
every resource requires somewhat more than the bitmap's size in memory.
The bitmap's size is approximately 32 kilobytes of bitmap data per
gigabyte of replicated storage (= 32 megabytes per terabyte).
Add to that whatever the buffer sizes for the resource are (as
configured in its configuration file), plus some internal data structures
(but that is a small factor compared to the others).
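The bitmap part of that estimate is easy to calculate. The sketch below assumes one bitmap bit per 4 KiB of replicated storage, which is what the 32 kilobytes per gigabyte figure works out to; the function name and the device sizes are made up for illustration, and buffers and internal structures are not included.

```python
BLOCK_BYTES = 4096  # assumption: one bitmap bit covers 4 KiB of storage
GIB = 1024 ** 3


def bitmap_bytes(storage_bytes: int) -> int:
    """Approximate in-memory bitmap size for one replicated device."""
    bits = -(-storage_bytes // BLOCK_BYTES)  # round up to whole blocks
    return -(-bits // 8)                     # round up to whole bytes


if __name__ == "__main__":
    # Hypothetical device sizes in GiB, not taken from the actual cluster.
    device_sizes_gib = [20, 20, 50, 100, 250, 500]
    total = sum(bitmap_bytes(size * GIB) for size in device_sizes_gib)
    print(f"bitmaps for {len(device_sizes_gib)} devices: "
          f"about {total / 1024 / 1024:.1f} MiB")
```

So for devices of these sizes the bitmaps alone stay well below a few dozen megabytes; the configured buffers come on top of that.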
> TIA
> Helmut Wollmersdorfer
best regards,
Robert