vm.swappiness=0 prevents the swapping of anonymous (process) pages, but it
does not disable swapping; it only delays it until the system runs out of
RAM. kswapd can then go high on CPU because, with swappiness=0, the kernel
no longer differentiates between memory pages when hunting for swap
candidates: it simply has a very large LRU page list to traverse. So I
hesitate to see where an old, long-standing Linux bug comes in (I would
love a pointer, though)
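
For reference, a minimal sketch of inspecting and adjusting the setting
(standard sysctl paths; persistence details vary by distribution):

    # show the current value
    sysctl vm.swappiness
    # set at runtime; 1 keeps a slight preference ordering instead of none
    sudo sysctl -w vm.swappiness=1
    # persist across reboots
    echo 'vm.swappiness = 1' | sudo tee -a /etc/sysctl.conf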

Instead of "vm.swappiness=0"  the command "swapoff -a" plus erasing the
swap partitions (having lots of spare RAM available!) should work better
for the purpose of disabling swap completely.
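
A minimal sketch, assuming swap is listed in /etc/fstab (the sed pattern
is an assumption; check your fstab layout first):

    # show what is currently configured as swap
    swapon -s
    # disable all swap devices immediately
    sudo swapoff -a
    # keep it off across reboots by commenting out fstab swap entries
    sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab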

Jörg


On Sat, Oct 18, 2014 at 10:27 AM, Michael deMan (ES) <
[email protected]> wrote:

> Quick update,
> As much for myself as for anybody else who comes across this problem in
> the future.
> We moved both master and query nodes to use 70% of our calculated
> ‘usable_memory’.
> Things seem stable now.
> We are still concerned about being able to maximize java heap size on our
> query (aka coordinator) nodes.  Master nodes, not such a big deal.
> We also discovered that our Ops team had set vm.swappiness=0 while we were
> also running java with mlockall, which was an unexpected new scenario.
> At this time my best guess is that we are just triggering the same old,
> long-standing Linux bug with thrashing on memory page compression vs. disk
> IO.
> Our next step will be to just run java with mlockall and with
> vm.swappiness=1, then from there start trying to use memory more
> aggressively again.
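>
> A minimal sketch of that combination (the elasticsearch.yml path is the
> package default and an assumption here; bootstrap.mlockall is the ES 1.x
> setting name):
>
>     # leave the kernel a slight swap preference instead of none at all
>     sudo sysctl -w vm.swappiness=1
>     # lock the ES heap into RAM
>     echo 'bootstrap.mlockall: true' | sudo tee -a /etc/elasticsearch/elasticsearch.yml
>     # mlockall only succeeds if the elasticsearch user may lock memory,
>     # e.g. 'elasticsearch - memlock unlimited' in /etc/security/limits.conf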
>
>
> On Oct 9, 2014, at 12:24 PM, Michael deMan (ES) <[email protected]>
> wrote:
>
> Hi Jörg,
>
> We tune java heap size against what we think is ‘usable’ memory, not total
> system memory, specifically to reserve space for everything outside the
> heap: the java process itself, chef, splunk, etc.
>
> The formula we have right now is:
> - masters: "java_min_heap_pct_of_usable_memory": 100
> - data: "java_min_heap_pct_of_usable_memory": 50
> - query:" java_min_heap_pct_of_usable_memory": 100
> where: usable_memory_mb = ((host_memory_mb - 600) * 0.9).floor
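>
> (For concreteness, a sketch of that calculation in shell; the 600 MB
> overhead and 0.9 factor are the constants from our recipe above:)
>
>     host_memory_mb=$(awk '/MemTotal/ {printf "%d", $2/1024}' /proc/meminfo)
>     usable_memory_mb=$(( (host_memory_mb - 600) * 9 / 10 ))
>     # data nodes take 50% of usable memory for heap
>     echo "ES_HEAP_SIZE=$(( usable_memory_mb / 2 ))m"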
>
> I have been thinking the next logical step for us is to put our
> master/query nodes back at 50% heap size usage, pound them with load tests,
> wait and watch.  If nothing else, then we are back in alignment with ES
> best practices guidelines, and if the problem goes away we have it solved,
> and if it stays around we can dig back into it.
>
> Thanks for the help,
> - Mike
>
>
>
>
> On Oct 9, 2014, at 11:01 AM, [email protected] wrote:
>
> The thought of "big disk caching" is correct, but be aware that this is a
> simplification of the actual situation.
>
> Elasticsearch uses much more RAM than the configured heap value - you must
> leave space for internal "direct" buffers, stacks, classes, libraries,
> etc., and also for the kernel and the OS to live.
>
> So if you configure 2908m for heap plus enable mlockall, and have just 4
> GB of RAM, while the kernel and OS processes also need space, then you
> will have severe RAM congestion.
>
> Rules of thumb:
>
> - set ES heap size to around 50% of total RAM but not less than 1 GB and
> not more than 32 GB (due to JVM garbage collector performance)
>
> - if the RAM left is less than 2 GB *and* mlockall is enabled, the risk of
> RAM contention is high; in this case, decrease the ES heap size until 2 GB
> of RAM is available *or* set an ES direct memory allocation limit
>
> - if there are other processes running, do not use "total RAM" but
> "available RAM" to find out the maximum ES heap size, to ensure other
> processes can keep running without coming under memory pressure (it is
> recommended to run ES *without* any other processes)
>
> - the total process space of ES might increase significantly over time if
> there is no configuration limit set for direct memory buffer allocation
> (see the sketch after this list)
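>
> A sketch of those two limits as they reach the JVM (the values are
> placeholders; ES_JAVA_OPTS is picked up by the ES 1.x launch scripts, and
> -XX:MaxDirectMemorySize is the standard HotSpot flag for capping direct
> buffer allocation):
>
>     # heap size: ES turns this into -Xms/-Xmx
>     export ES_HEAP_SIZE=2g
>     # cap off-heap direct buffers so total process size stays bounded
>     export ES_JAVA_OPTS="-XX:MaxDirectMemorySize=1g"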
>
> Jörg
>
>
> On Thu, Oct 9, 2014 at 7:37 PM, Michael deMan (ES) <
> [email protected]> wrote:
>
>> Also,
>>
>> For our data nodes we follow best practices with 50% of memory for java
>> heap, while for our master and query nodes we allocate a higher percentage
>> with the thought that they really do not need big disk caching.  Could that
>> be our problem?
>>
>> In addition, the systems are actually not swapping - no swap in use; just
>> the kswapd process runs away at 100% CPU.
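>>
>> (Roughly what we observe, via standard tools, in case anyone wants to
>> compare:)
>>
>>     vmstat 1 5                 # si/so stay at 0, i.e. no swap traffic
>>     top -b -n 1 | grep kswapd  # kswapd0 pegged at ~100% CPU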
>>
>> We are on:
>>
>> java version "1.7.0_17"
>> Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
>> Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
>>
>> Elasticsearch 1.3.2.
>>
>> Thanks in advance for any pointers; hopefully somebody has seen this
>> before and knows the quick fix.
>>
>> - Mike
>>
>>
>> On Oct 9, 2014, at 10:23 AM, Michael deMan (ES) <[email protected]>
>> wrote:
>>
>> Hi All,
>>
>> This is a bit off topic, but we only see this on some of our
>> Elasticsearch hosts, and it is also the only place where we enable
>> mlockall for java, which we understand is a strongly recommended best
>> practice.
>>
>> Basically, from time to time we see kswapd run away at 100% on a single
>> core.
>>
>> It seems to hit our master nodes more frequently, and they also have the
>> least amount of memory.
>> masters are:
>> CentOS 6.4
>> 4GB RAM
>> 4GB swap
>> ES_HEAP_SIZE=2908m
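>>
>> (For what it is worth, a quick way to confirm mlockall actually took
>> effect on a node, via the ES 1.x nodes-info API:)
>>
>>     curl -s 'localhost:9200/_nodes/process?pretty' | grep mlockall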
>>
>>
>>
>> Does anybody know much about this and how to prevent it?
>> We have hunted Google Groups, but have not really found a magic bullet.
>>
>> We have considered turning off swap and seeing what happens in the lab
>> but prefer not to do that unless it is well known as the correct solution.
>>
>> Thanks,
>> - Mike