Re: Elasticsearch with bootstrap.mlockall causes server crash

KingBob Mon, 10 Feb 2014 02:53:13 -0800

Hello,

Thanks to everyone for the replies.


Apologies if there was ambiguity in my original post, allow me to clarify a 
few things:

VMware: VMware vSphere 5.1 (ESXi-5.1.0-20121004001-standard)

It was the entire virtual machine that stopped responding (not VMware or 
the physical server ESXi runs on).

Due to the fact that SLES (on the VM) was locking up entirely, it wasn't 
writing anything useful to the system logs, they just stopped mid-stream.

As it happens, I have resolved the issue -- I needed to add a reservation 
policy for the Memory to prevent VMware's Balloon Driver messing with the 
contents of my VMs memory.

Right Click VM -> Edit Settings -> Resources tab -> Memory -> enable 
"Reserve all guest memory (All locked)"

Hope this helps someone else out there :)

Many thanks,
KB.

On Friday, February 7, 2014 12:47:08 PM UTC, KingBob wrote:
>
> Hello,
>
> I hope someone can assist with this -- whenever I start Elasticsearch with 
> the bootstrap.mlockall option enabled my entire server stops responding 
> after a minute or two.
>
> There is nothing obvious in the log file, even with DEBUG enabled.  I 
> can't find any errors in my system logs either, so I'm at a bit of a loss 
> as to what the issue is.
>
> When I disable the bootstrap.mlockall option ES runs fine.
>
> JRE -- jre1.7.0_51
> ES -- elasticsearch-0.90.11
> OS -- SLES11SP3 (VMware)
> RAM -- 8GB
> PAGING -- 5GB
> ES_HEAP_SIZE -- 4GB (50% actual)
>
> I'm building the first of a 3-node cluster.
> I intended to get everything working here, then clone the VM to create the 
> remaining 2 nodes.
>
> The "elastic" user has the following ulimits:
>
> core file size          (blocks, -c) unlimited
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 63931
> max locked memory       (kbytes, -l) unlimited
> max memory size         (kbytes, -m) 6965324
> open files                      (-n) 65535
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) unlimited
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 63931
> virtual memory          (kbytes, -v) 6960400
> file locks                      (-x) unlimited
>
> Here is a DEBUG log from startup to hang (at which point I have to 
> hard-reset the server):
>
> [2014-02-07 00:49:44,463][INFO ][node                     ] [log-els01] 
> version[0.90.11], pid[3894], build[11da1ba/2014-02-03T15:27:39Z]
> [2014-02-07 00:49:44,463][INFO ][node                     ] [log-els01] 
> initializing ...
> [2014-02-07 00:49:44,464][DEBUG][node                     ] [log-els01] 
> using home [/opt/elasticsearch], config [/opt/elasticsearch/config], data 
> [[/var/data/elasticsearch]], logs [/opt/elasticsearch/logs], work 
> [/opt/elasticsearch/work], plugins [/opt/elasticsearch/plugins]
> [2014-02-07 00:49:44,472][INFO ][plugins                  ] [log-els01] 
> loaded [], sites []
> [2014-02-07 00:49:44,483][DEBUG][common.compress.lzf      ] using 
> [UnsafeChunkDecoder] decoder
> [2014-02-07 00:49:44,489][DEBUG][env                      ] [log-els01] 
> using node location [[/var/data/elasticsearch/logstash-cluster/nodes/0]], 
> local_node_id [0]
> [2014-02-07 00:49:45,587][DEBUG][threadpool               ] [log-els01] 
> creating thread_pool [generic], type [cached], keep_alive [30s]
> [2014-02-07 00:49:45,595][DEBUG][threadpool               ] [log-els01] 
> creating thread_pool [index], type [fixed], size [1], queue_size [200]
> [2014-02-07 00:49:45,597][DEBUG][threadpool               ] [log-els01] 
> creating thread_pool [bulk], type [fixed], size [1], queue_size [50]
> [2014-02-07 00:49:45,598][DEBUG][threadpool               ] [log-els01] 
> creating thread_pool [get], type [fixed], size [1], queue_size [1k]
> [2014-02-07 00:49:45,598][DEBUG][threadpool               ] [log-els01] 
> creating thread_pool [search], type [fixed], size [3], queue_size [1k]
> [2014-02-07 00:49:45,598][DEBUG][threadpool               ] [log-els01] 
> creating thread_pool [suggest], type [fixed], size [1], queue_size [1k]
> [2014-02-07 00:49:45,598][DEBUG][threadpool               ] [log-els01] 
> creating thread_pool [percolate], type [fixed], size [1], queue_size [1k]
> [2014-02-07 00:49:45,598][DEBUG][threadpool               ] [log-els01] 
> creating thread_pool [management], type [scaling], min [1], size [5], 
> keep_alive [5m]
> [2014-02-07 00:49:45,599][DEBUG][threadpool               ] [log-els01] 
> creating thread_pool [flush], type [scaling], min [1], size [1], keep_alive 
> [5m]
> [2014-02-07 00:49:45,599][DEBUG][threadpool               ] [log-els01] 
> creating thread_pool [merge], type [scaling], min [1], size [1], keep_alive 
> [5m]
> [2014-02-07 00:49:45,599][DEBUG][threadpool               ] [log-els01] 
> creating thread_pool [refresh], type [scaling], min [1], size [1], 
> keep_alive [5m]
> [2014-02-07 00:49:45,599][DEBUG][threadpool               ] [log-els01] 
> creating thread_pool [warmer], type [scaling], min [1], size [1], 
> keep_alive [5m]
> [2014-02-07 00:49:45,599][DEBUG][threadpool               ] [log-els01] 
> creating thread_pool [snapshot], type [scaling], min [1], size [1], 
> keep_alive [5m]
> [2014-02-07 00:49:45,600][DEBUG][threadpool               ] [log-els01] 
> creating thread_pool [optimize], type [fixed], size [1], queue_size [null]
> [2014-02-07 00:49:45,638][DEBUG][transport.netty          ] [log-els01] 
> using worker_count[2], port[9300-9400], bind_host[null], 
> publish_host[null], compress[false], connect_timeout[30s], 
> connections_per_node[2/3/6/1/1], receive_predictor[512kb->512kb]
> [2014-02-07 00:49:45,664][DEBUG][discovery.zen.ping.multicast] [log-els01] 
> using group [224.2.2.4], with port [54328], ttl [3], and address [null]
> [2014-02-07 00:49:45,667][DEBUG][discovery.zen.ping.unicast] [log-els01] 
> using initial hosts [], with concurrent_connects [10]
> [2014-02-07 00:49:45,668][DEBUG][discovery.zen            ] [log-els01] 
> using ping.timeout [3s], master_election.filter_client [true], 
> master_election.filter_data [false]
> [2014-02-07 00:49:45,669][DEBUG][discovery.zen.elect      ] [log-els01] 
> using minimum_master_nodes [2]
> [2014-02-07 00:49:45,669][DEBUG][discovery.zen.fd         ] [log-els01] 
> [master] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
> [2014-02-07 00:49:45,678][DEBUG][discovery.zen.fd         ] [log-els01] 
> [node  ] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
> [2014-02-07 00:49:45,704][DEBUG][monitor.jvm              ] [log-els01] 
> enabled [true], last_gc_enabled [false], interval [1s], gc_threshold 
> [{old=GcThreshold{name='old', warnThreshold=10000, infoThreshold=5000, 
> debugThreshold=2000}, default=GcThreshold{name='default', 
> warnThreshold=10000, infoThreshold=5000, debugThreshold=2000}, 
> young=GcThreshold{name='young', warnThreshold=1000, infoThreshold=700, 
> debugThreshold=400}}]
> [2014-02-07 00:49:46,212][DEBUG][monitor.os               ] [log-els01] 
> Using probe [org.elasticsearch.monitor.os.SigarOsProbe@7139edf6] with 
> refresh_interval [1s]
> [2014-02-07 00:49:46,217][DEBUG][monitor.process          ] [log-els01] 
> Using probe [org.elasticsearch.monitor.process.SigarProcessProbe@50a744be] 
> with refresh_interval [1s]
> [2014-02-07 00:49:46,223][DEBUG][monitor.jvm              ] [log-els01] 
> Using refresh_interval [1s]
> [2014-02-07 00:49:46,228][DEBUG][monitor.network          ] [log-els01] 
> Using probe [org.elasticsearch.monitor.network.SigarNetworkProbe@4ae0a5dc] 
> with refresh_interval [5s]
> [2014-02-07 00:49:46,235][DEBUG][monitor.network          ] [log-els01] 
> net_infofailed to get Network Interface Info [log-els01: log-els01: Name or 
> service not known]
> [2014-02-07 00:49:46,238][DEBUG][monitor.fs               ] [log-els01] 
> Using probe [org.elasticsearch.monitor.fs.SigarFsProbe@5aa22f2c] with 
> refresh_interval [1s]
> [2014-02-07 00:49:46,570][DEBUG][indices.store            ] [log-els01] 
> using indices.store.throttle.type [MERGE], with 
> index.store.throttle.max_bytes_per_sec [20mb]
> [2014-02-07 00:49:46,578][DEBUG][cache.memory             ] [log-els01] 
> using bytebuffer cache with small_buffer_size [1kb], large_buffer_size 
> [1mb], small_cache_size [10mb], large_cache_size [500mb], direct [true]
> [2014-02-07 00:49:46,583][DEBUG][script                   ] [log-els01] 
> using script cache with max_size [500], expire [null]
> [2014-02-07 00:49:46,588][DEBUG][cluster.routing.allocation.decider] 
> [log-els01] using node_concurrent_recoveries [2], 
> node_initial_primaries_recoveries [4]
> [2014-02-07 00:49:46,588][DEBUG][cluster.routing.allocation.decider] 
> [log-els01] using [cluster.routing.allocation.allow_rebalance] with 
> [indices_all_active]
> [2014-02-07 00:49:46,588][DEBUG][cluster.routing.allocation.decider] 
> [log-els01] using [cluster_concurrent_rebalance] with [2]
> [2014-02-07 00:49:46,591][DEBUG][gateway.local            ] [log-els01] 
> using initial_shards [quorum], list_timeout [30s]
> [2014-02-07 00:49:46,735][DEBUG][indices.recovery         ] [log-els01] 
> using max_bytes_per_sec[20mb], concurrent_streams [3], file_chunk_size 
> [512kb], translog_size [512kb], translog_ops [1000], and compress [true]
> [2014-02-07 00:49:46,801][DEBUG][http.netty               ] [log-els01] 
> using max_chunk_size[8kb], max_header_size[8kb], 
> max_initial_line_length[4kb], max_content_length[100mb], 
> receive_predictor[512kb->512kb]
> [2014-02-07 00:49:46,805][DEBUG][indices.memory           ] [log-els01] 
> using index_buffer_size [408.7mb], with min_shard_index_buffer_size [4mb], 
> max_shard_index_buffer_size [512mb], shard_inactive_time [30m]
> [2014-02-07 00:49:46,806][DEBUG][indices.cache.filter     ] [log-els01] 
> using [node] weighted filter cache with size [20%], actual_size [817.5mb], 
> expire [null], clean_interval [1m]
> [2014-02-07 00:49:46,807][DEBUG][indices.fielddata.cache  ] [log-els01] 
> using size [-1] [-1b], expire [null]
> [2014-02-07 00:49:46,819][DEBUG][gateway.local.state.meta ] [log-els01] 
> using gateway.local.auto_import_dangled [YES], with 
> gateway.local.dangling_timeout [2h]
> [2014-02-07 00:49:46,819][DEBUG][gateway.local.state.meta ] [log-els01] 
> took 0s to load state
> [2014-02-07 00:49:46,819][DEBUG][gateway.local.state.shards] [log-els01] 
> took 0s to load started shards state
> [2014-02-07 00:49:46,824][DEBUG][bulk.udp                 ] [log-els01] 
> using enabled [false], host [null], port [9700-9800], bulk_actions [1000], 
> bulk_size [5mb], flush_interval [5s], concurrent_requests [4]
> [2014-02-07 00:49:46,826][DEBUG][cluster.routing.allocation.decider] 
> [log-els01] using node_concurrent_recoveries [2], 
> node_initial_primaries_recoveries [4]
> [2014-02-07 00:49:46,826][DEBUG][cluster.routing.allocation.decider] 
> [log-els01] using [cluster.routing.allocation.allow_rebalance] with 
> [indices_all_active]
> [2014-02-07 00:49:46,826][DEBUG][cluster.routing.allocation.decider] 
> [log-els01] using [cluster_concurrent_rebalance] with [2]
> [2014-02-07 00:49:46,827][DEBUG][cluster.routing.allocation.decider] 
> [log-els01] using node_concurrent_recoveries [2], 
> node_initial_primaries_recoveries [4]
> [2014-02-07 00:49:46,827][DEBUG][cluster.routing.allocation.decider] 
> [log-els01] using [cluster.routing.allocation.allow_rebalance] with 
> [indices_all_active]
> [2014-02-07 00:49:46,827][DEBUG][cluster.routing.allocation.decider] 
> [log-els01] using [cluster_concurrent_rebalance] with [2]
> [2014-02-07 00:49:46,836][INFO ][node                     ] [log-els01] 
> initialized
> [2014-02-07 00:49:46,836][INFO ][node                     ] [log-els01] 
> starting ...
> [2014-02-07 00:49:46,850][DEBUG][netty.channel.socket.nio.SelectorUtil] 
> Using select timeout of 500
> [2014-02-07 00:49:46,850][DEBUG][netty.channel.socket.nio.SelectorUtil] 
> Epoll-bug workaround enabled = false
> [2014-02-07 00:49:46,890][DEBUG][transport.netty          ] [log-els01] 
> Bound to address [/0:0:0:0:0:0:0:0:9300]
> [2014-02-07 00:49:46,893][INFO ][transport                ] [log-els01] 
> bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/
> 172.22.123.224:9300]}
> [2014-02-07 00:49:49,912][DEBUG][discovery.zen            ] [log-els01] 
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:49:52,913][DEBUG][discovery.zen            ] [log-els01] 
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:49:55,914][DEBUG][discovery.zen            ] [log-els01] 
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:49:58,915][DEBUG][discovery.zen            ] [log-els01] 
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:01,916][DEBUG][discovery.zen            ] [log-els01] 
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:04,918][DEBUG][discovery.zen            ] [log-els01] 
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:07,919][DEBUG][discovery.zen            ] [log-els01] 
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:10,920][DEBUG][discovery.zen            ] [log-els01] 
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:13,921][DEBUG][discovery.zen            ] [log-els01] 
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:16,900][WARN ][discovery                ] [log-els01] 
> waited for 30s and no initial state was set by the discovery
> [2014-02-07 00:50:16,900][INFO ][discovery                ] [log-els01] 
> logstash-cluster/ZFdVtQwxQLuPCArpBV6bew
> [2014-02-07 00:50:16,900][DEBUG][gateway                  ] [log-els01] 
> can't wait on start for (possibly) reading state from gateway, will do it 
> asynchronously
> [2014-02-07 00:50:16,907][INFO ][http                     ] [log-els01] 
> bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/
> 172.22.123.224:9200]}
> [2014-02-07 00:50:16,908][INFO ][node                     ] [log-els01] 
> started
> [2014-02-07 00:50:16,922][DEBUG][discovery.zen            ] [log-els01] 
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:19,929][DEBUG][discovery.zen            ] [log-els01] 
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:22,931][DEBUG][discovery.zen            ] [log-els01] 
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:25,962][DEBUG][discovery.zen            ] [log-els01] 
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:28,963][DEBUG][discovery.zen            ] [log-els01] 
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:31,965][DEBUG][discovery.zen            ] [log-els01] 
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:35,003][DEBUG][discovery.zen            ] [log-els01] 
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:38,025][DEBUG][discovery.zen            ] [log-els01] 
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:41,048][DEBUG][discovery.zen            ] [log-els01] 
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:44,051][DEBUG][discovery.zen            ] [log-els01] 
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:47,053][DEBUG][discovery.zen            ] [log-els01] 
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
>
> Am I missing something obvious?  Can I do anything to get more debug 
> information?
>
> Thanks.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e87139d0-8403-4bc5-b7a6-33640a7ad38e%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Elasticsearch with bootstrap.mlockall causes server crash

Reply via email to