Re: Elasticsearch with bootstrap.mlockall causes server crash

Mark Walkom Fri, 07 Feb 2014 13:22:56 -0800

Have you checked your system logs? Look at the files under /var/log, such as
system, dmesg and daemon.
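
If it helps narrow the search, here is a minimal Python sketch that scans log
lines for kernel messages that often accompany memory pressure. The keyword
list and the file paths are assumptions for illustration, not something taken
from this system; adjust both to whatever your distribution actually writes.

```python
import re

# Hypothetical keyword list: phrases the Linux kernel commonly logs
# when memory is exhausted or tasks stall.
PATTERN = re.compile(
    r"out of memory|oom-killer|page allocation failure|blocked for more than",
    re.IGNORECASE,
)


def suspicious_lines(lines):
    """Return the lines that match any of the memory-pressure keywords."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


def scan_file(path):
    """Scan one log file; a missing or unreadable file yields an empty list."""
    try:
        with open(path, errors="replace") as handle:
            return suspicious_lines(handle)
    except OSError:
        return []


if __name__ == "__main__":
    # Assumed paths; some of them may not exist on a given host.
    for path in ("/var/log/messages", "/var/log/warn"):
        for line in scan_file(path):
            print(f"{path}: {line}")
```

Since the server hangs hard, the last lines written before the reset are the
interesting ones, so it is worth running something like this right after the
reboot.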


Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [email protected]
web: www.campaignmonitor.com


On 7 February 2014 23:47, KingBob <[email protected]> wrote:

> Hello,
>
> I hope someone can assist with this -- whenever I start Elasticsearch with
> the bootstrap.mlockall option enabled my entire server stops responding
> after a minute or two.
>
> There is nothing obvious in the log file, even with DEBUG enabled.  I
> can't find any errors in my system logs either, so I'm at a bit of a loss
> as to what the issue is.
>
> When I disable the bootstrap.mlockall option ES runs fine.
>
> JRE -- jre1.7.0_51
> ES -- elasticsearch-0.90.11
> OS -- SLES11SP3 (VMware)
> RAM -- 8GB
> PAGING -- 5GB
> ES_HEAP_SIZE -- 4GB (50% of physical RAM)
>
> I'm building the first of a 3-node cluster.
> I intended to get everything working here, then clone the VM to create the
> remaining 2 nodes.
>
> The "elastic" user has the following ulimits:
>
> core file size          (blocks, -c) unlimited
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 63931
> max locked memory       (kbytes, -l) unlimited
> max memory size         (kbytes, -m) 6965324
> open files                      (-n) 65535
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) unlimited
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 63931
> virtual memory          (kbytes, -v) 6960400
> file locks                      (-x) unlimited
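
The limits above are what the shell reports. It can be worth confirming what a
process started as the "elastic" user actually sees. A minimal sketch using
Python's standard resource module, assuming a Linux host: RLIMIT_MEMLOCK
corresponds to "max locked memory" (-l) and RLIMIT_AS to "virtual memory"
(-v). The helper names describe_limit and current_limits are made up for this
example.

```python
import resource


def describe_limit(value):
    """Render a byte limit in kbytes, as ulimit does, or 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} kbytes"


def current_limits():
    """Return the soft limits for locked memory and address space."""
    soft_lock, _hard_lock = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    soft_as, _hard_as = resource.getrlimit(resource.RLIMIT_AS)
    return {
        "max locked memory": describe_limit(soft_lock),
        "virtual memory": describe_limit(soft_as),
    }


if __name__ == "__main__":
    for name, text in current_limits().items():
        print(f"{name}: {text}")
```

For scale, the 6960400 kbytes virtual memory limit listed above is roughly
6.6GB, so a 4GB heap leaves about 2.6GB of address space for everything else
the JVM maps. Whether that matters here is not something the listing alone can
tell us.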
>
> Here is a DEBUG log from startup to hang (at which point I have to
> hard-reset the server):
>
> [2014-02-07 00:49:44,463][INFO ][node                     ] [log-els01]
> version[0.90.11], pid[3894], build[11da1ba/2014-02-03T15:27:39Z]
> [2014-02-07 00:49:44,463][INFO ][node                     ] [log-els01]
> initializing ...
> [2014-02-07 00:49:44,464][DEBUG][node                     ] [log-els01]
> using home [/opt/elasticsearch], config [/opt/elasticsearch/config], data
> [[/var/data/elasticsearch]], logs [/opt/elasticsearch/logs], work
> [/opt/elasticsearch/work], plugins [/opt/elasticsearch/plugins]
> [2014-02-07 00:49:44,472][INFO ][plugins                  ] [log-els01]
> loaded [], sites []
> [2014-02-07 00:49:44,483][DEBUG][common.compress.lzf      ] using
> [UnsafeChunkDecoder] decoder
> [2014-02-07 00:49:44,489][DEBUG][env                      ] [log-els01]
> using node location [[/var/data/elasticsearch/logstash-cluster/nodes/0]],
> local_node_id [0]
> [2014-02-07 00:49:45,587][DEBUG][threadpool               ] [log-els01]
> creating thread_pool [generic], type [cached], keep_alive [30s]
> [2014-02-07 00:49:45,595][DEBUG][threadpool               ] [log-els01]
> creating thread_pool [index], type [fixed], size [1], queue_size [200]
> [2014-02-07 00:49:45,597][DEBUG][threadpool               ] [log-els01]
> creating thread_pool [bulk], type [fixed], size [1], queue_size [50]
> [2014-02-07 00:49:45,598][DEBUG][threadpool               ] [log-els01]
> creating thread_pool [get], type [fixed], size [1], queue_size [1k]
> [2014-02-07 00:49:45,598][DEBUG][threadpool               ] [log-els01]
> creating thread_pool [search], type [fixed], size [3], queue_size [1k]
> [2014-02-07 00:49:45,598][DEBUG][threadpool               ] [log-els01]
> creating thread_pool [suggest], type [fixed], size [1], queue_size [1k]
> [2014-02-07 00:49:45,598][DEBUG][threadpool               ] [log-els01]
> creating thread_pool [percolate], type [fixed], size [1], queue_size [1k]
> [2014-02-07 00:49:45,598][DEBUG][threadpool               ] [log-els01]
> creating thread_pool [management], type [scaling], min [1], size [5],
> keep_alive [5m]
> [2014-02-07 00:49:45,599][DEBUG][threadpool               ] [log-els01]
> creating thread_pool [flush], type [scaling], min [1], size [1], keep_alive
> [5m]
> [2014-02-07 00:49:45,599][DEBUG][threadpool               ] [log-els01]
> creating thread_pool [merge], type [scaling], min [1], size [1], keep_alive
> [5m]
> [2014-02-07 00:49:45,599][DEBUG][threadpool               ] [log-els01]
> creating thread_pool [refresh], type [scaling], min [1], size [1],
> keep_alive [5m]
> [2014-02-07 00:49:45,599][DEBUG][threadpool               ] [log-els01]
> creating thread_pool [warmer], type [scaling], min [1], size [1],
> keep_alive [5m]
> [2014-02-07 00:49:45,599][DEBUG][threadpool               ] [log-els01]
> creating thread_pool [snapshot], type [scaling], min [1], size [1],
> keep_alive [5m]
> [2014-02-07 00:49:45,600][DEBUG][threadpool               ] [log-els01]
> creating thread_pool [optimize], type [fixed], size [1], queue_size [null]
> [2014-02-07 00:49:45,638][DEBUG][transport.netty          ] [log-els01]
> using worker_count[2], port[9300-9400], bind_host[null],
> publish_host[null], compress[false], connect_timeout[30s],
> connections_per_node[2/3/6/1/1], receive_predictor[512kb->512kb]
> [2014-02-07 00:49:45,664][DEBUG][discovery.zen.ping.multicast] [log-els01]
> using group [224.2.2.4], with port [54328], ttl [3], and address [null]
> [2014-02-07 00:49:45,667][DEBUG][discovery.zen.ping.unicast] [log-els01]
> using initial hosts [], with concurrent_connects [10]
> [2014-02-07 00:49:45,668][DEBUG][discovery.zen            ] [log-els01]
> using ping.timeout [3s], master_election.filter_client [true],
> master_election.filter_data [false]
> [2014-02-07 00:49:45,669][DEBUG][discovery.zen.elect      ] [log-els01]
> using minimum_master_nodes [2]
> [2014-02-07 00:49:45,669][DEBUG][discovery.zen.fd         ] [log-els01]
> [master] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
> [2014-02-07 00:49:45,678][DEBUG][discovery.zen.fd         ] [log-els01]
> [node  ] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
> [2014-02-07 00:49:45,704][DEBUG][monitor.jvm              ] [log-els01]
> enabled [true], last_gc_enabled [false], interval [1s], gc_threshold
> [{old=GcThreshold{name='old', warnThreshold=10000, infoThreshold=5000,
> debugThreshold=2000}, default=GcThreshold{name='default',
> warnThreshold=10000, infoThreshold=5000, debugThreshold=2000},
> young=GcThreshold{name='young', warnThreshold=1000, infoThreshold=700,
> debugThreshold=400}}]
> [2014-02-07 00:49:46,212][DEBUG][monitor.os               ] [log-els01]
> Using probe [org.elasticsearch.monitor.os.SigarOsProbe@7139edf6] with
> refresh_interval [1s]
> [2014-02-07 00:49:46,217][DEBUG][monitor.process          ] [log-els01]
> Using probe [org.elasticsearch.monitor.process.SigarProcessProbe@50a744be]
> with refresh_interval [1s]
> [2014-02-07 00:49:46,223][DEBUG][monitor.jvm              ] [log-els01]
> Using refresh_interval [1s]
> [2014-02-07 00:49:46,228][DEBUG][monitor.network          ] [log-els01]
> Using probe [org.elasticsearch.monitor.network.SigarNetworkProbe@4ae0a5dc]
> with refresh_interval [5s]
> [2014-02-07 00:49:46,235][DEBUG][monitor.network          ] [log-els01]
> net_infofailed to get Network Interface Info [log-els01: log-els01: Name or
> service not known]
> [2014-02-07 00:49:46,238][DEBUG][monitor.fs               ] [log-els01]
> Using probe [org.elasticsearch.monitor.fs.SigarFsProbe@5aa22f2c] with
> refresh_interval [1s]
> [2014-02-07 00:49:46,570][DEBUG][indices.store            ] [log-els01]
> using indices.store.throttle.type [MERGE], with
> index.store.throttle.max_bytes_per_sec [20mb]
> [2014-02-07 00:49:46,578][DEBUG][cache.memory             ] [log-els01]
> using bytebuffer cache with small_buffer_size [1kb], large_buffer_size
> [1mb], small_cache_size [10mb], large_cache_size [500mb], direct [true]
> [2014-02-07 00:49:46,583][DEBUG][script                   ] [log-els01]
> using script cache with max_size [500], expire [null]
> [2014-02-07 00:49:46,588][DEBUG][cluster.routing.allocation.decider]
> [log-els01] using node_concurrent_recoveries [2],
> node_initial_primaries_recoveries [4]
> [2014-02-07 00:49:46,588][DEBUG][cluster.routing.allocation.decider]
> [log-els01] using [cluster.routing.allocation.allow_rebalance] with
> [indices_all_active]
> [2014-02-07 00:49:46,588][DEBUG][cluster.routing.allocation.decider]
> [log-els01] using [cluster_concurrent_rebalance] with [2]
> [2014-02-07 00:49:46,591][DEBUG][gateway.local            ] [log-els01]
> using initial_shards [quorum], list_timeout [30s]
> [2014-02-07 00:49:46,735][DEBUG][indices.recovery         ] [log-els01]
> using max_bytes_per_sec[20mb], concurrent_streams [3], file_chunk_size
> [512kb], translog_size [512kb], translog_ops [1000], and compress [true]
> [2014-02-07 00:49:46,801][DEBUG][http.netty               ] [log-els01]
> using max_chunk_size[8kb], max_header_size[8kb],
> max_initial_line_length[4kb], max_content_length[100mb],
> receive_predictor[512kb->512kb]
> [2014-02-07 00:49:46,805][DEBUG][indices.memory           ] [log-els01]
> using index_buffer_size [408.7mb], with min_shard_index_buffer_size [4mb],
> max_shard_index_buffer_size [512mb], shard_inactive_time [30m]
> [2014-02-07 00:49:46,806][DEBUG][indices.cache.filter     ] [log-els01]
> using [node] weighted filter cache with size [20%], actual_size [817.5mb],
> expire [null], clean_interval [1m]
> [2014-02-07 00:49:46,807][DEBUG][indices.fielddata.cache  ] [log-els01]
> using size [-1] [-1b], expire [null]
> [2014-02-07 00:49:46,819][DEBUG][gateway.local.state.meta ] [log-els01]
> using gateway.local.auto_import_dangled [YES], with
> gateway.local.dangling_timeout [2h]
> [2014-02-07 00:49:46,819][DEBUG][gateway.local.state.meta ] [log-els01]
> took 0s to load state
> [2014-02-07 00:49:46,819][DEBUG][gateway.local.state.shards] [log-els01]
> took 0s to load started shards state
> [2014-02-07 00:49:46,824][DEBUG][bulk.udp                 ] [log-els01]
> using enabled [false], host [null], port [9700-9800], bulk_actions [1000],
> bulk_size [5mb], flush_interval [5s], concurrent_requests [4]
> [2014-02-07 00:49:46,826][DEBUG][cluster.routing.allocation.decider]
> [log-els01] using node_concurrent_recoveries [2],
> node_initial_primaries_recoveries [4]
> [2014-02-07 00:49:46,826][DEBUG][cluster.routing.allocation.decider]
> [log-els01] using [cluster.routing.allocation.allow_rebalance] with
> [indices_all_active]
> [2014-02-07 00:49:46,826][DEBUG][cluster.routing.allocation.decider]
> [log-els01] using [cluster_concurrent_rebalance] with [2]
> [2014-02-07 00:49:46,827][DEBUG][cluster.routing.allocation.decider]
> [log-els01] using node_concurrent_recoveries [2],
> node_initial_primaries_recoveries [4]
> [2014-02-07 00:49:46,827][DEBUG][cluster.routing.allocation.decider]
> [log-els01] using [cluster.routing.allocation.allow_rebalance] with
> [indices_all_active]
> [2014-02-07 00:49:46,827][DEBUG][cluster.routing.allocation.decider]
> [log-els01] using [cluster_concurrent_rebalance] with [2]
> [2014-02-07 00:49:46,836][INFO ][node                     ] [log-els01]
> initialized
> [2014-02-07 00:49:46,836][INFO ][node                     ] [log-els01]
> starting ...
> [2014-02-07 00:49:46,850][DEBUG][netty.channel.socket.nio.SelectorUtil]
> Using select timeout of 500
> [2014-02-07 00:49:46,850][DEBUG][netty.channel.socket.nio.SelectorUtil]
> Epoll-bug workaround enabled = false
> [2014-02-07 00:49:46,890][DEBUG][transport.netty          ] [log-els01]
> Bound to address [/0:0:0:0:0:0:0:0:9300]
> [2014-02-07 00:49:46,893][INFO ][transport                ] [log-els01]
> bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/
> 172.22.123.224:9300]}
> [2014-02-07 00:49:49,912][DEBUG][discovery.zen            ] [log-els01]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:49:52,913][DEBUG][discovery.zen            ] [log-els01]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:49:55,914][DEBUG][discovery.zen            ] [log-els01]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:49:58,915][DEBUG][discovery.zen            ] [log-els01]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:01,916][DEBUG][discovery.zen            ] [log-els01]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:04,918][DEBUG][discovery.zen            ] [log-els01]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:07,919][DEBUG][discovery.zen            ] [log-els01]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:10,920][DEBUG][discovery.zen            ] [log-els01]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:13,921][DEBUG][discovery.zen            ] [log-els01]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:16,900][WARN ][discovery                ] [log-els01]
> waited for 30s and no initial state was set by the discovery
> [2014-02-07 00:50:16,900][INFO ][discovery                ] [log-els01]
> logstash-cluster/ZFdVtQwxQLuPCArpBV6bew
> [2014-02-07 00:50:16,900][DEBUG][gateway                  ] [log-els01]
> can't wait on start for (possibly) reading state from gateway, will do it
> asynchronously
> [2014-02-07 00:50:16,907][INFO ][http                     ] [log-els01]
> bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/
> 172.22.123.224:9200]}
> [2014-02-07 00:50:16,908][INFO ][node                     ] [log-els01]
> started
> [2014-02-07 00:50:16,922][DEBUG][discovery.zen            ] [log-els01]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:19,929][DEBUG][discovery.zen            ] [log-els01]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:22,931][DEBUG][discovery.zen            ] [log-els01]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:25,962][DEBUG][discovery.zen            ] [log-els01]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:28,963][DEBUG][discovery.zen            ] [log-els01]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:31,965][DEBUG][discovery.zen            ] [log-els01]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:35,003][DEBUG][discovery.zen            ] [log-els01]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:38,025][DEBUG][discovery.zen            ] [log-els01]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:41,048][DEBUG][discovery.zen            ] [log-els01]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:44,051][DEBUG][discovery.zen            ] [log-els01]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-02-07 00:50:47,053][DEBUG][discovery.zen            ] [log-els01]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
>
> Am I missing something obvious?  Can I do anything to get more debug
> information?
>
> Thanks.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/68ed18b2-805f-43ce-bb0b-c4259c7004ef%40googlegroups.com
> .
> For more options, visit https://groups.google.com/groups/opt_out.
>
