Re: Nodes restarting automatically

Jorge Ferrando Thu, 29 May 2014 01:33:07 -0700

Thanks for the answer, David.

I added these settings to elasticsearch.yml a few days ago to see if that
was the problem:


discovery.zen.ping.timeout: 5s
discovery.zen.fd.ping_interval: 5s
discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 3

If I'm not mistaken, with those settings the node should be marked as
unavailable after about 3 minutes (3 retries x 60s timeout), yet most of
the time it happens sooner. Am I wrong?
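
As a rough check of that estimate, here is a small Python sketch of the arithmetic. It assumes (this is not confirmed anywhere in this thread) that fault detection waits one ping_interval and then lets each of the ping_retries attempts run to its full ping_timeout. The function name is made up for illustration; the 29.5s figure is the old-generation GC pause from the log quoted below.

```python
def worst_case_detection_seconds(ping_interval_s, ping_timeout_s, ping_retries):
    """Rough upper bound on how long a silent node stays in the cluster.

    Assumption: one interval wait, then every retry runs to its full timeout.
    """
    return ping_interval_s + ping_retries * ping_timeout_s


# Values from the discovery.zen.fd.* settings above.
bound_s = worst_case_detection_seconds(ping_interval_s=5, ping_timeout_s=60, ping_retries=3)

# Longest GC pause reported in the quoted log ([gc][old] duration [29.5s]).
gc_pause_s = 29.5

print(f"worst-case detection time: {bound_s}s")
print(f"GC pause fits inside a single ping timeout: {gc_pause_s < 60}")
```

Under that assumption the bound is a little over 3 minutes, and a 29.5s pause alone would not exhaust even one 60s ping timeout.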


On Thu, May 29, 2014 at 10:29 AM, David Pilato <[email protected]> wrote:

> I think GC took too long, so your node became unresponsive.
> If you give it 30 GB of RAM, you should increase the ping timeout setting
> that applies before a node is marked as unresponsive.
>
> And if you are under memory pressure, you could review your requests and
> see whether they can be optimized, or start new nodes...
>
> My 2 cents.
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
>
> On 29 May 2014 at 09:56, Jorge Ferrando <[email protected]> wrote:
>
> I've been analyzing the problem with Marvel and Nagios and I managed to
> get 2 more details:
>
> - The node that restarts/reinitializes is always the same one: node 3.
> - It always happens shortly after the cluster reaches green state,
> between a few seconds and 2-3 minutes later.
>
> I have debug mode on in logging.yml:
>
> logger:
>   # log action execution errors for easier debugging
>   action: DEBUG
>
> But I don't see anything in the log. For instance, the last time it
> happened, the cluster became green at around 9:47 and the node restarted
> at 9:50:
>
> [2014-05-29 09:30:57,235][INFO ][monitor.jvm              ] [elastic ASIC
> nodo 3] [gc][young][129][20] duration [745ms], collections [1]/[1s], total
> [745ms]/[8.5s], memory [951.1mb]->[598.9mb]/[29.9gb], all_pools {[young]
> [421.5mb]->[8.2mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old]
> [463.1mb]->[524.1mb]/[29.3gb]}
> [2014-05-29 09:45:36,322][WARN ][monitor.jvm              ] [elastic ASIC
> nodo 3] [gc][old][964][1] duration [29.5s], collections [1]/[30.4s], total
> [29.5s]/[29.5s], memory [5.1gb]->[4.3gb]/[29.9gb], all_pools {[young]
> [29.4mb]->[34.9mb]/[532.5mb]}{[survivor] [59.9mb]->[0b]/[66.5mb]}{[old]
> [5gb]->[4.2gb]/[29.3gb]}
> [2014-05-29 09:50:41,040][INFO ][node                     ] [elastic ASIC
> nodo 3] version[1.2.0], pid[7021], build[c82387f/2014-05-22T12:49:13Z]
> [2014-05-29 09:50:41,041][INFO ][node                     ] [elastic ASIC
> nodo 3] initializing ...
> [2014-05-29 09:50:41,063][INFO ][plugins                  ] [elastic ASIC
> nodo 3] loaded [marvel], sites [marvel, paramedic, inquisitor, HQ, bigdesk,
> head]
> [2014-05-29 09:50:47,908][INFO ][node                     ] [elastic ASIC
> nodo 3] initialized
> [2014-05-29 09:50:47,909][INFO ][node                     ] [elastic ASIC
> nodo 3] starting ...
>
> Is there any other way of debugging what's going on with that node?
>
>
>
>
> On Tue, May 27, 2014 at 12:49 PM, Jorge Ferrando <[email protected]>
> wrote:
>
>> I thought about that, but it would be strange because they are 3 virtual
>> machines in the same VMware cluster along with hundreds of other
>> services, and nobody has reported any networking problem.
>>
>>
>> On Thu, May 22, 2014 at 3:16 PM, emeschitc <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I may be wrong, but it seems to me you have a problem with your network.
>>> It may be a flaky connection, a broken NIC, or something wrong with your
>>> configuration for discovery and/or data transport?
>>>
>>> Caused by: org.elasticsearch.transport.NodeNotConnectedException: [elastic ASIC nodo 2][inet[/158.42.250.79:9301]] Node not connected
>>>     at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:859)
>>>     at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:540)
>>>     at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
>>>
>>> Check the status of the network on this node.
>>>
>>>
>>>
>>> On Thu, May 22, 2014 at 2:07 PM, Jorge Ferrando [via ElasticSearch
>>> Users] <[hidden email]> wrote:
>>>
>>>> Hello
>>>>
>>>> We have a cluster of 3 nodes running Ubuntu 12.04.4 LTS 64bits, and
>>>> elasticsearch v1.1.1
>>>>
>>>> It had been running flawlessly, but since last week some of the nodes
>>>> restart randomly and the cluster goes to red state, then yellow, then
>>>> green, and it happens again in a loop (sometimes it doesn't even reach
>>>> green state).
>>>>
>>>> I've tried to look at the logs but I can't find an obvious reason for
>>>> what might be going on.
>>>>
>>>> I've found entries like these, but I don't know if they are in some way
>>>> related to the crash:
>>>>
>>>> [2014-05-22 13:55:16,150][WARN ][index.codec              ] [elastic
>>>> ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field:
>>>> [date_end] returning default postings format
>>>> [2014-05-22 13:55:16,151][WARN ][index.codec              ] [elastic
>>>> ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field:
>>>> [date_end.raw] returning default postings format
>>>> [2014-05-22 13:55:16,151][WARN ][index.codec              ] [elastic
>>>> ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field:
>>>> [date_start] returning default postings format
>>>> [2014-05-22 13:55:16,151][WARN ][index.codec              ] [elastic
>>>> ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field:
>>>> [date_start.raw] returning default postings format
>>>>
>>>>
>>>> For instance, just now it was in yellow state, really close to reaching
>>>> green state, when node 3 suddenly restarted by itself and now the
>>>> cluster is red with 2000 shards initializing. The log on that node
>>>> shows this:
>>>>
>>>> [2014-05-22 13:59:48,498][INFO ][monitor.jvm              ] [elastic
>>>> ASIC nodo 3] [gc][young][1181][222] duration [735ms], collections [1]/[1s],
>>>> total [735ms]/[1.1m], memory [6.5gb]->[6.1gb]/[19.9gb], all_pools {[young]
>>>> [456mb]->[7.2mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old]
>>>> [6gb]->[6gb]/[19.3gb]}
>>>> [2014-05-22 14:03:44,825][INFO ][node                     ] [elastic
>>>> ASIC nodo 3] version[1.1.1], pid[7511], build[f1585f0/2014-04-16T14:27:12Z]
>>>> [2014-05-22 14:03:44,826][INFO ][node                     ] [elastic
>>>> ASIC nodo 3] initializing ...
>>>> [2014-05-22 14:03:44,839][INFO ][plugins                  ] [elastic
>>>> ASIC nodo 3] loaded [], sites [paramedic, inquisitor, HQ, bigdesk, head]
>>>> [2014-05-22 14:03:51,967][INFO ][node                     ] [elastic
>>>> ASIC nodo 3] initialized
>>>> [2014-05-22 14:03:51,967][INFO ][node                     ] [elastic
>>>> ASIC nodo 3] starting ...
>>>>
>>>> The crash happened exactly at 14:02.
>>>>
>>>> Any idea what could be going on, or how I can trace what's happening?
>>>>
>>>> After rebooting there are also DEBUG errors like this:
>>>>
>>>> [2014-05-22 14:06:16,621][DEBUG][action.search.type       ] [elastic
>>>> ASIC nodo 3] [logstash-2014.05.21][1], node[jgwbxcBoTVa3JIIG5a_FJA], [P],
>>>> s[STARTED]: Failed to execute
>>>> [org.elasticsearch.action.search.SearchRequest@42b80f4a] lastShard
>>>> [true]
>>>> org.elasticsearch.transport.SendRequestTransportException: [elastic
>>>> ASIC nodo 2][inet[/158.42.250.79:9301]][search/phase/query]
>>>>     at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
>>>>     at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
>>>>     at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:208)
>>>>     at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
>>>>     at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
>>>>     at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
>>>>     at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:143)
>>>>     at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:59)
>>>>     at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:49)
>>>>     at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
>>>>     at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:108)
>>>>     at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
>>>>     at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
>>>>     at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:92)
>>>>     at org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:212)
>>>>     at org.elasticsearch.rest.action.search.RestSearchAction.handleRequest(RestSearchAction.java:98)
>>>>     at org.elasticsearch.rest.RestController.executeHandler(RestController.java:159)
>>>>     at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:142)
>>>>     at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
>>>>     at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
>>>>     at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:291)
>>>>     at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:43)
>>>>     at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>>>>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>>     at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
>>>>     at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>>>>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>>     at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>>>>     at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
>>>>     at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
>>>>     at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
>>>>     at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>>>>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>>     at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
>>>>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>>>>     at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
>>>>     at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
>>>>     at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
>>>>     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>>>>     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
>>>>     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>>>>     at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>>>>     at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>>>>     at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>     at java.lang.Thread.run(Thread.java:744)
>>>> Caused by: org.elasticsearch.transport.NodeNotConnectedException: [elastic ASIC nodo 2][inet[/158.42.250.79:9301]] Node not connected
>>>>     at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:859)
>>>>     at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:540)
>>>>     at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
>>>>     ... 50 more
>>>>
>>>>
>>>>
>>>>  --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [hidden email]
>>>> <http://user/SendEmail.jtp?type=node&node=4056276&i=0>.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/elasticsearch/fa53a41d-064b-4250-8003-31cf845b7216%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/elasticsearch/fa53a41d-064b-4250-8003-31cf845b7216%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>>
>>>
>>>
>>
>>

