Re: Nodes restarting automatically

David Pilato Thu, 29 May 2014 01:40:17 -0700

It sounds like the old GC is not able to free enough old gen space.
I guess that if you look at your Marvel dashboards, you will see it in the old GC metrics.


So memory pressure is the first guess. You may have too many old GC cycles.
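To check whether there are too many (or too long) old GC cycles, the node log can be scanned for `monitor.jvm` old-generation entries like the `[gc][old][964][1] duration [29.5s]` line in the node 3 log quoted in this thread. The following is a minimal Java sketch, not a tool that ships with Elasticsearch. The class name `OldGcScanner` is hypothetical. It assumes each log entry sits on a single line (the excerpts in this thread were wrapped by the mail clients) and that durations are printed with an `ms`, `s`, `m` or `h` suffix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists old-generation GC pauses reported by the monitor.jvm logger.
// Assumes one log entry per line and durations such as "745ms", "29.5s" or "1.1m".
class OldGcScanner {
    // Matches e.g. "[gc][old][964][1] duration [29.5s]" and captures the number and its unit.
    private static final Pattern OLD_GC = Pattern.compile(
            "\\[gc\\]\\[old\\]\\[\\d+\\]\\[\\d+\\] duration \\[([0-9.]+)(ms|s|m|h)\\]");

    // Converts a logged value plus its unit into seconds.
    static double toSeconds(String value, String unit) {
        double v = Double.parseDouble(value);
        switch (unit) {
            case "ms": return v / 1000.0;
            case "s":  return v;
            case "m":  return v * 60.0;
            case "h":  return v * 3600.0;
            default:   throw new IllegalArgumentException("unknown unit: " + unit);
        }
    }

    // Returns the duration, in seconds, of every old GC entry found in the given lines.
    static List<Double> oldGcPauses(List<String> lines) {
        List<Double> pauses = new ArrayList<>();
        for (String line : lines) {
            Matcher m = OLD_GC.matcher(line);
            if (m.find()) {
                pauses.add(toSeconds(m.group(1), m.group(2)));
            }
        }
        return pauses;
    }

    public static void main(String[] args) throws IOException {
        // Pass the node's log file as the first argument; otherwise two abridged
        // lines from this thread are used as sample input.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList(
                    "[2014-05-29 09:30:57,235][INFO ][monitor.jvm              ] [elastic ASIC nodo 3] [gc][young][129][20] duration [745ms], collections [1]/[1s]",
                    "[2014-05-29 09:45:36,322][WARN ][monitor.jvm              ] [elastic ASIC nodo 3] [gc][old][964][1] duration [29.5s], collections [1]/[30.4s]");
        List<Double> pauses = oldGcPauses(lines);
        double longest = 0;
        for (double p : pauses) {
            longest = Math.max(longest, p);
        }
        // With the sample input this prints: logged old GC pauses: 1, longest: 29.5 s
        System.out.println("logged old GC pauses: " + pauses.size() + ", longest: " + longest + " s");
    }
}
```

The JVM monitor appears to log only collections above its slow-GC thresholds, so the count reflects logged pauses rather than every old collection.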


--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 29 May 2014 at 10:32, Jorge Ferrando <[email protected]> wrote:

Thanks for the answer, David.

I added these settings to elasticsearch.yml some days ago to see whether that 
was the problem:

discovery.zen.ping.timeout: 5s
discovery.zen.fd.ping_interval: 5s
discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 3

If I'm not mistaken, with those settings the node should be marked as 
unavailable only after about 3 minutes, but most of the time the problem 
happens sooner than that. Am I wrong?
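A back-of-the-envelope check of that estimate, as a small Java sketch. It uses a simplified model that is an assumption here, not a description of Elasticsearch's fault-detection code: once a node stops answering, up to one `ping_interval` may pass before the next ping is sent, and the node is marked as failed after `ping_retries` consecutive pings have each waited the full `ping_timeout`. The class and method names are hypothetical. `discovery.zen.ping.timeout` is left out because it governs discovery pings, not fault detection.

```java
// Hypothetical back-of-the-envelope model of zen fault detection timing (not Elasticsearch code).
class FaultDetectionWindow {
    // Worst case, in seconds, before a silent node is marked as failed under the simplified
    // model: up to one interval until the next ping, then pingRetries pings that each time out.
    static long detectionSeconds(long pingIntervalSec, long pingTimeoutSec, int pingRetries) {
        return pingIntervalSec + pingRetries * pingTimeoutSec;
    }

    public static void main(String[] args) {
        // Values from the discovery.zen.fd.* settings quoted above.
        long window = detectionSeconds(5, 60, 3);
        // Longest old GC pause in the node 3 log excerpt (09:45 on 29 May).
        double gcPauseSec = 29.5;
        System.out.println("detection window: about " + window + " s");
        System.out.println("GC pause shorter than one ping_timeout: " + (gcPauseSec < 60));
    }
}
```

Under this model the window is 5 + 3 × 60 = 185 s, roughly the 3 minutes estimated above, and a single 29.5 s pause would not make even one 60 s ping time out.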


> On Thu, May 29, 2014 at 10:29 AM, David Pilato <[email protected]> wrote:
> I think GC took too long, so your node became unresponsive.
> If you set 30 GB of RAM, you should increase the ping timeout setting that 
> controls when a node is marked as unresponsive.
> 
> And if you are under memory pressure, you could check your requests to see 
> whether they can be optimized, or start new nodes...
> 
> My 2 cents.
> 
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
> 
> 
> On 29 May 2014 at 09:56, Jorge Ferrando <[email protected]> wrote:
> 
> I've been analyzing the problem with Marvel and Nagios and I managed to get 
> two more details:
> 
> - The node that restarts/reinitializes is always the same one: node 3.
> - It always happens shortly after the cluster reaches green state, anywhere 
> from a few seconds to 2-3 minutes later.
> 
> I have debug logging enabled in logging.yml:
> 
> logger:
>   # log action execution errors for easier debugging
>   action: DEBUG
> 
> But I don't see anything in the log. For instance, this is the last time it 
> happened: at around 9:47 the cluster became green, and at 9:50 the node restarted.
> 
> [2014-05-29 09:30:57,235][INFO ][monitor.jvm              ] [elastic ASIC nodo 3] [gc][young][129][20] duration [745ms], collections [1]/[1s], total [745ms]/[8.5s], memory [951.1mb]->[598.9mb]/[29.9gb], all_pools {[young] [421.5mb]->[8.2mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old] [463.1mb]->[524.1mb]/[29.3gb]}
> [2014-05-29 09:45:36,322][WARN ][monitor.jvm              ] [elastic ASIC nodo 3] [gc][old][964][1] duration [29.5s], collections [1]/[30.4s], total [29.5s]/[29.5s], memory [5.1gb]->[4.3gb]/[29.9gb], all_pools {[young] [29.4mb]->[34.9mb]/[532.5mb]}{[survivor] [59.9mb]->[0b]/[66.5mb]}{[old] [5gb]->[4.2gb]/[29.3gb]}
> [2014-05-29 09:50:41,040][INFO ][node                     ] [elastic ASIC nodo 3] version[1.2.0], pid[7021], build[c82387f/2014-05-22T12:49:13Z]
> [2014-05-29 09:50:41,041][INFO ][node                     ] [elastic ASIC nodo 3] initializing ...
> [2014-05-29 09:50:41,063][INFO ][plugins                  ] [elastic ASIC nodo 3] loaded [marvel], sites [marvel, paramedic, inquisitor, HQ, bigdesk, head]
> [2014-05-29 09:50:47,908][INFO ][node                     ] [elastic ASIC nodo 3] initialized
> [2014-05-29 09:50:47,909][INFO ][node                     ] [elastic ASIC nodo 3] starting ...
> 
> Is there any other way to debug what's going on with that node? 
> 
> 
> 
> 
>> On Tue, May 27, 2014 at 12:49 PM, Jorge Ferrando <[email protected]> wrote:
>> I thought about that, but it would be strange: they are 3 virtual machines 
>> in the same VMware cluster as hundreds of other services, and nobody has 
>> reported any networking problems.
>> 
>> 
>>> On Thu, May 22, 2014 at 3:16 PM, emeschitc <[email protected]> wrote:
>>> Hi,
>>> 
>>> I may be wrong, but it seems to me you have a problem with your network. It 
>>> may be a flaky connection, a broken NIC, or something wrong with your 
>>> discovery and/or transport configuration?
>>> 
>>> Caused by: org.elasticsearch.transport.NodeNotConnectedException: [elastic ASIC nodo 2][inet[/158.42.250.79:9301]] Node not connected
>>>     at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:859)
>>>     at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:540)
>>>     at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
>>> 
>>> Check the status of the network on this node.
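One way to act on that advice is to probe the transport address from the stack trace (158.42.250.79:9301) repeatedly from another node and count failed TCP connections, which can reveal a flaky link. This is a minimal Java sketch using only `java.net.Socket`. The class name, attempt count, and timeouts are hypothetical choices, and a successful TCP connect only shows that the port is reachable, not that the Elasticsearch transport layer is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical connectivity probe for an Elasticsearch transport port (plain TCP only).
class TransportPortCheck {
    // Returns true if a TCP connection to host:port is established within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, unreachable or unresolvable
        }
    }

    // Makes `attempts` connection attempts, pausing pauseMs between them; returns the failure count.
    static int countFailures(String host, int port, int attempts, int timeoutMs, long pauseMs)
            throws InterruptedException {
        int failures = 0;
        for (int i = 0; i < attempts; i++) {
            if (!canConnect(host, port, timeoutMs)) {
                failures++;
            }
            if (i < attempts - 1) {
                Thread.sleep(pauseMs);
            }
        }
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        // Defaults to the node 2 transport address from the stack trace; override with args.
        String host = args.length > 0 ? args[0] : "158.42.250.79";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9301;
        int attempts = 60;
        int failures = countFailures(host, port, attempts, 2000, 1000);
        System.out.println(failures + "/" + attempts + " connection attempts to "
                + host + ":" + port + " failed");
    }
}
```

Running it on each node against the other nodes' transport ports while the cluster recovers would show whether connections drop around the time of a restart.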
>>> 
>>> 
>>> 
>>>> On Thu, May 22, 2014 at 2:07 PM, Jorge Ferrando [via ElasticSearch Users] 
>>>> <[hidden email]> wrote:
>>>> Hello 
>>>> 
>>>> We have a cluster of 3 nodes running Ubuntu 12.04.4 LTS 64-bit and 
>>>> Elasticsearch v1.1.1.
>>>> 
>>>> It had been running flawlessly, but since last week some of the nodes 
>>>> restart randomly and the cluster goes to red state, then yellow, then green, 
>>>> and then it happens again in a loop (sometimes it doesn't even reach green).
>>>> 
>>>> I've looked at the logs, but I can't find an obvious reason for what's 
>>>> going on.
>>>> 
>>>> I've found entries like these, but I don't know if they are in some way 
>>>> related to the crash:
>>>> 
>>>> [2014-05-22 13:55:16,150][WARN ][index.codec              ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_end] returning default postings format
>>>> [2014-05-22 13:55:16,151][WARN ][index.codec              ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_end.raw] returning default postings format
>>>> [2014-05-22 13:55:16,151][WARN ][index.codec              ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_start] returning default postings format
>>>> [2014-05-22 13:55:16,151][WARN ][index.codec              ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_start.raw] returning default postings format
>>>> 
>>>> 
>>>> For instance, just now it was in yellow state, very close to reaching green, 
>>>> when node 3 suddenly restarted on its own, and now the cluster is red with 
>>>> 2000 shards initializing. The log on that node shows this:
>>>> 
>>>> [2014-05-22 13:59:48,498][INFO ][monitor.jvm              ] [elastic ASIC nodo 3] [gc][young][1181][222] duration [735ms], collections [1]/[1s], total [735ms]/[1.1m], memory [6.5gb]->[6.1gb]/[19.9gb], all_pools {[young] [456mb]->[7.2mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old] [6gb]->[6gb]/[19.3gb]}
>>>> [2014-05-22 14:03:44,825][INFO ][node                     ] [elastic ASIC nodo 3] version[1.1.1], pid[7511], build[f1585f0/2014-04-16T14:27:12Z]
>>>> [2014-05-22 14:03:44,826][INFO ][node                     ] [elastic ASIC nodo 3] initializing ...
>>>> [2014-05-22 14:03:44,839][INFO ][plugins                  ] [elastic ASIC nodo 3] loaded [], sites [paramedic, inquisitor, HQ, bigdesk, head]
>>>> [2014-05-22 14:03:51,967][INFO ][node                     ] [elastic ASIC nodo 3] initialized
>>>> [2014-05-22 14:03:51,967][INFO ][node                     ] [elastic ASIC nodo 3] starting ...
>>>> 
>>>> The crash happened exactly at 14:02.
>>>> 
>>>> Any idea what could be going on, or how I can trace what's happening?
>>>> 
>>>> After the restart there are also DEBUG errors like this:
>>>> 
>>>> [2014-05-22 14:06:16,621][DEBUG][action.search.type       ] [elastic ASIC nodo 3] [logstash-2014.05.21][1], node[jgwbxcBoTVa3JIIG5a_FJA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@42b80f4a] lastShard [true]
>>>> org.elasticsearch.transport.SendRequestTransportException: [elastic ASIC nodo 2][inet[/158.42.250.79:9301]][search/phase/query]
>>>>    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
>>>>    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
>>>>    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:208)
>>>>    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
>>>>    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
>>>>    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
>>>>    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:143)
>>>>    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:59)
>>>>    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:49)
>>>>    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
>>>>    at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:108)
>>>>    at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
>>>>    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
>>>>    at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:92)
>>>>    at org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:212)
>>>>    at org.elasticsearch.rest.action.search.RestSearchAction.handleRequest(RestSearchAction.java:98)
>>>>    at org.elasticsearch.rest.RestController.executeHandler(RestController.java:159)
>>>>    at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:142)
>>>>    at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
>>>>    at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
>>>>    at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:291)
>>>>    at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:43)
>>>>    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>>>>    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>>    at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
>>>>    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>>>>    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>>    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>>>>    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
>>>>    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
>>>>    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
>>>>    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>>>>    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>>    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
>>>>    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>>>>    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
>>>>    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
>>>>    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
>>>>    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>>>>    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
>>>>    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>>>>    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>>>>    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>>>>    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>>>>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>    at java.lang.Thread.run(Thread.java:744)
>>>> Caused by: org.elasticsearch.transport.NodeNotConnectedException: [elastic ASIC nodo 2][inet[/158.42.250.79:9301]] Node not connected
>>>>    at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:859)
>>>>    at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:540)
>>>>    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
>>>>    ... 50 more
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> You received this message because you are subscribed to the Google Groups 
>>>> "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>>> email to [hidden email].
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/elasticsearch/fa53a41d-064b-4250-8003-31cf845b7216%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>> 
>>>> 
>>>> If you reply to this email, your message will be added to the discussion 
>>>> below:
>>>> http://elasticsearch-users.115913.n3.nabble.com/Nodes-restarting-automatically-tp4056276.html
>>> 
>>> 
>>> View this message in context: Re: Nodes restarting automatically
>>> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>> 
> 

