I would like to know:
- What is the root cause?
- How do I fix it?
- If it is a memory problem, is there anything I can do (other than
upgrading the hardware)?
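
For context, this is a pre-1.0 cluster (the logs below mention
index.cache.field.data.resident and index.engine.robin), so the only knobs
I am aware of are the JVM heap size and the resident field cache. A sketch
of what I could try, assuming those settings still apply to this version;
the values are placeholders, not recommendations:

  # environment (e.g. /etc/default/elasticsearch): pin the JVM heap
  ES_HEAP_SIZE=4g

  # elasticsearch.yml: bound the field cache instead of letting it grow
  # until the heap is exhausted
  index:
    cache:
      field:
        type: soft       # soft references, evictable under memory pressure
        max_size: 50000  # hypothetical cap on cached entries (per segment)
        expire: 10m      # drop entries idle for 10 minutes

Is that the right direction?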
On Jun 5, 2014, at 9:54 AM, Mark Walkom <[email protected]> wrote:
> What do you want to know exactly?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: [email protected]
> web: www.campaignmonitor.com
>
>
> On 5 June 2014 12:40, Quan Tong Anh <[email protected]> wrote:
> I'm running a 3-node cluster with 2 data nodes. My configuration:
>
> es1, es2:
>
> node:
>   name: elasticsearch-1   # elasticsearch-2 on es2
>   master: true
>   data: true
>
>
> discovery:
>   zen:
>     ping:
>       multicast:
>         enabled: false
>       unicast:
>         hosts: ["elasticsearch-1.domain.com:9300", "logs.domain.com:9300", "elasticsearch-2.domain.com:9300"]
>
>
>
> gl2:
>
> node:
>   name: graylog2
>   master: false
>   data: false
>
>
>
> Shinken sent me a notification saying there are only 2 nodes in the cluster
> (since gl2 is a client-only node, that means one of the two data nodes has
> dropped out; note "number_of_data_nodes" : 1 below):
>
> {
>   "cluster_name" : "domain.com",
>   "status" : "red",
>   "timed_out" : false,
>   "number_of_nodes" : 2,
>   "number_of_data_nodes" : 1,
>   "active_primary_shards" : 12,
>   "active_shards" : 12,
>   "relocating_shards" : 0,
>   "initializing_shards" : 0,
>   "unassigned_shards" : 12
> }
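>
> For reference, that JSON is the output of the cluster health API; I fetch
> it with something like the following (hostname assumed):
>
> curl -s 'http://elasticsearch-1.domain.com:9200/_cluster/health?pretty=true'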
>
>
> Log on the ES-1:
>
> [2014-06-04 15:51:09,281][WARN ][transport ] [elasticsearch-1] Received response for a request that has timed out, sent [61627ms] ago, timed out [30338ms] ago, action [discovery/zen/fd/masterPing], node [[elasticsearch-2][Vcvb6dtMQf-nfuB-wR9iew][inet[/107.170.x.y:9300]]{master=true}], id [272380]
> [2014-06-04 15:51:50,542][WARN ][index.cache.field.data.resident] [elasticsearch-1] [graylog2-graylog2_2] loading field [_date] caused out of memory failure
> java.lang.OutOfMemoryError: Java heap space
> [2014-06-04 15:55:16,351][DEBUG][action.admin.indices.stats] [elasticsearch-1] [graylog2-graylog2_5][2], node[Vcvb6dtMQf-nfuB-wR9iew], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@7631d2a2]
> org.elasticsearch.transport.RemoteTransportException: [elasticsearch-2][inet[/107.170.x.y:9300]][indices/stats/s]
> Caused by: org.elasticsearch.index.IndexShardMissingException: [graylog2-graylog2_5][2] missing
>     at org.elasticsearch.index.service.InternalIndexService.shardSafe(InternalIndexService.java:179)
>     at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:145)
>     at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:53)
>     at org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:398)
>     at org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:384)
>     at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:268)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> [2014-06-04 15:56:29,504][WARN ][index.engine.robin ] [elasticsearch-1] [graylog2_recent][0] failed engine
> java.lang.OutOfMemoryError: Java heap space
>
>
>
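> In case it helps, this is how I have been watching heap usage on each node
> since the crash. jstat ships with the JDK; the pgrep pattern is only a
> guess at how the ES process appears on your systems.
>
> # print GC/heap utilization for the ES java process every 5 seconds
> jstat -gcutil $(pgrep -f elasticsearch.bootstrap) 5s
>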
> Log on the ES-2:
>
> [2014-06-04 15:51:02,276][WARN ][transport.netty ] [elasticsearch-2] exception caught on transport layer [[id: 0x72906b9d, /107.170.z.t:52899 => /107.170.x.y:9300]], closing connection
> java.lang.OutOfMemoryError: Java heap space
>     at java.nio.DirectByteBuffer.duplicate(DirectByteBuffer.java:217)
>     at org.elasticsearch.common.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:87)
>     at org.elasticsearch.common.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
>     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:190)
>     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:150)
>     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
>     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
>     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
>     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
>     at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>     at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>     at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
>
>
> [2014-06-04 15:51:27,143][WARN ][indices.cluster ] [elasticsearch-2] [graylog2-graylog2_5][2] master [[elasticsearch-2][Vcvb6dtMQf-nfuB-wR9iew][inet[/107.170.x.y:9300]]{master=true}] marked shard as started, but shard have not been created, mark shard as failed
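>
> To see exactly which shards ended up unassigned I dump the routing table
> from the cluster state API, roughly as below; the filter_* flags only trim
> the response and can be dropped if this version does not support them:
>
> curl -s 'http://elasticsearch-1.domain.com:9200/_cluster/state?pretty=true&filter_metadata=true&filter_blocks=true'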
>
>
> Log on the GL2:
>
> Jun 4 15:51:11 graylog2 graylog2-server: 2014-06-04 15:51:11,040 WARN : org.graylog2.buffers.processors.OutputBufferProcessor - Timeout reached. Not waiting any longer for writer threads to complete.
> Jun 4 15:51:14 graylog2 graylog2-server: 2014-06-04 15:51:14,694 WARN : org.elasticsearch.discovery.zen - [graylog2] master_left and no other node elected to become master, current nodes: {[graylog2][hHcLLZ2GTamMajmE-a5lXg][inet[/107.170.z.t:9300]]{client=true, data=false, master=false},}
> Jun 4 15:51:14 graylog2 graylog2-server: 2014-06-04 15:51:14,708 ERROR: org.graylog2.periodical.DeflectorManagerThread - Tried to check for number of messages in current deflector target but did not find index. Aborting.
> Jun 4 15:51:14 graylog2 graylog2-server: org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];[SERVICE_UNAVAILABLE/2/no master];
> Jun 4 15:51:14 graylog2 graylog2-server: 2014-06-04 15:51:14,709 ERROR: org.graylog2.periodical.DeflectorManagerThread - Couldn't delete outdated or empty indices
> Jun 4 15:52:57 graylog2 graylog2-server: 2014-06-04 15:52:57,339 ERROR: org.graylog2.indexer.EmbeddedElasticSearchClient - Could not read name of ES node.
> Jun 4 15:52:57 graylog2 graylog2-server: java.lang.NullPointerException
> Jun 4 15:52:57 graylog2 graylog2-server: at org.graylog2.indexer.EmbeddedElasticSearchClient.nodeIdToName(EmbeddedElasticSearchClient.java:135)
> Jun 4 15:52:57 graylog2 graylog2-server: at org.graylog2.indexer.DeflectorInformation.getShardInformation(DeflectorInformation.java:125)
> Jun 4 15:52:57 graylog2 graylog2-server: at org.graylog2.indexer.DeflectorInformation.getIndexInformation(DeflectorInformation.java:110)
> Jun 4 15:52:57 graylog2 graylog2-server: at org.graylog2.indexer.DeflectorInformation.getAsDatabaseObject(DeflectorInformation.java:84)
> Jun 4 15:52:57 graylog2 graylog2-server: at org.graylog2.periodical.DeflectorInformationWriterThread.run(DeflectorInformationWriterThread.java:72)
> Jun 4 15:52:57 graylog2 graylog2-server: at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> Jun 4 15:52:57 graylog2 graylog2-server: at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
> Jun 4 15:52:57 graylog2 graylog2-server: at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
> Jun 4 15:52:57 graylog2 graylog2-server: at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> Jun 4 15:52:57 graylog2 graylog2-server: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> Jun 4 15:52:57 graylog2 graylog2-server: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> Jun 4 15:52:57 graylog2 graylog2-server: at java.lang.Thread.run(Thread.java:744)
>
> Please let me know if you need further information.
>