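It looks like your nodes lost connectivity with each other, which may be due to long GC pauses. Shut down all your nodes, then add this to the config on each one: discovery.zen.minimum_master_nodes: 2. Then restart your cluster one node at a time. With that setting, a node that can only see itself will not elect itself master, which is what your logs show 15 and 17 both doing.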
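In case it helps to see where the 2 comes from: the usual rule of thumb is a strict majority of master-eligible nodes, i.e. (N / 2) + 1. Below is a minimal Python sketch of that arithmetic. The helper name is made up for illustration, and it assumes all three hosts in your unicast list are master-eligible (nothing in the posted config says otherwise).

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Return a strict majority (quorum) of the master-eligible nodes.

    Hypothetical helper for illustration only; it is not part of Elasticsearch.
    """
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


# The three unicast hosts from the posted configuration.
# Assumption: every one of them is master-eligible.
unicast_hosts = ["10.32.240.15", "10.32.240.16", "10.32.240.17"]

quorum = minimum_master_nodes(len(unicast_hosts))

# Print the line to add to elasticsearch.yml on each node.
print(f"discovery.zen.minimum_master_nodes: {quorum}")
```

For three nodes this gives 2. If you later grow the cluster to five master-eligible nodes, the same rule gives 3.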
Are you using anything like ElasticHQ, kopf or marvel to monitor things?

Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: [email protected]
web: www.campaignmonitor.com

On 20 April 2014 17:37, hongsgo <[email protected]> wrote:

> my cluster is consist of 3 instance ip name 15~17
> today in the morning. 17 instance was left the cluster
> in the 15 instance elasticsearch-head plugin 17 instance stats is "Unassigned" 16 is can not find.
> what's happend?
> please somebody help me
>
> 1. 17 instance log message.. in below..
>
> [2014-04-20 03:29:28,539][INFO ][discovery.zen ] [10.32.240.17] master_left [[10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]]], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
> [2014-04-20 03:29:28,540][INFO ][cluster.service ] [10.32.240.17] master {new [10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]], previous [10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]]}, removed {[10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]],}, reason: zen-disco-master_failed ([10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]])
> [2014-04-20 03:30:01,320][DEBUG][action.admin.cluster.node.stats] [10.32.240.17] failed to execute on node [a0qNnjLvQSauGEddNxKmNw]
> org.elasticsearch.index.engine.EngineClosedException: [jp_listened_calcu_log][0] CurrentState[CLOSED]
>
> 2. 15. instance log message
> [2014-04-20 03:27:18,747][INFO ][discovery.zen ] [10.32.240.15] master_left [[10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]]], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
> [2014-04-20 03:27:18,757][INFO ][cluster.service ] [10.32.240.15] master {new [10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]], previous [10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]]}, removed {[10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]],}, reason: zen-disco-master_failed ([10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]])
> [2014-04-20 03:28:28,544][WARN ][transport ] [10.32.240.15] Received response for a request that has timed out, sent [68787ms] ago, timed out [38787ms] ago, action [discovery/zen/fd/masterPing], node [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]], id [10310608]
> [2014-04-20 03:28:28,544][WARN ][transport ] [10.32.240.15] Received response for a request that has timed out, sent [38787ms] ago, timed out [8787ms] ago, action [discovery/zen/fd/masterPing], node [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]], id [10310609]
> [2014-04-20 03:28:28,552][INFO ][discovery.zen ] [10.32.240.15] master_left [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]], reason [no longer master]
> [2014-04-20 03:28:28,557][INFO ][cluster.service ] [10.32.240.15] master {new [10.32.240.15][dE_q8O-dT-SeUlTBuM-yiQ][inet[/10.32.240.15:21001]], previous [10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]}, removed {[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]],}, reason: zen-disco-master_failed ([10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]])
> [2014-04-20 03:29:28,546][WARN ][discovery.zen ] [10.32.240.15] received cluster state from [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]] which is also master but with an older cluster_state, telling [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]] to rejoin the cluster
> [2014-04-20 03:29:28,548][WARN ][discovery.zen ] [10.32.240.15] failed to send rejoin request to [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]]
> org.elasticsearch.transport.SendRequestTransportException: [10.32.240.17][inet[/10.32.240.17:21001]][discovery/zen/rejoin]
>     at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
>     at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
>     at org.elasticsearch.discovery.zen.ZenDiscovery$7.execute(ZenDiscovery.java:541)
>     at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:298)
>     at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:135)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: org.elasticsearch.transport.NodeNotConnectedException: [10.32.240.17][inet[/10.32.240.17:21001]] Node not connected
>     at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:834)
>     at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:532)
>     at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
>     ... 7 more
> [2014-04-20 03:29:28,603][WARN ][discovery.zen ] [10.32.240.15] received cluster state from [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]] which is also master but with an older cluster_state, telling [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]] to rejoin the cluster
> [2014-04-20 03:29:28,604][WARN ][discovery.zen ] [10.32.240.15] failed to send rejoin request to [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]]
> org.elasticsearch.transport.SendRequestTransportException: [10.32.240.17][inet[/10.32.240.17:21001]][discovery/zen/rejoin]
>     at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
>     at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
>     at org.elasticsearch.discovery.zen.ZenDiscovery$7.execute(ZenDiscovery.java:541)
>     at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:298)
>     at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:135)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: org.elasticsearch.transport.NodeNotConnectedException: [10.32.240.17][inet[/10.32.240.17:21001]] Node not connected
>     at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:834)
>     at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:532)
>     at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
>     ... 7 more
> ~
>
> 3. 17 instance elasticsearch process is alive
>
> /usr/bin/java -Xms2G -Xmx2G -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch -Des.path.home=/home/irteam/apps/elasticsearch-0.90.7 -cp :/home/irteam/apps/elasticsearch-0.90.7/lib/elasticsearch-0.90.7.jar:/home/irteam/apps/elasticsearch-0.90.7/lib/*:/home/irteam/apps/elasticsearch-0.90.7/lib/sigar/* org.elasticsearch.bootstrap.ElasticSearch
>
> 4. configuration
> cluster.name: music-es-beta
> node.name: 10.32.240.15
> http.port: 21200
> transport.tcp.port: 21001
> multicast.enabled: false
> index.number_of_shards: 3
> index.number_of_replicas: 1
> index.mapper.dynamic: false
> action.auto_create_index: false
> bootstrap.mlockall: true
> discovery.zen.ping.timeout: 10s
> index.cache.field.type: soft
> discovery.zen.ping.unicast.hosts: ["10.32.240.15", "10.32.240.16","10.32.240.17"]
>
> --
> View this message in context: http://elasticsearch-users.115913.n3.nabble.com/what-s-happend-my-es-es-cluster-plz-help-me-tp4054448.html
> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>
> --
> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1397979426164-4054448.post%40n3.nabble.com.
> For more options, visit https://groups.google.com/d/optout.
