Re: what's happend? my es-es cluster? plz help me.

Mark Walkom Mon, 21 Apr 2014 15:37:25 -0700

It looks like your nodes lost connectivity with each other; this may be due
to GC pauses. Shut down all your nodes, then add this to your config:

discovery.zen.minimum_master_nodes: 2

Then restart your cluster one node at a time.
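
For context, 2 is the usual majority quorum for a three-node cluster:
floor(n / 2) + 1. Here is a minimal Python sketch of that arithmetic, assuming
all three nodes are master-eligible. The function name is only illustrative
and is not part of Elasticsearch.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Return the majority quorum: more than half of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


# The cluster in this thread has three nodes (10.32.240.15 to 10.32.240.17).
print(minimum_master_nodes(3))
```

With a quorum of 2, a node that can only see itself should not be able to
elect itself master, which is what the quoted logs show nodes 15 and 17 doing.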


Are you using anything like ElasticHQ, kopf or Marvel to monitor things?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [email protected]
web: www.campaignmonitor.com


On 20 April 2014 17:37, hongsgo <[email protected]> wrote:

> My cluster consists of 3 instances, named by IP suffix 15 to 17.
> This morning, instance 17 left the cluster.
> In the elasticsearch-head plugin on instance 15, instance 17 shows as
> "Unassigned" and instance 16 cannot be found.
> What happened?
> Could somebody please help me?
>
> 1. Log messages from instance 17:
>
> [2014-04-20 03:29:28,539][INFO ][discovery.zen            ] [10.32.240.17] master_left [[10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]]], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
> [2014-04-20 03:29:28,540][INFO ][cluster.service          ] [10.32.240.17] master {new [10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]], previous [10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]]}, removed {[10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]],}, reason: zen-disco-master_failed ([10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]])
> [2014-04-20 03:30:01,320][DEBUG][action.admin.cluster.node.stats] [10.32.240.17] failed to execute on node [a0qNnjLvQSauGEddNxKmNw]
> org.elasticsearch.index.engine.EngineClosedException: [jp_listened_calcu_log][0] CurrentState[CLOSED]
>
> 2. Log messages from instance 15:
> [2014-04-20 03:27:18,747][INFO ][discovery.zen            ] [10.32.240.15] master_left [[10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]]], reason [failed to ping, tried [3] times, each with  maximum [30s] timeout]
> [2014-04-20 03:27:18,757][INFO ][cluster.service          ] [10.32.240.15] master {new [10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]], previous [10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]]}, removed {[10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]],}, reason: zen-disco-master_failed ([10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]])
> [2014-04-20 03:28:28,544][WARN ][transport                ] [10.32.240.15] Received response for a request that has timed out, sent [68787ms] ago, timed out [38787ms] ago, action [discovery/zen/fd/masterPing], node [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]], id [10310608]
> [2014-04-20 03:28:28,544][WARN ][transport                ] [10.32.240.15] Received response for a request that has timed out, sent [38787ms] ago, timed out [8787ms] ago, action [discovery/zen/fd/masterPing], node [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]], id [10310609]
> [2014-04-20 03:28:28,552][INFO ][discovery.zen            ] [10.32.240.15] master_left [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]], reason [no longer master]
> [2014-04-20 03:28:28,557][INFO ][cluster.service          ] [10.32.240.15] master {new [10.32.240.15][dE_q8O-dT-SeUlTBuM-yiQ][inet[/10.32.240.15:21001]], previous [10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]}, removed {[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]],}, reason: zen-disco-master_failed ([10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]])
> [2014-04-20 03:29:28,546][WARN ][discovery.zen            ] [10.32.240.15] received cluster state from [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]] which is also master but with an older cluster_state, telling [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]] to rejoin the cluster
> [2014-04-20 03:29:28,548][WARN ][discovery.zen            ] [10.32.240.15] failed to send rejoin request to [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]]
> org.elasticsearch.transport.SendRequestTransportException: [10.32.240.17][inet[/10.32.240.17:21001]][discovery/zen/rejoin]
>         at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
>         at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
>         at org.elasticsearch.discovery.zen.ZenDiscovery$7.execute(ZenDiscovery.java:541)
>         at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:298)
>         at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:135)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: org.elasticsearch.transport.NodeNotConnectedException: [10.32.240.17][inet[/10.32.240.17:21001]] Node not connected
>         at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:834)
>         at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:532)
>         at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
>         ... 7 more
> [2014-04-20 03:29:28,603][WARN ][discovery.zen            ] [10.32.240.15] received cluster state from [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]] which is also master but with an older cluster_state, telling [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]] to rejoin the cluster
> [2014-04-20 03:29:28,604][WARN ][discovery.zen            ] [10.32.240.15] failed to send rejoin request to [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]]
> org.elasticsearch.transport.SendRequestTransportException: [10.32.240.17][inet[/10.32.240.17:21001]][discovery/zen/rejoin]
>         at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
>         at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
>         at org.elasticsearch.discovery.zen.ZenDiscovery$7.execute(ZenDiscovery.java:541)
>         at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:298)
>         at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:135)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: org.elasticsearch.transport.NodeNotConnectedException: [10.32.240.17][inet[/10.32.240.17:21001]] Node not connected
>         at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:834)
>         at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:532)
>         at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
>         ... 7 more
> ~
>
> 3. The elasticsearch process on instance 17 is still alive:
>
>  /usr/bin/java -Xms2G -Xmx2G -Xss256k -Djava.awt.headless=true
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch
> -Des.path.home=/home/irteam/apps/elasticsearch-0.90.7
> -cp :/home/irteam/apps/elasticsearch-0.90.7/lib/elasticsearch-0.90.7.jar:/home/irteam/apps/elasticsearch-0.90.7/lib/*:/home/irteam/apps/elasticsearch-0.90.7/lib/sigar/*
> org.elasticsearch.bootstrap.ElasticSearch
>
> 4. Configuration:
> cluster.name: music-es-beta
> node.name: 10.32.240.15
> http.port: 21200
> transport.tcp.port: 21001
> multicast.enabled: false
> index.number_of_shards: 3
> index.number_of_replicas: 1
> index.mapper.dynamic: false
> action.auto_create_index: false
> bootstrap.mlockall: true
> discovery.zen.ping.timeout: 10s
> index.cache.field.type: soft
> discovery.zen.ping.unicast.hosts: ["10.32.240.15", "10.32.240.16", "10.32.240.17"]
>
>
>
>
> --
> View this message in context:
> http://elasticsearch-users.115913.n3.nabble.com/what-s-happend-my-es-es-cluster-plz-help-me-tp4054448.html
> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/1397979426164-4054448.post%40n3.nabble.com
> .
> For more options, visit https://groups.google.com/d/optout.
>

