Hello all,

I asked a similar question on the logstash mailing list, but as I've been 
debugging the problem and it's 100% Elasticsearch related, I've decided to ask 
for help here.

I have a 1-node Elasticsearch cluster filled from logstash. Everything was 
working fine, but last week the node suffered an unexpected power cut and now 
the cluster is in red status:

{
  "cluster_name" : "BigLog",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 1175,
  "active_shards" : 1175,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 1195
}
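
(For reference, that output comes from a plain cluster health call, with host 
and port adjusted to my node:)

# Cluster health, same output as above
curl -s 'http://172.17.20.4:9200/_cluster/health?pretty'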


The first start after the crash gave no errors:

[2014-05-06 11:44:05,867][INFO ][node                     ] [BigLog1] 
version[0.90.10], pid[32758], build[0a5781f/2014-01-10T10:18:37Z]
[2014-05-06 11:44:05,868][INFO ][node                     ] [BigLog1] 
initializing ...
[2014-05-06 11:44:05,875][INFO ][plugins                  ] [BigLog1] 
loaded [], sites []
[2014-05-06 11:49:30,318][INFO ][node                     ] [BigLog1] 
version[0.90.10], pid[1212], build[0a5781f/2014-01-10T10:18:37Z]
[2014-05-06 11:49:30,318][INFO ][node                     ] [BigLog1] 
initializing ...
[2014-05-06 11:49:30,325][INFO ][plugins                  ] [BigLog1] 
loaded [], sites []
[2014-05-06 11:50:16,450][INFO ][node                     ] [BigLog1] 
initialized
[2014-05-06 11:50:16,450][INFO ][node                     ] [BigLog1] 
starting ...
[2014-05-06 11:50:16,611][INFO ][transport                ] [BigLog1] 
bound_address {inet[/172.17.20.4:9301]}, publish_address 
{inet[/172.17.20.4:9301]}
[2014-05-06 11:50:19,652][INFO ][cluster.service          ] [BigLog1] 
new_master [BigLog1][TVlZ5MjhRTm21c6iu57NbA][inet[/172.17.20.4:9301]], 
reason: zen-disco-join (elected_as_master)
[2014-05-06 11:50:19,676][INFO ][discovery                ] [BigLog1] 
BigLog/TVlZ5MjhRTm21c6iu57NbA
[2014-05-06 11:50:20,095][INFO ][http                     ] [BigLog1] 
bound_address {inet[/172.17.20.4:9200]}, publish_address 
{inet[/172.17.20.4:9200]}
[2014-05-06 11:50:20,096][INFO ][node                     ] [BigLog1] 
started
[2014-05-06 11:50:23,024][INFO ][gateway                  ] [BigLog1] 
recovered [235] indices into cluster_state
[2014-05-06 11:50:23,024][INFO ][cluster.service          ] [BigLog1] added 
{[Stentor][fqar3OoSSSOPMgPHvNbV3w][inet[/172.17.20.4:9300]]{client=true, 
data=false},}, reason: zen-disco-receive(join from 
node[[Stentor][fqar3OoSSSOPMgPHvNbV3w][inet[/172.17.20.4:9300]]{client=true, 
data=false}])
[2014-05-06 11:50:23,150][WARN ][transport.netty          ] [BigLog1] 
Message not fully read (request) for [30] and action 
[cluster/nodeIndexCreated], resetting
[...]
[2014-05-06 11:50:49,899][DEBUG][action.search.type       ] [BigLog1] All 
shards failed for phase: [query]
[2014-05-06 11:50:49,901][DEBUG][action.search.type       ] [BigLog1] All 
shards failed for phase: [query]
[2014-05-06 11:52:14,027][INFO ][cluster.service          ] [BigLog1] removed 
{[Stentor][fqar3OoSSSOPMgPHvNbV3w][inet[/172.17.20.4:9300]]{client=true, 
data=false},}, reason: 
zen-disco-node_failed([Stentor][fqar3OoSSSOPMgPHvNbV3w][inet[/172.17.20.4:9300]]{client=true, 
data=false}), reason transport disconnected (with verified connect)


After that start, a new restart reported absolutely nothing strange:

[2014-05-06 14:28:19,151][INFO ][node                     ] [BigLog1] 
version[0.90.10], pid[9441], build[0a5781f/2014-01-10T10:18:37Z]
[2014-05-06 14:28:19,152][INFO ][node                     ] [BigLog1] 
initializing ...
[2014-05-06 14:28:19,158][INFO ][plugins                  ] [BigLog1] 
loaded [], sites []
[2014-05-06 14:28:36,283][INFO ][node                     ] [BigLog1] 
initialized
[2014-05-06 14:28:36,283][INFO ][node                     ] [BigLog1] 
starting ...
[2014-05-06 14:28:36,440][INFO ][transport                ] [BigLog1] 
bound_address {inet[/172.17.20.4:9300]}, publish_address 
{inet[/172.17.20.4:9300]}
[2014-05-06 14:28:39,479][INFO ][cluster.service          ] [BigLog1] 
new_master [BigLog1][XXeD06hgQZKks-ChqVh-5w][inet[/172.17.20.4:9300]], 
reason: zen-disco-join (elected_as_master)
[2014-05-06 14:28:39,517][INFO ][discovery                ] [BigLog1] 
BigLog/XXeD06hgQZKks-ChqVh-5w
[2014-05-06 14:28:39,760][INFO ][http                     ] [BigLog1] 
bound_address {inet[/172.17.20.4:9200]}, publish_address 
{inet[/172.17.20.4:9200]}
[2014-05-06 14:28:39,760][INFO ][node                     ] [BigLog1] 
started
[2014-05-06 14:28:41,788][INFO ][gateway                  ] [BigLog1] 
recovered [237] indices into cluster_state

But the status is red :-(
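
To narrow down where the red comes from, I've been pulling the per-index 
health breakdown. This is just a quick sketch (the grep pulls out the line 
above each red status, which should be the index name):

# Per-index health; grep for the indices reported red
curl -s 'http://172.17.20.4:9200/_cluster/health?level=indices&pretty' \
  | grep -B 1 '"status" : "red"'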

So I've enabled debug logging and tried to understand what is happening with 
the shards, following a single one through the log:
[2014-05-07 11:38:52,188][DEBUG][gateway.local            ] [BigLog1] 
[logstash-2014.02.17][0]: throttling allocation [[logstash-2014.02.17][0], 
node[null], [P], s[UNASSIGNED]] to 
[[[BigLog1][r4MP-9AYTrKkzva1IoXzDw][inet[/172.17.20.4:9300]]]] on primary 
allocation
[...]
[2014-05-07 11:39:15,362][DEBUG][gateway.local            ] [BigLog1] 
[logstash-2014.02.17][0]: throttling allocation [[logstash-2014.02.17][0], 
node[null], [P], s[UNASSIGNED]] to 
[[[BigLog1][r4MP-9AYTrKkzva1IoXzDw][inet[/172.17.20.4:9300]]]] on primary 
allocation
[2014-05-07 11:39:15,551][DEBUG][gateway.local            ] [BigLog1] 
[logstash-2014.02.17][0]: allocating [[logstash-2014.02.17][0], node[null], 
[P], s[UNASSIGNED]] to 
[[BigLog1][r4MP-9AYTrKkzva1IoXzDw][inet[/172.17.20.4:9300]]] on primary 
allocation
[2014-05-07 11:39:15,638][DEBUG][indices.cluster          ] [BigLog1] 
[logstash-2014.02.17][0] creating shard
[2014-05-07 11:39:15,725][DEBUG][index.deletionpolicy     ] [BigLog1] 
[logstash-2014.02.17][0] Using [keep_only_last] deletion policy
[2014-05-07 11:39:15,726][DEBUG][index.merge.policy       ] [BigLog1] 
[logstash-2014.02.17][0] using [tiered] merge policy with 
expunge_deletes_allowed[10.0], floor_segment[2mb], max_merge_at_once[10], 
max_merge_at_once_explicit[30], max_merged_segment[5gb], 
segments_per_tier[10.0], reclaim_deletes_weight[2.0], async_merge[true]
[2014-05-07 11:39:15,726][DEBUG][index.merge.scheduler    ] [BigLog1] 
[logstash-2014.02.17][0] using [concurrent] merge scheduler with 
max_thread_count[3]
[2014-05-07 11:39:15,726][DEBUG][index.shard.service      ] [BigLog1] 
[logstash-2014.02.17][0] state: [CREATED]
[2014-05-07 11:39:15,727][DEBUG][index.translog           ] [BigLog1] 
[logstash-2014.02.17][0] interval [5s], flush_threshold_ops [5000], 
flush_threshold_size [200mb], flush_threshold_period [30m]
[2014-05-07 11:39:15,727][DEBUG][index.shard.service      ] [BigLog1] 
[logstash-2014.02.17][0] state: [CREATED]->[RECOVERING], reason [from 
gateway]
[2014-05-07 11:39:15,727][DEBUG][index.gateway            ] [BigLog1] 
[logstash-2014.02.17][0] starting recovery from local ...
[2014-05-07 11:39:15,732][DEBUG][index.engine.robin       ] [BigLog1] 
[logstash-2014.02.17][0] starting engine
[2014-05-07 11:39:15,768][DEBUG][index.shard.service      ] [BigLog1] 
[logstash-2014.02.17][0] state: [RECOVERING]->[POST_RECOVERY], reason [post 
recovery]
[2014-05-07 11:39:15,768][DEBUG][index.shard.service      ] [BigLog1] 
[logstash-2014.02.17][0] scheduling refresher every 5s
[2014-05-07 11:39:15,769][DEBUG][index.shard.service      ] [BigLog1] 
[logstash-2014.02.17][0] scheduling optimizer / merger every 1s
[2014-05-07 11:39:15,769][DEBUG][index.gateway            ] [BigLog1] 
[logstash-2014.02.17][0] recovery completed from [local], took [42ms]
[2014-05-07 11:39:15,769][DEBUG][cluster.action.shard     ] [BigLog1] 
sending shard started for [logstash-2014.02.17][0], 
node[r4MP-9AYTrKkzva1IoXzDw], [P], s[INITIALIZING], indexUUID 
[pHG4k-OJQFeCiCfrhf59Iw], reason [after recovery from gateway]
[2014-05-07 11:39:15,769][DEBUG][cluster.action.shard     ] [BigLog1] 
received shard started for [logstash-2014.02.17][0], 
node[r4MP-9AYTrKkzva1IoXzDw], [P], s[INITIALIZING], indexUUID 
[pHG4k-OJQFeCiCfrhf59Iw], reason [after recovery from gateway]
[2014-05-07 11:39:15,953][DEBUG][cluster.action.shard     ] [BigLog1] 
sending shard started for [logstash-2014.02.17][0], 
node[r4MP-9AYTrKkzva1IoXzDw], [P], s[INITIALIZING], indexUUID 
[pHG4k-OJQFeCiCfrhf59Iw], reason [master 
[BigLog1][r4MP-9AYTrKkzva1IoXzDw][inet[/172.17.20.4:9300]] marked shard as 
initializing, but shard state is [POST_RECOVERY], mark shard as started]
[2014-05-07 11:39:15,953][DEBUG][cluster.action.shard     ] [BigLog1] 
received shard started for [logstash-2014.02.17][0], 
node[r4MP-9AYTrKkzva1IoXzDw], [P], s[INITIALIZING], indexUUID 
[pHG4k-OJQFeCiCfrhf59Iw], reason [master 
[BigLog1][r4MP-9AYTrKkzva1IoXzDw][inet[/172.17.20.4:9300]] marked shard as 
initializing, but shard state is [POST_RECOVERY], mark shard as started]
[2014-05-07 11:39:15,966][DEBUG][cluster.action.shard     ] [BigLog1] 
[logstash-2014.02.17][0] will apply shard started [logstash-2014.02.17][0], 
node[r4MP-9AYTrKkzva1IoXzDw], [P], s[INITIALIZING], indexUUID 
[pHG4k-OJQFeCiCfrhf59Iw], reason [after recovery from gateway]
[2014-05-07 11:39:15,966][DEBUG][cluster.action.shard     ] [BigLog1] 
[logstash-2014.02.17][0] will apply shard started [logstash-2014.02.17][0], 
node[r4MP-9AYTrKkzva1IoXzDw], [P], s[INITIALIZING], indexUUID 
[pHG4k-OJQFeCiCfrhf59Iw], reason [master 
[BigLog1][r4MP-9AYTrKkzva1IoXzDw][inet[/172.17.20.4:9300]] marked shard as 
initializing, but shard state is [POST_RECOVERY], mark shard as started]
[2014-05-07 11:39:16,210][DEBUG][index.shard.service      ] [BigLog1] 
[logstash-2014.02.17][0] state: [POST_RECOVERY]->[STARTED], reason [global 
state is [STARTED]]
[2014-05-07 11:39:18,477][DEBUG][index.engine.robin       ] [BigLog1] 
[logstash-2014.02.17][0] updating index_buffer_size from [64mb] to [4mb]
[2014-05-07 11:40:17,497][DEBUG][cluster.service          ] [BigLog1] 
processing [shard-started ([logstash-2014.02.17][0], 
node[r4MP-9AYTrKkzva1IoXzDw], [P], s[INITIALIZING]), reason [after recovery 
from gateway]]: execute
[2014-05-07 11:40:17,526][DEBUG][cluster.service          ] [BigLog1] 
cluster state updated, version [329], source [shard-started 
([logstash-2014.02.17][0], node[r4MP-9AYTrKkzva1IoXzDw], [P], 
s[INITIALIZING]), reason [after recovery from gateway]]
[2014-05-07 11:40:17,740][DEBUG][cluster.service          ] [BigLog1] 
processing [shard-started ([logstash-2014.02.17][0], 
node[r4MP-9AYTrKkzva1IoXzDw], [P], s[INITIALIZING]), reason [after recovery 
from gateway]]: done applying updated cluster_state (version: 329)
[2014-05-07 11:40:18,302][DEBUG][cluster.service          ] [BigLog1] 
processing [shard-started ([logstash-2014.02.17][0], 
node[r4MP-9AYTrKkzva1IoXzDw], [P], s[INITIALIZING]), reason [master 
[BigLog1][r4MP-9AYTrKkzva1IoXzDw][inet[/172.17.20.4:9300]] marked shard as 
initializing, but shard state is [POST_RECOVERY], mark shard as started]]: 
execute
[2014-05-07 11:40:18,329][DEBUG][cluster.service          ] [BigLog1] 
cluster state updated, version [332], source [shard-started 
([logstash-2014.02.17][0], node[r4MP-9AYTrKkzva1IoXzDw], [P], 
s[INITIALIZING]), reason [master 
[BigLog1][r4MP-9AYTrKkzva1IoXzDw][inet[/172.17.20.4:9300]] marked shard as 
initializing, but shard state is [POST_RECOVERY], mark shard as started]]
[2014-05-07 11:40:18,504][DEBUG][cluster.service          ] [BigLog1] 
processing [shard-started ([logstash-2014.02.17][0], 
node[r4MP-9AYTrKkzva1IoXzDw], [P], s[INITIALIZING]), reason [master 
[BigLog1][r4MP-9AYTrKkzva1IoXzDw][inet[/172.17.20.4:9300]] marked shard as 
initializing, but shard state is [POST_RECOVERY], mark shard as started]]: 
done applying updated cluster_state (version: 332)
[2014-05-07 12:09:48,982][DEBUG][indices.memory           ] [BigLog1] 
marking shard [logstash-2014.02.17][0] as inactive (inactive_time[30m]) 
indexing wise, setting size to [500kb]
[2014-05-07 12:10:13,726][DEBUG][index.engine.robin       ] [BigLog1] 
[logstash-2014.02.17][0] updating index_buffer_size from [4mb] to 
(inactive) [500kb]
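
So the shard I traced does end up [STARTED]; it looks like it was just waiting 
its turn. If I read the "throttling allocation ... on primary allocation" 
lines correctly, only a few primaries are recovered at a time (I believe the 
default for cluster.routing.allocation.node_initial_primaries_recoveries is 
4). Assuming that setting is dynamically updatable on 0.90.x (otherwise it 
would go in elasticsearch.yml with a restart), I was thinking of raising it to 
speed up recovery, something like:

# Sketch: allow more concurrent primary recoveries per node
curl -XPUT 'http://172.17.20.4:9200/_cluster/settings' -d '{
  "transient" : {
    "cluster.routing.allocation.node_initial_primaries_recoveries" : 16
  }
}'

But the cluster has been red since the crash, so I'm not sure throttling alone 
explains it.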


Could someone help me understand what is happening with this (and the other) 
shards?

TIA,
Arnau
