Currently the Elasticsearch /_cluster/health endpoint shows a status of "red". The
error I get [1] indicates some kind of index corruption. I tried these two
commands to fix the unassigned shards:
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{"commands":[{"allocate":{"index":"graylog2_6","shard":1,"node":"gl-es01-esgl2","allow_primary":true}}]}'
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{"commands":[{"allocate":{"index":"graylog2_6","shard":2,"node":"gl-es01-esgl2","allow_primary":true}}]}'
Once those commands are executed, the two shards flip between UNASSIGNED and
INITIALIZING constantly, and the logs rapidly print errors about the codec
footer mismatch [1].
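For reference, the "red" status comes from the cluster health API; this is how I re-check it (assuming Elasticsearch is listening on localhost:9200, as in the reroute commands above):

```shell
# Overall cluster health; "status" will be green, yellow, or red
curl -s 'localhost:9200/_cluster/health?pretty'

# Per-index health, to see which indexes are red
curl -s 'localhost:9200/_cluster/health?level=indices&pretty'
```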
I only have one Elasticsearch node at this time, and the last snapshot was
taken a few weeks before this power outage. I'd probably lose less data by
removing the corrupt indexes, if that's possible and my only option.
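If deleting really is the only way forward, this is a minimal sketch of what I had in mind (assuming Elasticsearch on localhost:9200; note this is irreversible and removes all four shards of the index, including the two that are still healthy):

```shell
# Permanently delete the corrupt index (all shards, healthy ones included)
curl -XDELETE 'localhost:9200/graylog2_6'

# Confirm the cluster health afterwards
curl -s 'localhost:9200/_cluster/health?pretty'
```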
Thanks,
- Trey
[1]:
[2016-01-13 17:52:11,637][WARN ][cluster.action.shard ] [gl-es01-esgl2] [graylog2_6][2] received shard failed for [graylog2_6][2], node[2c5VN2XqSNe8DG2Ix8BY9A], [P], s[INITIALIZING], unassigned_info[[reason=CLUSTER_RECOVERED], at[2016-01-13T23:52:09.754Z]], indexUUID [xBeW-OuARQqCHqjkHSVbVw], reason [shard failure [failed recovery][IndexShardGatewayRecoveryException[[graylog2_6][2] failed to fetch index version after copying it over]; nested: CorruptIndexException[[graylog2_6][2] Preexisting corrupted index [corrupted_y0yPuZiBSxuZxY10TXMDCQ] caused by: IOException[failed engine (reason: [corrupt file (source: [start])])]
CorruptIndexException[codec footer mismatch: actual footer=0 vs expected footer=-1071082520 (resource: MMapIndexInput(path="/var/lib/elasticsearch-data/esgl2/gl2/nodes/0/indices/graylog2_6/2/index/_ew6_Lucene41_0.tim"))]
java.io.IOException: failed engine (reason: [corrupt file (source: [start])])
at org.elasticsearch.index.engine.Engine.failEngine(Engine.java:492)
at org.elasticsearch.index.engine.Engine.maybeFailEngine(Engine.java:514)
at org.elasticsearch.index.engine.InternalEngine.maybeFailEngine(InternalEngine.java:928)
at org.elasticsearch.index.engine.InternalEngine.createSearcherManager(InternalEngine.java:195)
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:146)
at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:32)
at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1355)
at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1350)
at org.elasticsearch.index.shard.IndexShard.prepareForTranslogRecovery(IndexShard.java:870)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:233)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:112)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.index.CorruptIndexException: codec footer mismatch: actual footer=0 vs expected footer=-1071082520 (resource: MMapIndexInput(path="/var/lib/elasticsearch-data/esgl2/gl2/nodes/0/indices/graylog2_6/2/index/_ew6_Lucene41_0.tim"))
at org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:235)
at org.apache.lucene.codecs.CodecUtil.retrieveChecksum(CodecUtil.java:228)
at org.apache.lucene.codecs.blocktree.BlockTreeTermsReader.<init>(BlockTreeTermsReader.java:137)
at org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProducer(Lucene41PostingsFormat.java:441)
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:197)
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:254)
at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:120)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:108)
at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:239)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:109)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:421)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:112)
at org.apache.lucene.search.SearcherManager.<init>(SearcherManager.java:89)
at org.elasticsearch.index.engine.InternalEngine.createSearcherManager(InternalEngine.java:186)
... 10 more
]; ]]
[2016-01-13 17:52:12,353][WARN ][cluster.action.shard ] [gl-es01-esgl2] [graylog2_6][2] received shard failed for [graylog2_6][2], node[2c5VN2XqSNe8DG2Ix8BY9A], [P], s[INITIALIZING], unassigned_info[[reason=CLUSTER_RECOVERED], at[2016-01-13T23:52:09.754Z]], indexUUID [xBeW-OuARQqCHqjkHSVbVw], reason [master [gl-es01-esgl2][2c5VN2XqSNe8DG2Ix8BY9A][gl-es01.brazos.tamu.edu][inet[/192.168.200.93:9300]] marked shard as initializing, but shard is marked as failed, resend shard failure]
On Friday, January 15, 2016 at 3:19:22 AM UTC-6, Jochen Schalanda wrote:
>
> Hi Trey,
>
> do you see any other error messages in the logs of your Elasticsearch
> node(s)? Graylog won't start writing into the Elasticsearch cluster again
> until the health of index "graylog2_6" (which is probably the current
> deflector target) is GREEN again.
>
>
> Cheers,
> Jochen
>
> On Wednesday, 13 January 2016 19:08:41 UTC+1, Trey Dockendorf wrote:
>>
>> Our data center recently had a catastrophic power loss. Once everything
>> was back up, Graylog refused to start [1], and the issue seems to be
>> corrupt Elasticsearch indexes [2]. I've attempted rerouting the shards,
>> but that has not worked. I fear my only option is to delete the corrupt
>> indexes, but I'm unsure what kind of impact that will have on Graylog.
>>
>> Any advice is greatly appreciated.
>>
>> Thanks,
>> - Trey
>>
>> [1]:
>> 2016-01-13T12:01:24.886-06:00 WARN [BlockingBatchedESOutput] Error while waiting for healthy Elasticsearch cluster. Not flushing.
>> java.util.concurrent.TimeoutException: Elasticsearch cluster didn't get healthy within timeout
>> at org.graylog2.indexer.cluster.Cluster.waitForConnectedAndHealthy(Cluster.java:174)
>> at org.graylog2.indexer.cluster.Cluster.waitForConnectedAndHealthy(Cluster.java:179)
>> at org.graylog2.outputs.BlockingBatchedESOutput.flush(BlockingBatchedESOutput.java:112)
>> at org.graylog2.outputs.BlockingBatchedESOutput.write(BlockingBatchedESOutput.java:105)
>> at org.graylog2.buffers.processors.OutputBufferProcessor$1.run(OutputBufferProcessor.java:189)
>> at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> 2016-01-13T12:01:51.526-06:00 INFO [IndexRetentionThread] Elasticsearch cluster not available, skipping index retention checks.
>>
>> [2]:
>> # curl -XGET http://localhost:9200/_cat/shards
>> graylog2_6 0 p STARTED 2700869 398.6mb 192.168.200.93 gl-es01-esgl2
>> graylog2_6 3 p STARTED 2694087 397mb 192.168.200.93 gl-es01-esgl2
>> graylog2_6 1 p UNASSIGNED
>> graylog2_6 2 p UNASSIGNED
>> graylog2_4 0 p STARTED 5040142 796.3mb 192.168.200.93 gl-es01-esgl2
>> graylog2_4 3 p STARTED 5003263 789.1mb 192.168.200.93 gl-es01-esgl2
>> graylog2_4 1 p STARTED 5001091 788.6mb 192.168.200.93 gl-es01-esgl2
>> graylog2_4 2 p STARTED 4958409 783.6mb 192.168.200.93 gl-es01-esgl2
>> graylog2_5 0 p STARTED 5019303 743.4mb 192.168.200.93 gl-es01-esgl2
>> graylog2_5 3 p STARTED 5001404 739.3mb 192.168.200.93 gl-es01-esgl2
>> graylog2_5 1 p STARTED 5000525 739.3mb 192.168.200.93 gl-es01-esgl2
>> graylog2_5 2 p STARTED 4979658 737.7mb 192.168.200.93 gl-es01-esgl2
>> graylog2_2 0 p STARTED 5023249 985.4mb 192.168.200.93 gl-es01-esgl2
>> graylog2_2 3 p STARTED 4999096 979.1mb 192.168.200.93 gl-es01-esgl2
>> graylog2_2 1 p STARTED 5001476 980.1mb 192.168.200.93 gl-es01-esgl2
>> graylog2_2 2 p STARTED 4976833 973.5mb 192.168.200.93 gl-es01-esgl2
>> graylog2_3 0 p STARTED 5000546 1gb 192.168.200.93 gl-es01-esgl2
>> graylog2_3 3 p STARTED 4998766 1gb 192.168.200.93 gl-es01-esgl2
>> graylog2_3 1 p STARTED 4999378 1gb 192.168.200.93 gl-es01-esgl2
>> graylog2_3 2 p STARTED 5001326 1gb 192.168.200.93 gl-es01-esgl2
>> graylog2_0 0 p STARTED 3686796 819.8mb 192.168.200.93 gl-es01-esgl2
>> graylog2_0 3 p STARTED 3655173 812.8mb 192.168.200.93 gl-es01-esgl2
>> graylog2_0 1 p STARTED 3655623 813mb 192.168.200.93 gl-es01-esgl2
>> graylog2_0 2 p STARTED 3625428 805.7mb 192.168.200.93 gl-es01-esgl2
>> graylog2_1 0 p STARTED 5053805 1gb 192.168.200.93 gl-es01-esgl2
>> graylog2_1 3 p STARTED 5000588 1gb 192.168.200.93 gl-es01-esgl2
>> graylog2_1 1 p STARTED 5001749 1gb 192.168.200.93 gl-es01-esgl2
>> graylog2_1 2 p STARTED 4943861 1gb 192.168.200.93 gl-es01-esgl2
>>
>
--
You received this message because you are subscribed to the Google Groups
"Graylog Users" group.