Hello,
I'm testing the Graylog solution.
This morning, when I came back to work, Graylog failed to start;
Elasticsearch seems to be down.

Here's the ES startup log:
[2016-02-10 08:34:45,931][INFO ][node                     ] [Moon-Boy] 
version[1.7.3], pid[4430], build[05d4530/2015-10-15T09:14:17Z]
[2016-02-10 08:34:45,932][INFO ][node                     ] [Moon-Boy] 
initializing ...
[2016-02-10 08:34:46,034][INFO ][plugins                  ] [Moon-Boy] 
loaded [], sites []
[2016-02-10 08:34:46,073][INFO ][env                      ] [Moon-Boy] 
using [1] data paths, mounts [[/ (rootfs)]], net usable_space [841.3gb], 
net total_space [901gb], types [rootfs]
[2016-02-10 08:34:48,498][INFO ][node                     ] [Moon-Boy] 
initialized
[2016-02-10 08:34:48,499][INFO ][node                     ] [Moon-Boy] 
starting ...
[2016-02-10 08:34:48,558][INFO ][transport                ] [Moon-Boy] 
bound_address {inet[/0.0.0.0:9300]}, publish_address 
{inet[/192.168.xxx.xxx:9300]}
[2016-02-10 08:34:48,566][INFO ][discovery                ] [Moon-Boy] 
graylog2/0DkFmDQqRxaD0LC5du8wrg
[2016-02-10 08:34:51,594][INFO ][cluster.service          ] [Moon-Boy] 
new_master 
[Moon-Boy][0DkFmDQqRxaD0LC5du8wrg][graylog][inet[/192.168.xxx.xxx:9300]]{master=true},
 
reason: zen-disco-join (elected_as_master)
[2016-02-10 08:34:51,616][INFO ][http                     ] [Moon-Boy] 
bound_address {inet[/0.0.0.0:9200]}, publish_address 
{inet[/192.168.xxx.xxx:9200]}
[2016-02-10 08:34:51,616][INFO ][node                     ] [Moon-Boy] 
started
[2016-02-10 08:34:51,666][INFO ][gateway                  ] [Moon-Boy] 
recovered [1] indices into cluster_state
[2016-02-10 08:34:52,026][WARN ][indices.cluster          ] [Moon-Boy] 
[[graylog2_0][0]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: 
[graylog2_0][0] failed to fetch index version after copying it over
        at 
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:161)
        at 
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:112)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.index.CorruptIndexException: [graylog2_0][0] 
Preexisting corrupted index [corrupted_AXCaFkUiQgKLQJ_-3Ikqaw] caused by: 
IOException[failed engine (reason: [corrupt file detected source: [merge]])]
        CorruptIndexException[checksum failed (hardware problem?) : 
expected=8f75bb00 actual=eb43241c 
(resource=BufferedChecksumIndexInput(_b8mh.fdt))]
java.io.IOException: failed engine (reason: [corrupt file detected source: 
[merge]])
        at org.elasticsearch.index.engine.Engine.failEngine(Engine.java:492)
        at 
org.elasticsearch.index.engine.InternalEngine$FailEngineOnMergeFailure.onFailedMerge(InternalEngine.java:1245)
        at 
org.elasticsearch.index.merge.scheduler.MergeSchedulerProvider$1.run(MergeSchedulerProvider.java:103)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed 
(hardware problem?) : expected=8f75bb00 actual=eb43241c 
(resource=BufferedChecksumIndexInput(_b8mh.fdt))
        at 
org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:211)
        at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader$ChunkIterator.checkIntegrity(CompressingStoredFieldsReader.java:521)
        at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:394)
        at 
org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:332)
        at 
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:100)
        at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4223)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3811)
        at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:409)
        at 
org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
        at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:486)

        at 
org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:602)
        at 
org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:583)
        at 
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:124)
        ... 4 more
[2016-02-10 08:34:52,029][WARN ][cluster.action.shard     ] [Moon-Boy] 
[graylog2_0][0] received shard failed for [graylog2_0][0], 
node[0DkFmDQqRxaD0LC5du8wrg], [P], s[INITIALIZING], 
unassigned_info[[reason=CLUSTER_RECOVERED], at[2016-02-10T07:34:51.614Z]], 
indexUUID [soBIIUcFQ1mFvfYcwYmJVA], reason [shard failure [failed 
recovery][IndexShardGatewayRecoveryException[[graylog2_0][0] failed to 
fetch index version after copying it over]; nested: 
CorruptIndexException[[graylog2_0][0] Preexisting corrupted index 
[corrupted_AXCaFkUiQgKLQJ_-3Ikqaw] caused by: IOException[failed engine 
(reason: [corrupt file detected source: [merge]])]
        CorruptIndexException[checksum failed (hardware problem?) : 
expected=8f75bb00 actual=eb43241c 
(resource=BufferedChecksumIndexInput(_b8mh.fdt))]
java.io.IOException: failed engine (reason: [corrupt file detected source: 
[merge]])
        at org.elasticsearch.index.engine.Engine.failEngine(Engine.java:492)
        at 
org.elasticsearch.index.engine.InternalEngine$FailEngineOnMergeFailure.onFailedMerge(InternalEngine.java:1245)
        at 
org.elasticsearch.index.merge.scheduler.MergeSchedulerProvider$1.run(MergeSchedulerProvider.java:103)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed 
(hardware problem?) : expected=8f75bb00 actual=eb43241c 
(resource=BufferedChecksumIndexInput(_b8mh.fdt))
        at 
org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:211)
        at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader$ChunkIterator.checkIntegrity(CompressingStoredFieldsReader.java:521)
        at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:394)
        at 
org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:332)
        at 
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:100)
        at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4223)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3811)
        at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:409)
        at 
org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
        at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:486)
]; ]]
[2016-02-10 08:34:53,376][INFO ][cluster.service          ] [Moon-Boy] 
added 
{[graylog2-server][3NbPDd7pRw25SWGyOEUkaA][graylog][inet[/192.168.xxx.xxx:9350]]{client=true,
 
data=false, master=false},}, reason: zen-disco-receive(join from 
node[[graylog2-server][3NbPDd7pRw25SWGyOEUkaA][graylog][inet[/192.168.xxx.xxx:9350]]{client=true,
 
data=false, master=false}])

With a special mention of:
[Moon-Boy] [graylog2_0][0] received shard failed for [graylog2_0][0], 
node[0DkFmDQqRxaD0LC5du8wrg], [P], s[INITIALIZING], 
unassigned_info[[reason=CLUSTER_RECOVERED], at[2016-02-10T07:34:51.614Z]], 
indexUUID [soBIIUcFQ1mFvfYcwYmJVA], reason [shard failure [failed 
recovery][IndexShardGatewayRecoveryException[[graylog2_0][0] failed to 
fetch index version after copying it over]; nested: 
CorruptIndexException[[graylog2_0][0] Preexisting corrupted index 
[corrupted_AXCaFkUiQgKLQJ_-3Ikqaw] caused by: IOException[failed engine 
(reason: [corrupt file detected source: [merge]])]

I have no idea what I'm supposed to do, and Google isn't helping me :(

I should mention that I don't power off the server overnight, so that it can 
keep collecting logs during that time.
I have already restarted the server, but of course that had no effect.
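In case it's useful, here is what I suppose I could try next (not yet run; standard Elasticsearch 1.x REST calls, using the index name `graylog2_0` from the log above):

```shell
# Check overall cluster health and see which shards are unassigned (ES 1.x)
curl -s 'http://localhost:9200/_cluster/health?pretty'
curl -s 'http://localhost:9200/_cat/shards?v'

# If the messages in graylog2_0 are expendable, deleting the corrupted
# index lets Graylog recreate a fresh one.
# WARNING: this permanently destroys all log messages stored in that index!
curl -XDELETE 'http://localhost:9200/graylog2_0'
```

Given the "checksum failed (hardware problem?)" message in the trace, I assume it might also be worth checking the disk and filesystem for errors before trusting the node again.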

Any ideas?

Thanks a lot

Nicolas

-- 
You received this message because you are subscribed to the Google Groups 
"Graylog Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/graylog2/beff14b9-d6e0-4424-adb4-18c58b5a3b18%40googlegroups.com.