Hi Nicolas, it looks like one of the underlying Lucene indices got corrupted. There are several possible reasons for that, such as a corrupt disk, a corrupt file system, or a software bug. You should thoroughly check at least the first two of those to prevent this from happening again.
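A quick first pass over disk and filesystem health might look like the sketch below. The device names (`/dev/sda`, `/dev/sda1`) are assumptions; adjust them for your system.

```shell
# First-pass hardware/filesystem checks -- device names are assumptions,
# adjust for your machine.

# 1) Kernel ring buffer: any I/O, ATA, or ext3/ext4 errors? (may need root)
dmesg 2>/dev/null | grep -iE 'i/o error|ata[0-9].*(error|failed)|ext[34][^ ]*error' \
  || echo "no obvious disk errors in dmesg"

# 2) SMART overall health, if smartmontools is installed
command -v smartctl >/dev/null 2>&1 \
  && sudo smartctl -H /dev/sda \
  || echo "smartctl not available (or different device); skipping"

# 3) Filesystem check -- run this ONLY on an unmounted filesystem:
#    fsck -n /dev/sda1
```

If the checksum mismatch in your log really is a hardware problem, these checks usually surface it; if they all come back clean, a software bug becomes more likely.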
You can try to repair the Lucene indices using Lucene's CheckIndex command; see http://www.jillesvangurp.com/2015/02/18/elasticsearch-failed-shard-recovery/ for details. If repairing the Lucene indices fails, you'll have to delete the affected Elasticsearch index to resume normal operation (https://www.elastic.co/guide/en/elasticsearch/reference/1.7/indices-delete-index.html).

Cheers,
Jochen

On Wednesday, 10 February 2016 08:50:36 UTC+1, [email protected] wrote:
>
> Hello,
> I'm testing the Graylog solution.
> This morning, when I came back to work, Graylog failed to start;
> Elasticsearch seems to be down.
>
> Here's the ES start log:
>
> [2016-02-10 08:34:45,931][INFO ][node] [Moon-Boy] version[1.7.3], pid[4430], build[05d4530/2015-10-15T09:14:17Z]
> [2016-02-10 08:34:45,932][INFO ][node] [Moon-Boy] initializing ...
> [2016-02-10 08:34:46,034][INFO ][plugins] [Moon-Boy] loaded [], sites []
> [2016-02-10 08:34:46,073][INFO ][env] [Moon-Boy] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [841.3gb], net total_space [901gb], types [rootfs]
> [2016-02-10 08:34:48,498][INFO ][node] [Moon-Boy] initialized
> [2016-02-10 08:34:48,499][INFO ][node] [Moon-Boy] starting ...
> [2016-02-10 08:34:48,558][INFO ][transport] [Moon-Boy] bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/192.168.xxx.xxx:9300]}
> [2016-02-10 08:34:48,566][INFO ][discovery] [Moon-Boy] graylog2/0DkFmDQqRxaD0LC5du8wrg
> [2016-02-10 08:34:51,594][INFO ][cluster.service] [Moon-Boy] new_master [Moon-Boy][0DkFmDQqRxaD0LC5du8wrg][graylog][inet[/192.168.xxx.xxx:9300]]{master=true}, reason: zen-disco-join (elected_as_master)
> [2016-02-10 08:34:51,616][INFO ][http] [Moon-Boy] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/192.168.xxx.xxx:9200]}
> [2016-02-10 08:34:51,616][INFO ][node] [Moon-Boy] started
> [2016-02-10 08:34:51,666][INFO ][gateway] [Moon-Boy] recovered [1] indices into cluster_state
> [2016-02-10 08:34:52,026][WARN ][indices.cluster] [Moon-Boy] [[graylog2_0][0]] marking and sending shard failed due to [failed recovery]
> org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [graylog2_0][0] failed to fetch index version after copying it over
>     at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:161)
>     at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:112)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.lucene.index.CorruptIndexException: [graylog2_0][0] Preexisting corrupted index [corrupted_AXCaFkUiQgKLQJ_-3Ikqaw] caused by: IOException[failed engine (reason: [corrupt file detected source: [merge]])]
> CorruptIndexException[checksum failed (hardware problem?) : expected=8f75bb00 actual=eb43241c (resource=BufferedChecksumIndexInput(_b8mh.fdt))]
> java.io.IOException: failed engine (reason: [corrupt file detected source: [merge]])
>     at org.elasticsearch.index.engine.Engine.failEngine(Engine.java:492)
>     at org.elasticsearch.index.engine.InternalEngine$FailEngineOnMergeFailure.onFailedMerge(InternalEngine.java:1245)
>     at org.elasticsearch.index.merge.scheduler.MergeSchedulerProvider$1.run(MergeSchedulerProvider.java:103)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=8f75bb00 actual=eb43241c (resource=BufferedChecksumIndexInput(_b8mh.fdt))
>     at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:211)
>     at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader$ChunkIterator.checkIntegrity(CompressingStoredFieldsReader.java:521)
>     at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:394)
>     at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:332)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:100)
>     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4223)
>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3811)
>     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:409)
>     at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:486)
>     at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:602)
>     at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:583)
>     at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:124)
>     ... 4 more
> [2016-02-10 08:34:52,029][WARN ][cluster.action.shard] [Moon-Boy] [graylog2_0][0] received shard failed for [graylog2_0][0], node[0DkFmDQqRxaD0LC5du8wrg], [P], s[INITIALIZING], unassigned_info[[reason=CLUSTER_RECOVERED], at[2016-02-10T07:34:51.614Z]], indexUUID [soBIIUcFQ1mFvfYcwYmJVA], reason [shard failure [failed recovery][IndexShardGatewayRecoveryException[[graylog2_0][0] failed to fetch index version after copying it over]; nested: CorruptIndexException[[graylog2_0][0] Preexisting corrupted index [corrupted_AXCaFkUiQgKLQJ_-3Ikqaw] caused by: IOException[failed engine (reason: [corrupt file detected source: [merge]])]
> CorruptIndexException[checksum failed (hardware problem?) : expected=8f75bb00 actual=eb43241c (resource=BufferedChecksumIndexInput(_b8mh.fdt))]
> java.io.IOException: failed engine (reason: [corrupt file detected source: [merge]])
>     at org.elasticsearch.index.engine.Engine.failEngine(Engine.java:492)
>     at org.elasticsearch.index.engine.InternalEngine$FailEngineOnMergeFailure.onFailedMerge(InternalEngine.java:1245)
>     at org.elasticsearch.index.merge.scheduler.MergeSchedulerProvider$1.run(MergeSchedulerProvider.java:103)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=8f75bb00 actual=eb43241c (resource=BufferedChecksumIndexInput(_b8mh.fdt))
>     at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:211)
>     at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader$ChunkIterator.checkIntegrity(CompressingStoredFieldsReader.java:521)
>     at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:394)
>     at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:332)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:100)
>     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4223)
>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3811)
>     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:409)
>     at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:486)
> ]; ]]
> [2016-02-10 08:34:53,376][INFO ][cluster.service] [Moon-Boy] added {[graylog2-server][3NbPDd7pRw25SWGyOEUkaA][graylog][inet[/192.168.xxx.xxx:9350]]{client=true, data=false, master=false},}, reason: zen-disco-receive(join from node[[graylog2-server][3NbPDd7pRw25SWGyOEUkaA][graylog][inet[/192.168.xxx.xxx:9350]]{client=true, data=false, master=false}])
>
> With a special mention to:
>
> [Moon-Boy] [graylog2_0][0] received shard failed for [graylog2_0][0], node[0DkFmDQqRxaD0LC5du8wrg], [P], s[INITIALIZING], unassigned_info[[reason=CLUSTER_RECOVERED], at[2016-02-10T07:34:51.614Z]], indexUUID [soBIIUcFQ1mFvfYcwYmJVA], reason [shard failure [failed recovery][IndexShardGatewayRecoveryException[[graylog2_0][0] failed to fetch index version after copying it over]; nested: CorruptIndexException[[graylog2_0][0] Preexisting corrupted index [corrupted_AXCaFkUiQgKLQJ_-3Ikqaw] caused by: IOException[failed engine (reason: [corrupt file detected source: [merge]])]
>
> I have no idea what I'm supposed to do, and Google doesn't help me :(
>
> Note that I don't power off the server during the night, so that it can keep collecting logs.
> I already restarted the server, but of course that had no effect.
>
> Any ideas?
>
> Thanks a lot,
>
> Nicolas

--
You received this message because you are subscribed to the Google Groups "Graylog Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/graylog2/8804f6e8-8b39-4787-bc23-f51d6f5a8a19%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
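For reference, the CheckIndex repair attempt suggested above could look like the following sketch. All paths, the cluster name "graylog2", and the bundled Lucene jar name are assumptions for a default Elasticsearch 1.7.3 package install; adjust them to your layout, and stop Elasticsearch before running it.

```shell
# Sketch of a CheckIndex run against the failing shard (ES must be stopped).
# Paths, cluster name, and jar name are assumptions -- adjust for your install.
ES_LIB=/usr/share/elasticsearch/lib
SHARD=/var/lib/elasticsearch/graylog2/nodes/0/indices/graylog2_0/0/index

if [ -d "$SHARD" ]; then
  # Dry run first; re-run with -fix appended to drop unreadable segments
  # (the documents stored in those segments are lost for good).
  java -cp "$ES_LIB"/lucene-core-*.jar org.apache.lucene.index.CheckIndex "$SHARD"
else
  echo "shard directory not found: $SHARD -- adjust the paths above"
fi

# If CheckIndex cannot repair the shard, delete the whole index
# (everything in it is lost, but Graylog can write again):
# curl -XDELETE 'http://localhost:9200/graylog2_0'
```

Note that `-fix` truncates the index to its last readable segments, so expect to lose some messages either way; deleting graylog2_0 loses everything in that index but is the fastest path back to normal operation.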
