Apologies for this mess. There is one more thing: the document count toggles
between 3748436 and 3748478 when the actual count should be 3750000.
I am unclear as to why this is happening.
Thanks a lot for the help.
Regards
Sri
On Monday, June 30, 2014 9:15:16 AM UTC-4, sri wrote:
>
> MORE INFO:
>
> I grepped only the 'WARN' messages.
>
> MASTER Node(ES1) logs:
> [2014-06-30 09:02:36,942][WARN ][index.engine.internal ] [NES1]
> [logsjmeter14][2] failed engine [refresh failed]
> [2014-06-30 09:02:37,715][WARN ][cluster.action.shard ] [NES1]
> [logsjmeter14][2] sending failed shard for [logsjmeter14][2],
> node[dbPhRQoQQE-Tlgict_gfeg], [P], s[STARTED], indexUUID
> [lXE8Wre0S3KxjGs9Jov1tw], reason [engine failure, message [refresh
> failed][CorruptIndexException[codec header mismatch: actual header=0 vs
> expected header=1071082519 (resource:
> BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_1a_es090_0.blm
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter14/2/index/_1a.cfs"))
>
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter14/2/index/_1a.cfs")
>
> slice=29488:29662)))]]]
> [2014-06-30 09:02:37,717][WARN ][cluster.action.shard ] [NES1]
> [logsjmeter14][2] received shard failed for [logsjmeter14][2],
> node[dbPhRQoQQE-Tlgict_gfeg], [P], s[STARTED], indexUUID
> [lXE8Wre0S3KxjGs9Jov1tw], reason [engine failure, message [refresh
> failed][CorruptIndexException[codec header mismatch: actual header=0 vs
> expected header=1071082519 (resource:
> BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_1a_es090_0.blm
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter14/2/index/_1a.cfs"))
>
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter14/2/index/_1a.cfs")
>
> slice=29488:29662)))]]]
> [2014-06-30 09:03:14,809][WARN ][cluster.action.shard ] [NES1]
> [logsjmeter87][4] received shard failed for [logsjmeter87][4],
> node[XVuxg7fzTT-Xy-ArWiapJQ], [R], s[STARTED], indexUUID
> [leU8sfPETKCeQFvntNY9sg], reason [engine failure, message [refresh
> failed][CorruptIndexException[codec mismatch: actual
> codec=Lucene45DocValuesData vs expected codec=Lucene45ValuesMetadata
> (resource:
> BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_12_Lucene45_0.dvm
>
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter87/4/index/_12.cfs"))
>
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter87/4/index/_12.cfs")
>
> slice=15224:15300)))]]]
> [2014-06-30 09:03:24,021][WARN ][index.engine.internal ] [NES1]
> [logsjmeter65][1] failed engine [refresh failed]
> [2014-06-30 09:03:24,371][WARN ][cluster.action.shard ] [NES1]
> [logsjmeter65][1] sending failed shard for [logsjmeter65][1],
> node[dbPhRQoQQE-Tlgict_gfeg], [P], s[STARTED], indexUUID
> [WXUHlSGVQ-GPGSKg0oWPIw], reason [engine failure, message [refresh
> failed][CorruptIndexException[codec mismatch: actual codec=XBloomFilter vs
> expected codec=Lucene41NormsMetadata (resource:
> BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_1b.nvm in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter65/1/index/_1b.cfs"))
>
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter65/1/index/_1b.cfs")
>
> slice=15048:15209)))]]]
> [2014-06-30 09:03:24,371][WARN ][cluster.action.shard ] [NES1]
> [logsjmeter65][1] received shard failed for [logsjmeter65][1],
> node[dbPhRQoQQE-Tlgict_gfeg], [P], s[STARTED], indexUUID
> [WXUHlSGVQ-GPGSKg0oWPIw], reason [engine failure, message [refresh
> failed][CorruptIndexException[codec mismatch: actual codec=XBloomFilter vs
> expected codec=Lucene41NormsMetadata (resource:
> BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_1b.nvm in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter65/1/index/_1b.cfs"))
>
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter65/1/index/_1b.cfs")
>
> slice=15048:15209)))]]]
> [2014-06-30 09:03:31,778][WARN ][index.engine.internal ] [NES1]
> [logsjmeter79][0] failed engine [refresh failed]
> [2014-06-30 09:03:32,084][WARN ][cluster.action.shard ] [NES1]
> [logsjmeter79][0] sending failed shard for [logsjmeter79][0],
> node[dbPhRQoQQE-Tlgict_gfeg], [R], s[STARTED], indexUUID
> [NZgUPNQnT0Ss0Lhk9PUz1w], reason [engine failure, message [refresh
> failed][CorruptIndexException[codec mismatch: actual
> codec=BLOCK_TREE_TERMS_INDEX vs expected codec=CompoundFileWriterEntries
> (resource:
> BufferedChecksumIndexInput(MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter79/0/index/_z.cfe")))]]]
> [2014-06-30 09:03:32,086][WARN ][cluster.action.shard ] [NES1]
> [logsjmeter79][0] received shard failed for [logsjmeter79][0],
> node[dbPhRQoQQE-Tlgict_gfeg], [R], s[STARTED], indexUUID
> [NZgUPNQnT0Ss0Lhk9PUz1w], reason [engine failure, message [refresh
> failed][CorruptIndexException[codec mismatch: actual
> codec=BLOCK_TREE_TERMS_INDEX vs expected codec=CompoundFileWriterEntries
> (resource:
> BufferedChecksumIndexInput(MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter79/0/index/_z.cfe")))]]]
> [2014-06-30 09:03:33,865][WARN ][monitor.jvm ] [NES1]
> [gc][young][228848][7461] duration [1.7s], collections [1]/[2s], total
> [1.7s]/[4.4m], memory [3gb]->[2.8gb]/[3.9gb], all_pools {[young]
> [168.5mb]->[30.6mb]/[266.2mb]}{[survivor]
> [27.8mb]->[29.2mb]/[33.2mb]}{[old] [2.8gb]->[2.8gb]/[3.6gb]}
> [2014-06-30 09:03:57,762][WARN ][cluster.action.shard ] [NES1]
> [logsjmeter39][1] received shard failed for [logsjmeter39][1],
> node[Jjvt3FxwSLWpSCHeIjOedQ], [P], s[STARTED], indexUUID
> [_PGn7TPETEWllqz71M2ZBA], reason [engine failure, message [refresh
> failed][CorruptIndexException[codec mismatch: actual
> codec=Lucene46FieldInfos vs expected codec=Lucene41PostingsWriterPos
> (resource: SlicedIndexInput(SlicedIndexInput(_1j_es090_0.pos in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs"))
>
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs")
>
> slice=17707:22401))]]]
>
> ES2 logs:
> [2014-06-30 09:03:14,785][WARN ][cluster.action.shard ] [NES2]
> [logsjmeter87][4] sending failed shard for [logsjmeter87][4],
> node[XVuxg7fzTT-Xy-ArWiapJQ], [R], s[STARTED], indexUUID
> [leU8sfPETKCeQFvntNY9sg], reason [engine failure, message [refresh
> failed][CorruptIndexException[codec mismatch: actual
> codec=Lucene45DocValuesData vs expected codec=Lucene45ValuesMetadata
> (resource:
> BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_12_Lucene45_0.dvm
>
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter87/4/index/_12.cfs"))
>
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter87/4/index/_12.cfs")
>
> slice=15224:15300)))]]]
>
> ES3 logs:
> [2014-06-30 09:03:57,639][WARN ][cluster.action.shard ] [NES3]
> [logsjmeter39][1] sending failed shard for [logsjmeter39][1],
> node[Jjvt3FxwSLWpSCHeIjOedQ], [P], s[STARTED], indexUUID
> [_PGn7TPETEWllqz71M2ZBA], reason [engine failure, message [refresh
> failed][CorruptIndexException[codec mismatch: actual
> codec=Lucene46FieldInfos vs expected codec=Lucene41PostingsWriterPos
> (resource: SlicedIndexInput(SlicedIndexInput(_1j_es090_0.pos in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs"))
>
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs")
>
> slice=17707:22401))]]]
>
> Thanks and Regards
> Sri
>
> On Monday, June 30, 2014 9:07:37 AM UTC-4, sri wrote:
>>
>> Hi Simon,
>>
>> I am currently using Elasticsearch 1.2.1, and I am getting the error on all
>> my data nodes. The errors are below:
>>
>> [2014-06-30 09:03:57,762][WARN ][cluster.action.shard ] [NES1]
>> [logsjmeter39][1] received shard failed for [logsjmeter39][1],
>> node[Jjvt3FxwSLWpSCHeIjOedQ], [P], s[STARTED], indexUUID
>> [_PGn7TPETEWllqz71M2ZBA], reason [engine failure, message [refresh
>> failed][CorruptIndexException[codec mismatch: actual
>> codec=Lucene46FieldInfos vs expected codec=Lucene41PostingsWriterPos
>> (resource: SlicedIndexInput(SlicedIndexInput(_1j_es090_0.pos in
>> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs"))
>>
>> in
>> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs")
>>
>> slice=17707:22401))]]]
>>
>> [2014-06-30 09:03:14,785][WARN ][cluster.action.shard ] [NES2]
>> [logsjmeter87][4] sending failed shard for [logsjmeter87][4],
>> node[XVuxg7fzTT-Xy-ArWiapJQ], [R], s[STARTED], indexUUID
>> [leU8sfPETKCeQFvntNY9sg], reason [engine failure, message [refresh
>> failed][CorruptIndexException[codec mismatch: actual
>> codec=Lucene45DocValuesData vs expected codec=Lucene45ValuesMetadata
>> (resource:
>> BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_12_Lucene45_0.dvm
>>
>> in
>> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter87/4/index/_12.cfs"))
>>
>> in
>> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter87/4/index/_12.cfs")
>>
>> slice=15224:15300)))]]]
>>
>> [2014-06-30 09:03:57,639][WARN ][cluster.action.shard ] [NES3]
>> [logsjmeter39][1] sending failed shard for [logsjmeter39][1],
>> node[Jjvt3FxwSLWpSCHeIjOedQ], [P], s[STARTED], indexUUID
>> [_PGn7TPETEWllqz71M2ZBA], reason [engine failure, message [refresh
>> failed][CorruptIndexException[codec mismatch: actual
>> codec=Lucene46FieldInfos vs expected codec=Lucene41PostingsWriterPos
>> (resource: SlicedIndexInput(SlicedIndexInput(_1j_es090_0.pos in
>> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs"))
>>
>> in
>> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs")
>>
>> slice=17707:22401))]]]
>>
>> Thanks and Regards
>> Sri
>>
>>
>> On Monday, June 30, 2014 4:00:23 AM UTC-4, simonw wrote:
>>>
>>> hey,
>>>
>>> thanks for raising this. Can you give me more info, i.e. which version you
>>> are using and whether this happens on only one shard or on all shards in your
>>> system? It could just be what it says, an index corruption, maybe due to HW
>>> failure, but there could be other reasons....
>>>
>>> simon
>>>
>>> On Friday, June 27, 2014 5:20:26 PM UTC+2, sri wrote:
>>>>
>>>> Hi
>>>>
>>>> I am getting the below error on my ES cluster quite frequently, but I am not
>>>> able to understand the actual reason why it's happening.
>>>>
>>>> [2014-06-27 11:12:50,014][WARN ][cluster.action.shard ] [NES1]
>>>> [logsjmeter62][0] received shard failed for [logsjmeter62][0],
>>>> node[ZqO9OQ8VQ0uGkvXdIeovRg], [P], s[STARTED], indexUUID
>>>> [EfBgCRm8SWu4AtsNPYVXyA], reason [engine failure, message [refresh
>>>> failed][CorruptIndexException[codec mismatch: actual
>>>> codec=Lucene41PostingsWriterDoc vs expected codec=Lucene46FieldInfos
>>>> (resource:
>>>> BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_39.fnm in
>>>> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter62/0/index/_39.cfs"))
>>>>
>>>> in
>>>> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter62/0/index/_39.cfs")
>>>>
>>>> slice=7371:8755)))]]]
>>>>
>>>>
>>>> Thanks and Regards
>>>> Sri
>>>>
>>>
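
Since the WARN output quoted in this thread repeats the same failures as both "sending" and "received" messages, it may help to boil it down to one entry per affected shard copy. The following is only a rough sketch, assuming the log layout shown above; `summarize_corruption` and the regular expressions are hypothetical helper names, not part of Elasticsearch, and the patterns may need adjusting for other versions or log formats.

```python
import re

# Start of one log record, e.g. "[2014-06-30 09:02:36,942][WARN ]".
RECORD_START = re.compile(
    r"\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\]\[WARN\s*\]"
)
# Leading "> ", ">> ", ... added by mail quoting.
QUOTE_PREFIX = re.compile(r"^(?:>\s?)+")
# "sending failed shard for [index][N], node[...], [P]" (or "received shard failed").
SHARD = re.compile(
    r"(?:sending failed shard|received shard failed) for "
    r"\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\],\s+node\[[^\]]+\],\s+"
    r"\[(?P<copy>[PR])\]"
)
# First file path mentioned in the exception message.
PATH = re.compile(r'path="([^"]+)"')


def summarize_corruption(text):
    """Map (index, shard number, 'P' or 'R') to the segment file named in
    the CorruptIndexException. This is an assumption-laden parser for the
    wrapped, quoted log text in this thread, nothing more."""
    # Strip quote markers, drop blank lines, and undo the mail line wrapping.
    lines = [QUOTE_PREFIX.sub("", line).strip() for line in text.splitlines()]
    flat = " ".join(line for line in lines if line)

    # Cut the flattened text into records at each timestamp.
    starts = [m.start() for m in RECORD_START.finditer(flat)]
    records = [flat[a:b] for a, b in zip(starts, starts[1:] + [len(flat)])]

    found = {}
    for record in records:
        shard = SHARD.search(record)
        if shard is None or "CorruptIndexException" not in record:
            continue  # e.g. "failed engine" or GC warnings
        path = PATH.search(record)
        key = (shard["index"], int(shard["shard"]), shard["copy"])
        # "sending" and "received" records share a key, so they collapse.
        found[key] = path.group(1).rsplit("/", 1)[-1] if path else None
    return found


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < warn_lines.txt
    for (index, shard, copy), name in sorted(summarize_corruption(sys.stdin.read()).items()):
        print(index, shard, copy, name)
```

Keying on index, shard number, and primary/replica flag is a deliberate choice here: it makes it easier to see whether the corruption is confined to one shard or spread across many, which is the question raised earlier in the thread.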