Re: corrupted shard after optimize

mjdude5 Tue, 24 Mar 2015 08:12:57 -0700

Quick follow-up question: is it safe to run -fix while ES is also running on 
the node? I understand that some documents will be lost.
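For context, Lucene's own CheckIndex usage text warns that -fix is for emergencies only. It permanently removes documents (here it would drop segment _1om and all 216,683 of its docs, per the "fixIndex() would remove reference to this segment" warning quoted below). It also says to back up the index first and not to run the tool on an index that is actively being written to. A minimal sketch under those warnings, assuming ES has been stopped on the node. `fix_shard` and the backup naming are hypothetical; the lib path is the one from the original command in this thread:

```shell
#!/bin/sh
# Hypothetical helper (not part of Elasticsearch or Lucene): back up one
# shard's Lucene index directory, then run CheckIndex -fix against it.
# Assumes ES is stopped on this node so nothing else writes to the index.
ES_LIB=${ES_LIB:-/usr/share/elasticsearch/lib}  # lib dir used in the thread

fix_shard() {
    index_dir=${1%/}  # strip a trailing slash so the backup lands beside it
    if [ ! -d "$index_dir" ]; then
        echo "fix_shard: not a directory: $index_dir" >&2
        return 2
    fi
    # -fix drops every document in any segment that fails the checks,
    # so keep an untouched copy of the whole directory first.
    backup="$index_dir.bak.$(date +%Y%m%d%H%M%S)"
    cp -R -p "$index_dir" "$backup" || return 1
    echo "fix_shard: backup written to $backup" >&2
    # Quoted wildcard: java itself expands lib/* to every jar in the dir.
    java -cp "$ES_LIB/*" -ea:org.apache.lucene... \
        org.apache.lucene.index.CheckIndex "$index_dir" -fix
}
```

Usage would be something like `fix_shard <shard-dir>/index` (placeholder path), ideally only after a read-only pass (the same java command without -fix) has shown which segments would be dropped.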


On Tuesday, March 24, 2015 at 10:24:26 AM UTC-4, [email protected] wrote:
>
> Thanks for the CheckIndex info, that worked!  It looks like only one of 
> the segments in that shard has issues:
>
>   1 of 20: name=_1om docCount=216683
>     codec=Lucene3x
>     compound=false
>     numFiles=10
>     size (MB)=5,111.421
>     diagnostics = {os=Linux, os.version=3.5.7, mergeFactor=7, source=merge, lucene.version=3.6.0 1310449 - rmuir - 2012-04-06 11:31:16, os.arch=amd64, mergeMaxNumSegments=-1, java.version=1.6.0_26, java.vendor=Sun Microsystems Inc.}
>     no deletions
>     test: open reader.........OK
>     test: check integrity.....OK
>     test: check live docs.....OK
>     test: fields..............OK [31 fields]
>     test: field norms.........OK [20 fields]
>     test: terms, freq, prox...ERROR: java.lang.AssertionError: index=216690, numBits=216683
> java.lang.AssertionError: index=216690, numBits=216683
>         at org.apache.lucene.util.FixedBitSet.set(FixedBitSet.java:252)
>         at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:932)
>         at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1325)
>         at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:631)
>         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2051)
>     test: stored fields.......OK [3033562 total field count; avg 14 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>     test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET]
> FAILED
>     WARNING: fixIndex() would remove reference to this segment; full exception:
> java.lang.RuntimeException: Term Index test failed
>         at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:646)
>         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2051)
>
> This is on ES 1.3.4, but the index I was running optimize on was likely 
> created back in 0.9 or 1.0.
>
> On Tuesday, March 24, 2015 at 5:27:04 AM UTC-4, Michael McCandless wrote:
>>
>> Hmm, not good.
>>
>> Which version of ES?  Do you have a full stack trace for the exception?
>>
>> To run CheckIndex you need to add all ES jars to the classpath.  It's 
>> easiest to just use a wildcard for this, e.g.:
>>
>>   java -cp "/path/to/es-install/lib/*" org.apache.lucene.index.CheckIndex ...
>>
>> Make sure you have the double quotes so the shell does not expand that 
>> wildcard!
>>
>> Mike McCandless
>>
>> On Mon, Mar 23, 2015 at 9:50 PM, <[email protected]> wrote:
>>
>>> I did an optimize on this index and it looks like it caused a shard to 
>>> become corrupted.  Or maybe the optimize just brought the shard corruption 
>>> to light?
>>>
>>> On the node that reported the corrupted shard, I tried shutting it down,
>>> moving the shard out, and then restarting. Unfortunately, the next node
>>> that got that shard started with the same corruption issues. The errors:
>>>
>>> Mar 24 01:40:17 localhost elasticsearch: [bma.0][WARN ][indices.cluster          ] [Meteorite II] [1-2013][0] failed to start shard
>>> Mar 24 01:40:17 localhost org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [1-2013][0] failed to fetch index version after copying it over
>>> Mar 24 01:40:17 localhost elasticsearch: [bma.0][WARN ][cluster.action.shard     ] [Meteorite II] [1-2013][0] sending failed shard for [1-2013][0], node[ZzXsIZCsTyWD2emFuU0idg], [P], s[INITIALIZING], indexUUID [_na_], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[1-2013][0] failed to fetch index version after copying it over]; nested: CorruptIndexException[[1-2013][0] Corrupted index [corrupted_OahNymObSTyBzCCPu1FuJA] caused by: CorruptIndexException[docs out of order (1493829 <= 1493874 ) (docOut: org.apache.lucene.store.RateLimitedIndexOutput@2901a3e1)]]; ]]
>>>
>>> I tried using CheckIndex, but had this issue:
>>>
>>> java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'es090' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.The current classpath supports the following names: [Pulsing41, SimpleText, Memory, BloomFilter, Direct, FSTPulsing41, FSTOrdPulsing41, FST41, FSTOrd41, Lucene40, Lucene41]
>>>
>>> When running with:
>>>
>>> java -cp /usr/share/elasticsearch/lib/lucene-codecs-4.9.1.jar:/usr/share/elasticsearch/lib/lucene-core-4.9.1.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex
>>>
>>> I'm not a Java programmer, so after trying other classpath combinations I
>>> was out of ideas.
>>>
>>>
>>> Any tips? Looking at _cat/shards, the replica is currently marked
>>> "unassigned" while the primary is "initializing". Thanks!
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/31fa3d97-02fa-4d1c-b507-d413051f2ea3%40googlegroups.com
>>>  
>>> <https://groups.google.com/d/msgid/elasticsearch/31fa3d97-02fa-4d1c-b507-d413051f2ea3%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
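On Mike's quoting advice in the thread above: since Java 6, a classpath entry ending in `/*` is expanded by java itself to every `.jar` in that directory. That would explain why the wildcard form worked where the two-jar classpath did not: it also brings in the Elasticsearch jar, presumably the one that registers the `es090` postings format. If the shell expands the wildcard first, `-cp` receives only the first jar and the remaining jar paths are read as the main class and its arguments. A minimal demonstration of the difference, using a throwaway directory and made-up jar names:

```shell
#!/bin/sh
# Shows why the classpath wildcard must be quoted, using a throwaway
# directory with made-up jar names in place of the real ES lib dir.
libdir=$(mktemp -d)
touch "$libdir/elasticsearch-example.jar" "$libdir/lucene-core-example.jar"

# Prints how many arguments the shell actually passed to it.
count_args() { echo "$#"; }

unquoted=$(count_args $libdir/*)  # shell expands the glob: one word per jar
quoted=$(count_args "$libdir/*")  # literal string, left for java to expand

echo "unquoted: $unquoted argument(s)"
echo "quoted:   $quoted argument(s)"

rm -r "$libdir"
```

Run as-is, it reports 2 arguments for the unquoted form and 1 for the quoted form; only the quoted form hands java the single `dir/*` entry it knows how to expand.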
