Quick follow-up question: is it safe to run -fix while ES is also running on the node? I understand that some documents will be lost.
On Tuesday, March 24, 2015 at 10:24:26 AM UTC-4, [email protected] wrote:
>
> Thanks for the CheckIndex info, that worked! It looks like only one of
> the segments in that shard has issues:
>
>   1 of 20: name=_1om docCount=216683
>     codec=Lucene3x
>     compound=false
>     numFiles=10
>     size (MB)=5,111.421
>     diagnostics = {os=Linux, os.version=3.5.7, mergeFactor=7, source=merge, lucene.version=3.6.0 1310449 - rmuir - 2012-04-06 11:31:16, os.arch=amd64, mergeMaxNumSegments=-1, java.version=1.6.0_26, java.vendor=Sun Microsystems Inc.}
>     no deletions
>     test: open reader.........OK
>     test: check integrity.....OK
>     test: check live docs.....OK
>     test: fields..............OK [31 fields]
>     test: field norms.........OK [20 fields]
>     test: terms, freq, prox...ERROR: java.lang.AssertionError: index=216690, numBits=216683
> java.lang.AssertionError: index=216690, numBits=216683
>         at org.apache.lucene.util.FixedBitSet.set(FixedBitSet.java:252)
>         at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:932)
>         at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1325)
>         at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:631)
>         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2051)
>     test: stored fields.......OK [3033562 total field count; avg 14 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>     test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET]
>     FAILED
>     WARNING: fixIndex() would remove reference to this segment; full exception:
> java.lang.RuntimeException: Term Index test failed
>         at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:646)
>         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2051)
>
> This is on ES 1.3.4, but the index I was running optimize on was likely
> created back in 0.9 or 1.0.
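Both errors in this thread appear to be violations of two basic postings invariants. The CheckIndex failure comes from `FixedBitSet.set` inside `CheckIndex.checkFields`, which suggests CheckIndex marks each doc ID it reads from the postings in a bit set sized to the segment's document count; `index=216690, numBits=216683` would then mean a postings list points at a document past the end of the segment. The recovery error quoted further down, `docs out of order (1493829 <= 1493874 )`, reports a doc ID that is not greater than the one before it. The sketch below is a simplified, hypothetical illustration of those two checks on a single term's postings list, using plain Python lists; `check_postings` is an invented name and this is not Lucene's implementation.

```python
def check_postings(doc_ids: list[int], max_doc: int) -> list[str]:
    """Check one term's postings list against two segment invariants.

    Simplified illustration (not Lucene's code): doc IDs in a postings
    list must strictly increase, and each must be < max_doc, the
    segment's document count. Returns one message per violation;
    an empty list means the postings list passed both checks.
    """
    problems: list[str] = []
    prev = -1  # sentinel: every valid doc ID (>= 0) is greater than this
    for doc in doc_ids:
        # Same shape as the "docs out of order (new <= previous)" error.
        if doc <= prev:
            problems.append(f"docs out of order ({doc} <= {prev})")
        # A doc ID at or past max_doc cannot be marked in a bit set of
        # size max_doc; this is the kind of FixedBitSet assertion seen
        # in the CheckIndex output.
        if not 0 <= doc < max_doc:
            problems.append(f"doc {doc} out of range (maxDoc={max_doc})")
        prev = doc
    return problems
```

For example, `check_postings([216682, 216690], 216683)` returns `['doc 216690 out of range (maxDoc=216683)']`, mirroring the numbers in the CheckIndex output.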
>
> On Tuesday, March 24, 2015 at 5:27:04 AM UTC-4, Michael McCandless wrote:
>>
>> Hmm, not good.
>>
>> Which version of ES? Do you have a full stack trace for the exception?
>>
>> To run CheckIndex you need to add all ES jars to the classpath. It's
>> easiest to just use a wildcard for this, e.g.:
>>
>>   java -cp "/path/to/es-install/lib/*" org.apache.lucene.index.CheckIndex ...
>>
>> Make sure you have the double quotes so the shell does not expand that
>> wildcard!
>>
>> Mike McCandless
>>
>> On Mon, Mar 23, 2015 at 9:50 PM, <[email protected]> wrote:
>>
>>> I did an optimize on this index and it looks like it caused a shard to
>>> become corrupted. Or maybe the optimize just brought the shard corruption
>>> to light?
>>>
>>> On the node that reported the corrupted shard I tried shutting it down,
>>> moving the shard out and then restarting. Unfortunately the next node that
>>> got that shard then started with the same corruption issues. The errors:
>>>
>>> Mar 24 01:40:17 localhost elasticsearch: [bma.0][WARN ][indices.cluster ] [Meteorite II] [1-2013][0] failed to start shard
>>> Mar 24 01:40:17 localhost org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [1-2013][0] failed to fetch index version after copying it over
>>> Mar 24 01:40:17 localhost elasticsearch: [bma.0][WARN ][cluster.action.shard ] [Meteorite II] [1-2013][0] sending failed shard for [1-2013][0], node[ZzXsIZCsTyWD2emFuU0idg], [P], s[INITIALIZING], indexUUID [_na_], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[1-2013][0] failed to fetch index version after copying it over]; nested: CorruptIndexException[[1-2013][0] Corrupted index [corrupted_OahNymObSTyBzCCPu1FuJA] caused by: CorruptIndexException[docs out of order (1493829 <= 1493874 ) (docOut: org.apache.lucene.store.RateLimitedIndexOutput@2901a3e1)]]; ]]
>>>
>>> I tried using CheckIndex, but had this issue:
>>>
>>> java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'es090' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.The current classpath supports the following names: [Pulsing41, SimpleText, Memory, BloomFilter, Direct, FSTPulsing41, FSTOrdPulsing41, FST41, FSTOrd41, Lucene40, Lucene41]
>>>
>>> When running with:
>>>
>>>   java -cp /usr/share/elasticsearch/lib/lucene-codecs-4.9.1.jar:/usr/share/elasticsearch/lib/lucene-core-4.9.1.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex
>>>
>>> I'm not a java programmer so after I tried other classpath combinations
>>> I was out of ideas.
>>>
>>> Any tips? Looking at _cat/shards the replica is currently marked
>>> "unassigned" while the primary is "initializing". Thanks!
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/31fa3d97-02fa-4d1c-b507-d413051f2ea3%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
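Mike's wildcard-classpath tip can be sketched as a small Python helper that builds the CheckIndex command line. This is a minimal illustration, not an official tool: `checkindex_argv` and `as_shell_command` are hypothetical names, `/usr/share/elasticsearch/lib` is the install path from the thread, and `/path/to/shard/index` is a placeholder for the shard's Lucene index directory. The JVM itself expands a classpath entry ending in `/*` to every `.jar` in that directory, which is what brings in the jar providing the `es090` postings format that the two-jar classpath was missing. Passing the argument list straight to `subprocess.run` involves no shell, so the `*` reaches the JVM intact; for pasting into a shell, `shlex.join` single-quotes that argument, which has the same effect as the double quotes Mike recommends.

```python
import shlex

# Elasticsearch lib directory, as used in the thread.
ES_LIB = "/usr/share/elasticsearch/lib"
# Hypothetical placeholder: the shard's Lucene index directory.
SHARD_INDEX_DIR = "/path/to/shard/index"


def checkindex_argv(es_lib: str, index_dir: str, fix: bool = False) -> list[str]:
    """Build argv for Lucene's CheckIndex with a wildcard classpath.

    The classpath entry ends in '/*'; the JVM (not the shell) expands it
    to every .jar file in es_lib.
    """
    argv = [
        "java",
        "-cp", es_lib.rstrip("/") + "/*",
        "-ea:org.apache.lucene...",  # enable assertions in org.apache.lucene and subpackages
        "org.apache.lucene.index.CheckIndex",
        index_dir,
    ]
    if fix:
        # -fix removes references to segments that fail the check;
        # the documents in those segments are permanently lost.
        argv.append("-fix")
    return argv


def as_shell_command(argv: list[str]) -> str:
    """Render argv for a shell, quoting '*' so the shell does not glob it."""
    return shlex.join(argv)


if __name__ == "__main__":
    print(as_shell_command(checkindex_argv(ES_LIB, SHARD_INDEX_DIR)))
```

Printed, the read-only command is `java -cp '/usr/share/elasticsearch/lib/*' -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /path/to/shard/index`. Without `fix=True` CheckIndex only reads the index; CheckIndex's own usage message warns that `-fix` permanently removes documents and advises making a backup copy first.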
