Here's the CheckIndex output:
NOTE: testing will be more thorough if you run java with
'-ea:org.apache.lucene', so assertions are enabled
Opening index @ /vol/solr/data/index/
Segments file=segments_vxx numSegments=8 version=FORMAT_HAS_PROX [Lucene 2.4]
1 of 8: name=_ks4 docCount=2504982
compound=false
hasProx=true
numFiles=11
size (MB)=3,965.695
no deletions
test: open reader.........OK
test: fields, norms.......OK [343 fields]
test: terms, freq, prox...OK [37238560 terms; 161527224 terms/docs pairs; 186273362 tokens]
test: stored fields.......OK [55813402 total field count; avg 22.281 fields per doc]
test: term vectors........OK [7998458 total vector count; avg 3.193 term/freq vector fields per doc]
2 of 8: name=_oaw docCount=514635
compound=false
hasProx=true
numFiles=12
size (MB)=746.887
has deletions [delFileName=_oaw_1rb.del]
test: open reader.........OK [155528 deleted docs]
test: fields, norms.......OK [172 fields]
test: terms, freq, prox...OK [7396227 terms; 28146962 terms/docs pairs; 17298364 tokens]
test: stored fields.......OK [5736012 total field count; avg 15.973 fields per doc]
test: term vectors........OK [1045176 total vector count; avg 2.91 term/freq vector fields per doc]
3 of 8: name=_tll docCount=827949
compound=false
hasProx=true
numFiles=12
size (MB)=761.782
has deletions [delFileName=_tll_2fs.del]
test: open reader.........OK [39283 deleted docs]
test: fields, norms.......OK [180 fields]
test: terms, freq, prox...OK [10925397 terms; 43361019 terms/docs pairs; 42123294 tokens]
test: stored fields.......OK [8673255 total field count; avg 10.997 fields per doc]
test: term vectors........OK [880272 total vector count; avg 1.116 term/freq vector fields per doc]
4 of 8: name=_tdx docCount=18372
compound=false
hasProx=true
numFiles=12
size (MB)=56.856
has deletions [delFileName=_tdx_9.del]
test: open reader.........OK [18368 deleted docs]
test: fields, norms.......OK [50 fields]
test: terms, freq, prox...OK [261974 terms; 2018842 terms/docs pairs; 150 tokens]
test: stored fields.......OK [76 total field count; avg 19 fields per doc]
test: term vectors........OK [14 total vector count; avg 3.5 term/freq vector fields per doc]
5 of 8: name=_te8 docCount=19929
compound=false
hasProx=true
numFiles=12
size (MB)=60.475
has deletions [delFileName=_te8_a.del]
test: open reader.........OK [19900 deleted docs]
test: fields, norms.......OK [72 fields]
test: terms, freq, prox...OK [276045 terms; 2166958 terms/docs pairs; 1196 tokens]
test: stored fields.......OK [522 total field count; avg 18 fields per doc]
test: term vectors........OK [132 total vector count; avg 4.552 term/freq vector fields per doc]
6 of 8: name=_tej docCount=22201
compound=false
hasProx=true
numFiles=12
size (MB)=65.827
has deletions [delFileName=_tej_o.del]
test: open reader.........OK [22171 deleted docs]
test: fields, norms.......OK [50 fields]
test: terms, freq, prox...FAILED
WARNING: would remove reference to this segment (-fix was not specified); full exception:
java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 34950
at org.apache.lucene.util.BitVector.get(BitVector.java:91)
at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:125)
at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:98)
at org.apache.lucene.index.CheckIndex.check(CheckIndex.java:222)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:433)
7 of 8: name=_1agw docCount=1717926
compound=false
hasProx=true
numFiles=12
size (MB)=2,390.413
has deletions [delFileName=_1agw_1.del]
test: open reader.........OK [1 deleted docs]
test: fields, norms.......OK [438 fields]
test: terms, freq, prox...OK [20959015 terms; 101603282 terms/docs pairs; 123561985 tokens]
test: stored fields.......OK [26248407 total field count; avg 15.279 fields per doc]
test: term vectors........OK [4911368 total vector count; avg 2.859 term/freq vector fields per doc]
8 of 8: name=_1agz docCount=1
compound=false
hasProx=true
numFiles=8
size (MB)=0
no deletions
test: open reader.........OK
test: fields, norms.......OK [6 fields]
test: terms, freq, prox...OK [6 terms; 6 terms/docs pairs; 6 tokens]
test: stored fields.......OK [6 total field count; avg 6 fields per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
WARNING: 1 broken segments detected
WARNING: 30 documents would be lost if -fix were specified
NOTE: would write new segments file [-fix was not specified]
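For anyone hitting the same thing, this is roughly the invocation I used (the lucene-core jar name/path is whatever your Solr 1.3 install ships, so adjust it — treat the path here as an assumption). Note that -fix actually rewrites the segments file and permanently drops the broken segment, so stop Solr and back up the index directory before running it:

```shell
# Read-only check; -ea enables assertions for the org.apache.lucene packages,
# which makes CheckIndex more thorough (per the NOTE it prints).
java -ea:org.apache.lucene... \
  -cp lucene-core-2.4.0.jar \
  org.apache.lucene.index.CheckIndex /vol/solr/data/index

# Repair: rewrites segments_N, dropping the broken segment (_tej here) and
# its remaining live documents (~30 in this case). Back up the index first!
java -ea:org.apache.lucene... \
  -cp lucene-core-2.4.0.jar \
  org.apache.lucene.index.CheckIndex /vol/solr/data/index -fix
```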
On Fri, Jan 2, 2009 at 3:47 PM, Brian Whitman <[email protected]> wrote:
> I will but I bet I can guess what happened -- this index has many
> duplicates in it as well (same uniqueKey id multiple times) - this happened
> to us once before and it was because the solr server went down during an
> add. We may have to re-index, but I will run checkIndex now. Thanks
> (Thread for dupes here :
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200803.mbox/%[email protected]%3e)
>
>
> On Fri, Jan 2, 2009 at 3:44 PM, Michael McCandless <[email protected]> wrote:
>
>> It looks like your index has some kind of corruption. Were there any other
>> exceptions prior to this one, or any previous problems with the OS/IO
>> system?
>>
>> Can you run CheckIndex (java org.apache.lucene.index.CheckIndex to see
>> usage) and post the output?
>> Mike
>>
>> Brian Whitman <[email protected]> wrote:
>>
>> > I am getting this on a 10GB index (via solr 1.3) during an optimize:
>> > Jan 2, 2009 6:51:52 PM org.apache.solr.common.SolrException log
>> > SEVERE: java.io.IOException: background merge hit exception: _ks4:C2504982
>> > _oaw:C514635 _tll:C827949 _tdx:C18372 _te8:C19929 _tej:C22201
>> > _1agw:C1717926 _1agz:C1 into _1ah2 [optimize]
>> > at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2346)
>> > at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2280)
>> > at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:355)
>> > at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:77)
>> > ...
>> >
>> > Exception in thread "Lucene Merge Thread #2"
>> > org.apache.lucene.index.MergePolicy$MergeException:
>> > java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 34950
>> > at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:314)
>> > at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
>> > Caused by: java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 34950
>> > at org.apache.lucene.util.BitVector.get(BitVector.java:91)
>> > at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:125)
>> > at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:98)
>> > ...
>> >
>> >
>> > Does anyone know how this is caused and how I can fix it? It happens with
>> > every optimize. Commits were very slow on this index as well (40x as slow
>> > as a similar index on another machine). I have plenty of disk space (many
>> > 100s of GB) free.
>> >
>>
>
>