Re: Background Merges

Grant Ingersoll Tue, 18 Dec 2007 10:09:58 -0800

I think the issue is my fault, but I am not exactly sure how it happened. I deleted my index and have not been able to reproduce the problem since.


However, here's what I can tell from some debugging I did before that:

The field that is causing the problem in the stack trace is neither binary nor compressed, nor is it even stored. So the fact that it is being merged (see the stack trace) is just wrong, since it isn't stored. I start out with 6 fields, 2 of which are stored. When I come into FieldsReader, it gets the correct number of Fields; however, they must be out of order from when I originally indexed, or something like that. AFAICT, FieldsWriter is also correctly writing out the Fields. Looking at SegmentMerger, we are in the else clause of:

    if (matchingSegmentReader != null) {
      // We can optimize this case (doing a bulk
      // byte copy) since the field numbers are
      // identical
      int start = j;
      int numDocs = 0;
      do {
        j++;
        numDocs++;
      } while (j < maxDoc && !matchingSegmentReader.isDeleted(j) && numDocs < MAX_RAW_MERGE_DOCS);

      IndexInput stream = matchingFieldsReader.rawDocs(rawDocLengths, start, numDocs);
      fieldsWriter.addRawDocuments(stream, rawDocLengths, numDocs);
      docCount += numDocs;
    } else {
      fieldsWriter.addDocument(reader.document(j, fieldSelectorMerge)); // <-- HERE
      j++;
      docCount++;
    }
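To make the bulk-copy branch of that excerpt concrete, here is a minimal standalone sketch of how the do/while groups contiguous live documents into runs. It does not use Lucene: the class name RawMergeRuns, the boolean[] stand-in for isDeleted(), the cap of 4, and the outer loop that skips deleted docs are all my assumptions, since the excerpt shows neither the outer loop nor the real MAX_RAW_MERGE_DOCS value.

```java
import java.util.ArrayList;
import java.util.List;

public class RawMergeRuns {
    // Hypothetical cap for illustration; not the real MAX_RAW_MERGE_DOCS.
    static final int MAX_RUN = 4;

    /**
     * Returns {start, numDocs} pairs, one per run of contiguous
     * non-deleted docs, with each run capped at MAX_RUN docs.
     */
    static List<int[]> liveRuns(boolean[] deleted) {
        List<int[]> runs = new ArrayList<>();
        int j = 0;
        while (j < deleted.length) {
            if (deleted[j]) {
                // Assumed outer-loop behaviour: deleted docs are skipped.
                j++;
                continue;
            }
            int start = j;
            int numDocs = 0;
            // Same shape as the do/while in the excerpt above.
            do {
                j++;
                numDocs++;
            } while (j < deleted.length && !deleted[j] && numDocs < MAX_RUN);
            runs.add(new int[] {start, numDocs});
        }
        return runs;
    }

    public static void main(String[] args) {
        boolean[] deleted = {false, false, true, false, false, false, false, false};
        for (int[] run : liveRuns(deleted)) {
            System.out.println("start=" + run[0] + " numDocs=" + run[1]);
        }
    }
}
```

Each {start, numDocs} pair corresponds to one rawDocs()/addRawDocuments() call in the excerpt; a run ends at a deleted doc, at the end of the segment, or at the cap.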

Based on the comment in the if branch, I am assuming the field numbers are not identical when we land in the else clause, which would explain why the Fields info is being misinterpreted.
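As a rough illustration of why mismatched field numbers would lead to misinterpretation, here is a toy model, not Lucene's actual FieldInfos or stored-fields format. Everything in it (FieldNumberMismatch, StoredValue, sameNumbering, nameOf, renumber, and the three field names) is hypothetical. A numbering is just a list of field names where the list index plays the role of the field number.

```java
import java.util.Arrays;
import java.util.List;

public class FieldNumberMismatch {
    /** Toy stand-in for a stored value that is tagged only with a field number. */
    static final class StoredValue {
        final int fieldNumber;
        final String value;

        StoredValue(int fieldNumber, String value) {
            this.fieldNumber = fieldNumber;
            this.value = value;
        }
    }

    /** True when both numberings give every field name the same number. */
    static boolean sameNumbering(List<String> a, List<String> b) {
        return a.equals(b);
    }

    /** Resolves a field number to a name under the given numbering. */
    static String nameOf(List<String> numbering, int fieldNumber) {
        return numbering.get(fieldNumber);
    }

    /** Re-tags a value written under 'from' so it is valid under 'to'. */
    static StoredValue renumber(StoredValue v, List<String> from, List<String> to) {
        String name = from.get(v.fieldNumber);
        int n = to.indexOf(name);
        if (n < 0) {
            throw new IllegalStateException("field missing in target numbering: " + name);
        }
        return new StoredValue(n, v.value);
    }

    public static void main(String[] args) {
        List<String> segment = Arrays.asList("id", "title", "body");
        List<String> merged = Arrays.asList("title", "id", "body");
        StoredValue v = new StoredValue(1, "Background Merges"); // "title" in 'segment'

        // Numberings differ, so a raw byte copy would carry the wrong tag.
        System.out.println(sameNumbering(segment, merged));
        // Reading the old number against the new numbering picks the wrong field.
        System.out.println(nameOf(merged, v.fieldNumber));
        // Renumbering by name first gives the intended field.
        System.out.println(nameOf(merged, renumber(v, segment, merged).fieldNumber));
    }
}
```

In this model, copying the value unchanged is only safe when sameNumbering() is true, which mirrors the comment in the if branch; otherwise each value has to be resolved by name and re-tagged, which is roughly what I understand the per-document else path to be for.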

I still wonder if there isn't a problem in that somehow the index got corrupted such that the Field numbering was off between various runs of the IndexWriter? Does that even seem possible in the code?

I am just thinking out loud here, not sure if it even makes sense. I think we can just put this on hold for now and see if it comes up again, since I can't reproduce it (and I forgot to save the mislabeled index).


-Grant


On Dec 18, 2007, at 7:27 AM, Grant Ingersoll wrote:

No, there were not any exceptions during indexing. I am still trying to work up some test cases using open documents (i.e. Wikipedia).
-Grant

On Dec 18, 2007, at 6:09 AM, Michael McCandless wrote:
Grant,
Do you know whether you hit any exceptions while adding docs, before you hit those merge exceptions?
I have found one case where an exception that runs back through DocumentsWriter (during addDocument()) can produce a corrupt fdt (stored field) file. I have a test case that shows this, and a fix.
Mike

Grant Ingersoll wrote:
I will try to work up a test case that I can share and will double-check that I have all the right pieces in place.
-Grant

On Dec 17, 2007, at 2:50 PM, Michael McCandless wrote:
Yonik Seeley wrote:
On Dec 17, 2007 2:15 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
Not good!

It's almost certainly a bug with Lucene, I think, because Solr is
just a consumer of Lucene's API, which shouldn't ever cause something
like this.
Yeah... a solr level commit should just translate into
writer.close
reader.open  // assuming there are "overwrites"
delete duplicates via TermDocs
reader.close
writer.open
writer.optimize
writer.close
Seems fine!
Apparently, while merging stored fields, SegmentMerger tried to read
too far.
The code to merge stored fields was recently optimized to do bulk copy
of contiguous fields, right?
Yes, I'm wondering the same thing... though Grant's exception is on the un-optimized case, because the field name->number mapping differed for that segment. I'll scrutinize that change some more...
Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ



