Re: java.lang.RuntimeException: after flush: fdx size mismatch

Michael McCandless Sat, 30 May 2009 03:54:34 -0700

Woops, here's the patch (added you, diretly, on the "To:" so that you
get the patch; Apache's list manager strips patches).


Yes, if the fdx file is getting deleted out from under Lucene, that'd
also explain what's happening.  Though the timing would have to be
very quick.  What's happening is Lucene had opened _X.fdx for writing,
written some small # bytes to it, and then closed it but found the
file no longer exists.

I'm not familiar with what exactly happens when you create/unload Solr
cores and move them around machines; does this involve moving files
from one machine to another?  (Ie, deleting files)?  If so, is there
some way to log when such migrations take place and try to correlate
to this exception?

Mike

On Fri, May 29, 2009 at 8:29 PM, James X
<hello.nigerian.spamm...@gmail.com> wrote:
> Hi Mike,I don't see a patch file here?
>
> Could another explanation be that the fdx file doesn't exist yet / has been
> deleted from underneath Lucene?
>
> I'm constantly CREATE-ing and UNLOAD-ing Solr cores, and more importantly,
> moving the bundled cores around between machines. I find it much more likely
> that there's something wrong with my core admin code than there is with the
> Lucene internals :) It's possible that I'm occasionally removing files which
> are currently in use by a live core...
>
> I'm using an ext3 filesystem on a large EC2 instance's own hard disk. I'm
> not sure how Amazon implement the local hard disk, but I assume it's a real
> hard disk exposed by the hypervisor.
>
> Thanks,
> James
>
> On Fri, May 29, 2009 at 3:41 AM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Very interesting: FieldsWriter thinks it's written 12 bytes to the fdx
>> file, yet the directory says the file does not exist.
>>
>> Can you re-run with this new patch?  I'm suspecting that FieldsWriter
>> wrote to one segment, but somehow we are then looking at the wrong
>> segment.  The attached patch prints out which segment FieldsWriter
>> actually wrote to.
>>
>> What filesystem & underlying IO system/device are you using?
>>
>> Mike
>>
>> On Thu, May 28, 2009 at 10:53 PM, James X
>> <hello.nigerian.spamm...@gmail.com> wrote:
>> > My apologies for the delay in running this patched Lucene build - I was
>> > temporarily pulled onto another piece of work.
>> >
>> > Here is a sample 'fdx size mismatch' exception using the patch Mike
>> > supplied:
>> >
>> > SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1
>> docs
>> > vs 0 length in bytes of _1i.fdx exists=false didInit=false inc=0 dSO=1
>> > fieldsWriter.doClose=true fieldsWriter.indexFilePointer=12
>> > fieldsWriter.fieldsFilePointer=2395
>> >        at
>> >
>> org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:96)
>> >        at
>> >
>> org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
>> >        at
>> >
>> org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
>> >        at
>> >
>> org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
>> >        at
>> > org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
>> >        at
>> > org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
>> >        at
>> org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
>> >        at
>> > org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
>> >        at
>> org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
>> >        at
>> org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
>> >        at
>> > org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
>> >
>> >
>> > Will now run with assertions enabled and see how that affects the
>> behaviour!
>> >
>> > Thanks,
>> > James
>> >
>> > ---------- Forwarded message ----------
>> > From: James X <hello.nigerian.spamm...@gmail.com>
>> > Date: Thu, May 21, 2009 at 2:24 PM
>> > Subject: Re: java.lang.RuntimeException: after flush: fdx size mismatch
>> > To: solr-user@lucene.apache.org
>> >
>> >
>> > Hi Mike,Documents are web pages, about 20 fields, mostly strings, a
>> couple
>> > of integers, booleans and one html field (for document body content).
>> >
>> > I do have a multi-threaded client pushing docs to Solr, so yes, I suppose
>> > that would mean I have several active Solr worker threads.
>> >
>> > The only exceptions I have are the RuntimeException flush errors,
>> followed
>> > by a handful (normally 10-20) of LockObtainFailedExceptions, which i
>> > presumed were being caused by the faulty threads dying and failing to
>> > release locks.
>> >
>> > Oh wait, I am getting WstxUnexpectedCharException exceptions every now
>> and
>> > then:
>> > SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character
>> > ((CTRL-CHAR, code 8))
>> >  at [row,col {unknown-source}]: [1,26070]
>> >        at
>> > com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
>> >        at
>> >
>> com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
>> >        at
>> >
>> com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
>> >        at
>> >
>> com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
>> >        at
>> >
>> com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
>> >        at
>> > com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>> >        at
>> >
>> org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327)
>> >
>> > I presumed these were caused by character encoding issues, but haven't
>> > looked into them at all yet.
>> >
>> > Thanks again for your help! I'll make some time this afternoon to build
>> some
>> > patched Lucene jars and get the results
>> >
>> >
>> > On Thu, May 21, 2009 at 5:06 AM, Michael McCandless <
>> > luc...@mikemccandless.com> wrote:
>> >
>> >> Another question: are there any other exceptions in your logs?  Eg
>> >> problems adding certain documents, or anything?
>> >>
>> >> Mike
>> >>
>> >> On Wed, May 20, 2009 at 11:18 AM, James X
>> >> <hello.nigerian.spamm...@gmail.com> wrote:
>> >> > Hi Mike, thanks for the quick response:
>> >> >
>> >> > $ java -version
>> >> > java version "1.6.0_11"
>> >> > Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
>> >> > Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)
>> >> >
>> >> > I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not
>> >> > hitting that yet!
>> >> >
>> >> > The exception always reports 0 length, but the number of of docs
>> varies,
>> >> > heavily weighted towards 1 or two docs. Of the last 130 or so
>> exceptions:
>> >> >     89 1 docs vs 0 length
>> >> >     20 2 docs vs 0 length
>> >> >      9 3 docs vs 0 length
>> >> >      1 4 docs vs 0 length
>> >> >      3 5 docs vs 0 length
>> >> >      2 6 docs vs 0 length
>> >> >      1 7 docs vs 0 length
>> >> >      1 9 docs vs 0 length
>> >> >      1 10 docs vs 0 length
>> >> >
>> >> > The only unusual thing I can think of that we're doing with Solr is
>> >> > aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to
>> spot
>> >> a
>> >> > pattern between core admin operations and these exceptions, however...
>> >> >
>> >> > James
>> >> >
>> >> > On Wed, May 20, 2009 at 2:37 AM, Michael McCandless <
>> >> > luc...@mikemccandless.com> wrote:
>> >> >
>> >> >> Hmm... somehow Lucene is flushing a new segment on closing the
>> >> >> IndexWriter, and thinks 1 doc had been added to the stored fields
>> >> >> file, yet the fdx file is the wrong size (0 bytes).  This check (&
>> >> >> exception) are designed to prevent corruption from entering the
>> index,
>> >> >> so it's at least good to see CheckIndex passes after this.
>> >> >>
>> >> >> I don't think you're hitting LUCENE-1521: that issue only happens if
>> a
>> >> >> single segment has more than ~268 million docs.
>> >> >>
>> >> >> Which exact JRE version are you using?
>> >> >>
>> >> >> When you hit this exception, is it always "1 docs vs 0 length in
>> bytes"?
>> >> >>
>> >> >> Mike
>> >> >>
>> >> >> On Wed, May 20, 2009 at 3:19 AM, James X
>> >> >> <hello.nigerian.spamm...@gmail.com> wrote:
>> >> >> > Hello all,I'm running Solr 1.3 in a multi-core environment. There
>> are
>> >> up
>> >> >> to
>> >> >> > 2000 active cores in each Solr webapp instance at any given time.
>> >> >> >
>> >> >> > I've noticed occasional errors such as:
>> >> >> > SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch:
>> 1
>> >> >> docs
>> >> >> > vs 0 length in bytes of _h.fdx
>> >> >> >        at
>> >> >> >
>> >> >>
>> >>
>> org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
>> >> >> >        at
>> >> >> >
>> >> >>
>> >>
>> org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
>> >> >> >        at
>> >> >> >
>> >> >>
>> >>
>> org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
>> >> >> >        at
>> >> >> >
>> >> >>
>> >>
>> org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
>> >> >> >        at
>> >> >> >
>> >> org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
>> >> >> >        at
>> >> >> > org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
>> >> >> >        at
>> >> >> org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
>> >> >> >        at
>> >> >> >
>> >> org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
>> >> >> >        at
>> >> >> org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
>> >> >> >        at
>> >> >> org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
>> >> >> >        at
>> >> >> >
>> org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
>> >> >> >
>> >> >> > during commit / optimise operations.
>> >> >> >
>> >> >> > These errors then cause cascading errors during updates on the
>> >> offending
>> >> >> > cores:
>> >> >> > SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock
>> obtain
>> >> >> timed
>> >> >> > out: SingleInstanceLock: write.lock
>> >> >> >        at org.apache.lucene.store.Lock.obtain(Lock.java:85)
>> >> >> >        at
>> >> org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1070)
>> >> >> >        at
>> >> >> org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:924)
>> >> >> >        at
>> >> >> >
>> >> org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:116)
>> >> >> >        at
>> >> >> >
>> >> >>
>> >>
>> org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
>> >> >> >
>> >> >> > This looks like http://issues.apache.org/jira/browse/LUCENE-1521,
>> but
>> >> >> when I
>> >> >> > upgraded Lucene to 2.4.1 under Solr 1.3, the issue still remains.
>> >> >> >
>> >> >> > CheckIndex doesn't find any problems with the index, and problems
>> >> >> disappear
>> >> >> > after an (inconvenient, for me) restart of Solr.
>> >> >> >
>> >> >> > Firstly, can I as the symptoms are so close to those in 1521, can I
>> >> check
>> >> >> my
>> >> >> > Lucene upgrade method should work:
>> >> >> > - unzip the Solr 1.3 war
>> >> >> > - remove the Lucene 2.4dev jars
>> >> >> > (lucene-core, lucene-spellchecker, lucene-snowball, lucene-queries,
>> >> >> > lucene-memory,lucene-highlighter, lucene-analyzers)
>> >> >> > - move in the Lucene 2.4.1 jars
>> >> >> > - rezip the directory structures as solr.war.
>> >> >> >
>> >> >> > I think this has worked, as solr/default/admin/registry.jsp shows:
>> >> >> >  <lucene-spec-version>2.4.1</lucene-spec-version>
>> >> >> >  <lucene-impl-version>2.4.1 750176 - 2009-03-04
>> >> >> > 21:56:52</lucene-impl-version>
>> >> >> >
>> >> >> > Secondly, if this Lucene fix isn't the right solution to this
>> problem,
>> >> >> can
>> >> >> > anyone suggest an alternative approach? The only problems I've had
>> up
>> >> to
>> >> >> now
>> >> >> > is to do with the number of allowed file handles, which was fixed
>> by
>> >> >> > changing limits.conf (RHEL machine).
>> >> >> >
>> >> >> > Many thanks!
>> >> >> > James
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >
>>
>

Re: java.lang.RuntimeException: after flush: fdx size mismatch

Reply via email to