Here's the issue:
https://issues.apache.org/jira/browse/LUCENE-3255
It happens because we interpret the first 0 int as an ancient segments
file format, and the next 0 int as meaning there are no segments. Yuck!
This format pre-dates Lucene 1.9, so the fix for 3.x is to stop
supporting it... but I don't see any easy way to fix this pre-3.x,
where we must (by our back-compat rules) support such an ancient
index.
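To make the failure mode concrete, here is a rough, simplified sketch of the header-reading logic described above. It is not the actual SegmentInfos code: the class name LegacySegmentsSketch and the method readSegmentCount are made up for illustration, and the field layout (a negative format marker followed by a long version and an int counter, versus a legacy file whose first int is the counter itself) is an assumption about the old on-disk format.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical, simplified model of reading a segments_N header.
public class LegacySegmentsSketch {

    /** Returns the number of segments the header claims to contain. */
    static int readSegmentCount(byte[] fileBytes) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(fileBytes))) {
            int format = in.readInt();
            if (format < 0) {
                // Assumed newer layout: explicit (negative) format marker,
                // then a long version and an int name counter.
                in.readLong(); // version
                in.readInt();  // counter
            }
            // Otherwise: assumed legacy layout with no format marker,
            // where the int just read is itself the counter.

            // In both layouts the next int is the number of segments.
            return in.readInt();
        } catch (IOException e) {
            // A file too short to hold the header ends up here.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] allZeros = new byte[64]; // stands in for the bad segments file
        System.out.println("segments in all-zero file: "
            + readSegmentCount(allZeros)); // prints 0
    }
}
```

Under these assumptions an all-zero file is indistinguishable from a valid legacy file describing an empty index, so no exception is raised and no fallback to the prior commit is triggered.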
Mike McCandless
http://blog.mikemccandless.com
On Tue, Jun 28, 2011 at 10:09 AM, mark harwood <[email protected]> wrote:
> I've got Greg's bad segments file and it does look to be all zeros. If I
> drop it into an existing index directory with the name segments_N+1, it
> reproduces the error, i.e. IndexReader opens the index as if it contains
> zero docs.
> Preparing a Jira as we speak.
>
>
> ----- Original Message ----
> From: Michael McCandless <[email protected]>
> To: [email protected]
> Sent: Tue, 28 June, 2011 14:59:48
> Subject: Re: Corrupt segments file full of zeros
>
> On Tue, Jun 28, 2011 at 9:29 AM, mark harwood <[email protected]> wrote:
>> Hi Mike.
>>>>Hmmm -- what code are you running here, to print the number of docs?
>>
>> SegmentInfos.setInfoStream(System.out);
>> FSDirectory dir = FSDirectory.open(new File("j:/indexes/myindex"));
>> IndexReader r = IndexReader.open(dir, true);
>> System.out.println("index has "+r.maxDoc()+" docs");
>>
>> From my own tests outside of Greg's environment I've found Lucene to be doing
>> all the right things, and IndexReader falls back gracefully to the previous
>> commit. E.g. here is the output from when I deliberately killed an update
>> after prepareToCommit, leaving segments_2 and segments_3, and then vandalised
>> segments_3 with all zero bytes:
>> SIS [main]: directory listing genA=3
>> SIS [main]: fallback check: 2; 2
>> SIS [main]: segments.gen check: genB=2
>> SIS [main]: primary Exception on 'segments_3': java.io.IOException: read past EOF'; will retry: retry=false; gen = 3
>> SIS [main]: fallback to prior segment file 'segments_2'
>> SIS [main]: success on fallback segments_2
>>
>> Lucene does the right thing going back to _2. I can't yet see why in Greg's
>> environment (NFS based) it fails to see _4vc as corrupt in the same way the
>> above test correctly sees _3 as corrupt.
>
> Hmm. Mark, if you vandalise segments_3 with 0s, and then remove
> segments_2, what happens when you try to open the IndexReader? (I
> would expect an exception.)
>
> Greg, can you post the full stdout you see from SIS after enabling its
> infoStream, in the case that returns an IR with 0 docs (i.e. when you
> delete segments_4vb)?
>
> Also: if you don't delete any of the segments_N files, and run the same
> code, how many docs do you get?
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]