[ 
http://issues.apache.org/jira/browse/LUCENE-554?page=comments#action_12378295 ] 

Nadav Har'El commented on LUCENE-554:
-------------------------------------

Hi Otis, sorry for sitting on this patch for so long. I've been very busy (not
to mention a new daughter two weeks ago :-), and I still want to test it a bit
more before publishing it.

Anyway, what you suggest is not quite enough, because the "segments.old" file 
you added is never actually used in Lucene. The problem with the existing code 
is not that we don't keep a copy of some valid segments file around. Rather, 
the problem is that at some stage the "segments" file does not exist, and only 
"segments.new" exists. Your suggestion has the same issue (in the middle of 
step 3), with the addition of an unused "segments.old". And when the reader 
wants to read the segments file, it only tries to read "segments", not 
"segments.new" (or "segments.old").

Instead, I think the main fix should be in the segment reading code: if we 
can't read the "segments" file (it does not exist, or is corrupt), we should 
fall back to reading the "segments.new" file, in case that exists (and rename 
it to "segments" to avoid the mess).
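
To make the idea concrete, here is a rough sketch of the fallback, not the 
actual patch. It uses the modern java.nio.file API rather than Lucene's 
Directory classes, and the class and method names (SegmentsFallback, 
resolveSegmentsFile, demo) are made up for illustration. It only handles the 
case where "segments" is missing, not where it is corrupt, and it ignores 
locking against a writer that is still in the middle of its rename.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of Lucene. */
class SegmentsFallback {

    /**
     * Returns the path of the segments file a reader should open.
     * If "segments" is missing but "segments.new" exists, we assume the
     * writer crashed between deleting the old file and renaming the new
     * one, so we complete the rename before returning.
     */
    static Path resolveSegmentsFile(Path indexDir) {
        Path segments = indexDir.resolve("segments");
        Path segmentsNew = indexDir.resolve("segments.new");
        try {
            if (!Files.exists(segments) && Files.exists(segmentsNew)) {
                // Finish the interrupted replacement.
                Files.move(segmentsNew, segments);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return segments;
    }

    /** Simulates the crash window (only "segments.new" exists) and recovers. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("index");
            Files.write(dir.resolve("segments.new"),
                        "new segment list".getBytes(StandardCharsets.UTF_8));
            Path resolved = resolveSegmentsFile(dir);
            return new String(Files.readAllBytes(resolved), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The sketch relies on "segments.new" being complete whenever it exists without 
"segments", which should hold if the writer finishes writing it before 
deleting the old file.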

By the way, 3 days ago (May 3, 2006), Karel Jejnora posted on the developers' 
mailing list that he found another Lucene bug with a different cause, but 
similar effect (losing a huge chunk of the index if Lucene crashes at a bad 
time).  According to his detailed post, during a merge when the compound file 
format is used, there is a moment where we already have a new segments file 
pointing to a new ".cfs" file, but this file doesn't yet exist (rather, parts 
of this file exist as individual files, or the compound file exists but named 
*.tmp). He points to the problem in lines 696-712 of IndexWriter's 
mergeSegments(int,int).

I wonder if more similar bugs exist (I tried to solicit ideas from the mailing 
list, but nobody had any).

> Possible index corruption if crashing while replacing segments file
> -------------------------------------------------------------------
>
>          Key: LUCENE-554
>          URL: http://issues.apache.org/jira/browse/LUCENE-554
>      Project: Lucene - Java
>         Type: Bug

>   Components: Index
>     Versions: 1.9
>     Reporter: Nadav Har'El
>     Priority: Minor

>
> Lucene's indexing is expected to be reasonably tolerant to computer crashes 
> or the indexing process being killed. By reasonably tolerant, I mean that it 
> is ok to lose a few documents (those currently buffered in memory), or have 
> to repeat some work (e.g., a long merge that was in progress) - but it is not 
> ok for the entire index, or large chunks of it, to become irreversibly 
> corrupt.
> The fact that Lucene works by repeatedly merging several small segments into 
> a new, larger segment solves most of the crash problems, because until the 
> new segment is fully created, the old segments are still there and fully 
> functional. However, one possibility for corruption remains in the segment 
> replacement code:
> After a new segment is created, a new segments file is written as a new file 
> "segments.new", and then this file is renamed to "segments". The problem is 
> that this renaming is done using Directory.renameFile(), and 
> FSDirectory.renameFile is *NOT* atomic: it first deletes the old file, and 
> then renames the new file. A crash between these stages (or perhaps during 
> Java's rename which also isn't guaranteed to be atomic) will potentially 
> leave us without a working "segments" file.
> I will post here a patch for this bug shortly.
> The patch will also include a change to Directory.renameFile()'s Javadoc. It 
> currently claims "This replacement should be atomic.", which is false in 
> FSDirectory. Instead it should make a weaker claim, for example
>    "This replacement does not have to be atomic, but must at least obey a 
> weaker guarantee: at any time during the replacement, either the "from" file 
> is still available, or the "to" file is available with either the new or old 
> content."
> (or, we can just drop the guarantee altogether, just as Java's 
> File.renameTo() provides no atomicity guarantees).
