[ 
http://issues.apache.org/jira/browse/LUCENE-702?page=comments#action_12447968 ] 
            
Michael McCandless commented on LUCENE-702:
-------------------------------------------

I think we should try to make all of the addIndexes calls (and more
generally any call to Lucene) "transactional".  Meaning, if the call
is aborted (machine crashes, disk full, jvm killed, neutrino hits CPU,
etc.) then your index just "rolls back" to where it was at the start
of the call.  Ie, it is consistent and none of the incoming documents
were added.

This way your index is fine after the crash, and, you can fix the
cause of the crash and re-run the addIndexes call and you won't get
duplicate documents.

To achieve this, each of the three addIndexes methods would need to 1)
not commit a new segments file until the end, and 2) not delete any
segments referenced by the initial segments file (segmentInfos) until
the end.
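The two-phase pattern above can be sketched in miniature. This is purely illustrative (the class and method names below are hypothetical, not Lucene API): all merge work happens against a staged copy, and the committed segment list is only swapped at the very end, so an abort anywhere in the middle leaves the original state untouched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the transactional pattern: stage all new segments,
// and only "write the new segments file" (swap the committed list) at the end.
class SegmentList {
    private List<String> committed = new ArrayList<>();

    List<String> committedSegments() { return new ArrayList<>(committed); }

    // Runs the merge work; commits the new segment set only if nothing threw.
    void addIndexesTransactional(List<String> incoming) {
        // 1) Work on a copy: the starting segments file stays untouched.
        List<String> staged = new ArrayList<>(committed);
        try {
            for (String seg : incoming) {
                staged.add(mergeIn(seg));    // may throw (disk full, etc.)
            }
        } catch (RuntimeException e) {
            return;                          // 2) Abort: committed list unchanged
        }
        committed = staged;                  // single atomic "commit" step
    }

    // Stand-in for real merge work; throws to simulate a mid-call failure.
    private String mergeIn(String seg) {
        if (seg.isEmpty()) throw new RuntimeException("simulated disk full");
        return "_" + seg;
    }
}
```

If the "merge" fails partway through, the committed list is exactly what it was before the call, which is the roll-back behavior described above.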

We have three methods now for addIndexes:

  * For addIndexes(IndexReader[]): I think this method is already
    transactional.  We create a merger, add all readers to it, do the
    merge, and only at the end commit the new segments file & remove
    the old segments.

  * For addIndexes(Directory[]): this method can currently corrupt the
    index if aborted.  However, because all merging is done only on
    the newly added segments, I think the fix is simply to not commit
    the new segments file until the end?

  * For addIndexesNoOptimize(Directory[]): this method can also
    currently corrupt the index if aborted.  To fix this I think we
    need to not only prevent committing a new segments file until the
    end, but also to prevent deletion of any segments in the original
    segments file.  This is because it's able (I think?) to merge
    both old and new segments in its step 3.  This would normally
    result in deleting those old segments that were merged.

    Note that this will increase the temporary disk usage used during
    the call, because old segments must remain on disk even if they
    have been merged, but I think this is the right tradeoff
    (transactional vs temporary disk usage)?
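The deferred-deletion idea for the addIndexesNoOptimize case can be sketched like this (again with hypothetical names, not Lucene code): segments merged away during the call are only queued for deletion, and actually removed after the new segments file is committed, so an abort can still roll back to the original segment set.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of deferring deletion: old segments stay on disk
// (the extra temporary disk usage mentioned above) until the commit point.
class DeferredDeleter {
    final List<String> live = new ArrayList<>();          // referenced segments
    final List<String> pendingDelete = new ArrayList<>(); // merged-away, not yet removed

    // An old segment was merged into a new one; queue it, don't delete yet.
    void markMerged(String segment) {
        pendingDelete.add(segment);
    }

    // New segments file is durable: now it's safe to drop the old segments.
    void commit() {
        live.removeAll(pendingDelete);
        pendingDelete.clear();
    }

    // Call was aborted: forget the queue, original segments stay referenced.
    void abort() {
        pendingDelete.clear();
    }
}
```

The tradeoff is exactly the one above: between markMerged and commit, both the old and the merged copies occupy disk, but an abort at any point leaves the original segments file fully valid.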

Also note that we would need the fixes from LUCENE-701 (lockless
commits) to properly delete orphaned segments after an abort.  Without
the IndexFileDeleter I think they would stick around indefinitely.
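The cleanup pass an IndexFileDeleter-style component would do after an abort amounts to a set difference: any file in the index directory not referenced by the current segments file is an orphan. A minimal sketch (method name is illustrative, not the actual IndexFileDeleter API):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hedged sketch of post-abort cleanup: files on disk that the committed
// segments file does not reference are orphans and can be deleted.
class OrphanScanner {
    static Set<String> orphans(List<String> filesOnDisk, Set<String> referenced) {
        Set<String> result = new HashSet<>(filesOnDisk);
        result.removeAll(referenced);   // whatever remains is unreferenced
        return result;
    }
}
```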

Does this approach sound reasonable/right?  Any feedback?


> Disk full during addIndexes(Directory[]) can corrupt index
> ----------------------------------------------------------
>
>                 Key: LUCENE-702
>                 URL: http://issues.apache.org/jira/browse/LUCENE-702
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Michael McCandless
>         Assigned To: Michael McCandless
>
> This is a spinoff of LUCENE-555
> If the disk fills up during this call then the committed segments file can 
> reference segments that were not written.  Then the whole index becomes 
> unusable.
> Does anyone know of any other cases where disk full could corrupt the index?
> I think disk full should at worst lose the documents that were "in flight" at 
> the time.  It shouldn't corrupt the index.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
