[ 
https://issues.apache.org/jira/browse/LUCENE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565701#action_12565701
 ] 

Michael McCandless commented on LUCENE-1044:
--------------------------------------------

After thinking through the above costs of committing, I now think we
should deprecate autoCommit=true entirely, making autoCommit=false the
only choice in 3.0.

With that change, when you use an IndexWriter, its changes are never
visible to a reader until you call commit() or close().  I think this
is how KinoSearch and Ferret work, for example.

Here are some reasons:

  * Commit has now become a costly event, because sync() is costly,
    and is forcing us to use this "syncPause" logic (hack) to game the
    OS, which really is ugly, dependent on OS/IO particulars, etc.

  * Since we make no guarantee on when a commit specifically happens,
    and this fix in particular will reduce its frequency from "every
    flush" to "every merge", autoCommit=true really is not that useful
    for applications (i.e., they will have to call commit() on their
    own anyway if they need to rely on its frequency).

  * It's always possible to build an autocommit layer above
    IndexWriter by calling commit() on your own schedule, to trade off
    performance for commit frequency (but not vice versa).

  * Not autocommitting by default opens up some good future
    optimizations on merging since we don't have to flush real
    segments to disk until commit.  One simple example is we could
    skip building CFS files as we flush, and only merge & build CFS on
    commit/close.
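
As a rough illustration of the third point above, here is a minimal
sketch of an autocommit layer that commits after every N added
documents. CommittableWriter is a hypothetical stand-in interface, not
the real IndexWriter API, and AutoCommitLayer and commitEvery are
names made up for this example; a time-based schedule would work the
same way.

```java
/** Hypothetical stand-in for the few writer operations used here (not Lucene's API). */
interface CommittableWriter {
    void addDocument(String doc);
    void commit();
}

/** Sketch of an autocommit layer: commits after every commitEvery added documents. */
final class AutoCommitLayer {
    private final CommittableWriter writer;
    private final int commitEvery;
    private int pending = 0; // documents added since the last commit

    AutoCommitLayer(CommittableWriter writer, int commitEvery) {
        if (commitEvery < 1) {
            throw new IllegalArgumentException("commitEvery must be >= 1");
        }
        this.writer = writer;
        this.commitEvery = commitEvery;
    }

    /** Adds a document and commits once enough documents have accumulated. */
    void addDocument(String doc) {
        writer.addDocument(doc);
        pending++;
        if (pending >= commitEvery) {
            writer.commit();
            pending = 0;
        }
    }

    /** Commits any documents still pending; call before closing the writer. */
    void close() {
        if (pending > 0) {
            writer.commit();
            pending = 0;
        }
    }
}
```

A small commitEvery gives frequent commits at the cost of more
sync() calls; a large one does the opposite. The application picks
the tradeoff, which is the point of doing this above IndexWriter.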

What do people think?

If we do this, I would right now deprecate all ctors that take
autoCommit and add a comment explaining that in 3.0 autoCommit is wired
to "false".  I would leave the "syncPause" logic in there for now,
because it's such a sizable performance gain on Windows, but deprecate
it, stating that it will be removed when we switch to
autoCommit=false in 3.0.


> Behavior on hard power shutdown
> -------------------------------
>
>                 Key: LUCENE-1044
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1044
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>         Environment: Windows Server 2003, Standard Edition, Sun Hotspot Java 1.5
>            Reporter: venkat rangan
>            Assignee: Michael McCandless
>             Fix For: 2.4
>
>         Attachments: FSyncPerfTest.java, LUCENE-1044.patch, 
> LUCENE-1044.take2.patch, LUCENE-1044.take3.patch, LUCENE-1044.take4.patch, 
> LUCENE-1044.take5.patch, LUCENE-1044.take6.patch
>
>
> When indexing a large number of documents, upon a hard power failure (e.g. 
> pull the power cord), the index seems to get corrupted. We start a Java 
> application as a Windows Service, and feed it documents. In some cases 
> (after an index size of 1.7GB, with 30-40 index segment .cfs files), the 
> following is observed.
> The 'segments' file contains only zeros. Its size is 265 bytes - all bytes 
> are zeros.
> The 'deleted' file also contains only zeros. Its size is 85 bytes - all bytes 
> are zeros.
> Before corruption, the segments file and deleted file appear to be correct. 
> After this corruption, the index is corrupted and lost.
> This is a problem observed in Lucene 1.4.3. We are not able to upgrade our 
> customer deployments to 1.9 or later version, but would be happy to back-port 
> a patch, if the patch is small enough and if this problem is already solved.
