[ https://issues.apache.org/jira/browse/LUCENE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565701#action_12565701 ]
Michael McCandless commented on LUCENE-1044:
--------------------------------------------

On thinking through the above costs of committing, I now think we should deprecate autoCommit=true entirely, making autoCommit=false the only choice in 3.0. With that change, when you use an IndexWriter, its changes are never visible to a reader until you call commit() or close(). I think this is how KinoSearch and Ferret work, for example.

Here are some reasons:

* Commit has now become a costly event, because sync() is costly, and it is forcing us to use this "syncPause" logic (a hack) to game the OS, which really is ugly, dependent on OS/IO particulars, etc.
* Since we make no guarantee about when a commit specifically happens, and this fix in particular will reduce its frequency from "every flush" to "every merge", autoCommit=true really is not that useful for applications (i.e., they will have to call commit() on their own anyway if they need to rely on its frequency).
* It's always possible to build an autocommit layer above IndexWriter by calling commit() on your own schedule, trading off performance for commit frequency (but not vice versa).
* Not autocommitting by default opens up some good future optimizations on merging, since we don't have to flush real segments to disk until commit. One simple example: we could skip building CFS files as we flush, and only merge and build CFS on commit/close.

What do people think?

If we do this, I would right now deprecate all ctors that take autoCommit and add a comment explaining that in 3.0 autoCommit is wired to "false". I would leave the "syncPause" logic in there for now, because it's such a sizable performance gain on Windows, but deprecate it, stating that it will be removed when we switch to autoCommit=false in 3.0.
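Why a durable commit is costly can be illustrated with a generic sketch: before a new commit point is published, every file it references has to be forced to stable storage, otherwise the OS may still be holding those bytes in its write cache when the power goes out (one plausible way to end up with the zero-filled files described in the issue quoted below). This is not Lucene's implementation; the class DurableCommitSketch, its commit() helper, and the file names _0.cfs, _1.cfs and segments_1 are hypothetical, and the code uses java.nio.file, which needs Java 7 or later.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

public class DurableCommitSketch {
    // Force an existing file's bytes and metadata to stable storage (fsync).
    static void fsync(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
    }

    // Write the whole byte array to a file, then fsync it before returning.
    static void writeAndSync(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true);
        }
    }

    // Hypothetical commit: sync every referenced data file first, and only then
    // write (and sync) the commit-point file that lists them. Assuming each
    // commit uses a new file name rather than overwriting the previous one, a
    // crash before the final step leaves the earlier commit point untouched.
    static void commit(Path dir, List<String> dataFiles, String commitPointName)
            throws IOException {
        for (String name : dataFiles) {
            fsync(dir.resolve(name));
        }
        byte[] contents = String.join("\n", dataFiles).getBytes(StandardCharsets.UTF_8);
        writeAndSync(dir.resolve(commitPointName), contents);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commit-sketch");
        Files.write(dir.resolve("_0.cfs"), new byte[] {1, 2, 3});
        Files.write(dir.resolve("_1.cfs"), new byte[] {4, 5});
        commit(dir, Arrays.asList("_0.cfs", "_1.cfs"), "segments_1");
        // Prints the two data file names, one per line.
        System.out.println(new String(
                Files.readAllBytes(dir.resolve("segments_1")), StandardCharsets.UTF_8));
    }
}
```

Each force(true) call blocks until the device acknowledges the write, so doing this on every flush is expensive, which is the cost the comment above is weighing. Portable Java offers no way to fsync a directory, so this sketch does not attempt to make the new directory entry itself durable.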
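The point about building an autocommit layer above IndexWriter by calling commit() on your own schedule could look roughly like the following sketch. It assumes a hypothetical Writer interface standing in for the two IndexWriter operations it needs (addDocument and commit); documents are plain strings for brevity, so this is not Lucene's API. The count-based policy (commitEvery) is just one possible schedule, and FakeWriter is an in-memory stand-in that records which documents a reader would see.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class AutoCommitLayer {
    // Hypothetical stand-in for the subset of IndexWriter used here.
    interface Writer {
        void addDocument(String doc) throws IOException;
        void commit() throws IOException;
    }

    private final Writer writer;
    private final int commitEvery; // documents added between commits
    private int pending = 0;       // documents added since the last commit

    AutoCommitLayer(Writer writer, int commitEvery) {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be > 0");
        }
        this.writer = writer;
        this.commitEvery = commitEvery;
    }

    // Add a document, committing once commitEvery documents are pending.
    void add(String doc) throws IOException {
        writer.addDocument(doc);
        if (++pending >= commitEvery) {
            writer.commit();
            pending = 0;
        }
    }

    // Commit whatever is still pending, e.g. before shutting down.
    void close() throws IOException {
        if (pending > 0) {
            writer.commit();
            pending = 0;
        }
    }

    // In-memory fake: added documents become "visible" only on commit().
    static class FakeWriter implements Writer {
        final List<String> buffered = new ArrayList<>();
        final List<String> visible = new ArrayList<>();
        int commits = 0;

        public void addDocument(String doc) { buffered.add(doc); }

        public void commit() {
            visible.addAll(buffered);
            buffered.clear();
            commits++;
        }
    }

    public static void main(String[] args) throws IOException {
        FakeWriter w = new FakeWriter();
        AutoCommitLayer layer = new AutoCommitLayer(w, 3);
        for (int i = 0; i < 7; i++) {
            layer.add("doc" + i);
        }
        System.out.println(w.commits + " commits, " + w.visible.size() + " visible"); // 2 commits, 6 visible
        layer.close();
        System.out.println(w.commits + " commits, " + w.visible.size() + " visible"); // 3 commits, 7 visible
    }
}
```

A larger commitEvery means fewer costly syncs but more uncommitted work at risk after a crash; the application, not the writer, chooses that tradeoff.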
> Behavior on hard power shutdown
> -------------------------------
>
>                 Key: LUCENE-1044
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1044
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>         Environment: Windows Server 2003, Standard Edition, Sun Hotspot Java 1.5
>            Reporter: venkat rangan
>            Assignee: Michael McCandless
>             Fix For: 2.4
>
>         Attachments: FSyncPerfTest.java, LUCENE-1044.patch, LUCENE-1044.take2.patch, LUCENE-1044.take3.patch, LUCENE-1044.take4.patch, LUCENE-1044.take5.patch, LUCENE-1044.take6.patch
>
>
> When indexing a large number of documents, upon a hard power failure (e.g. pulling the power cord), the index seems to get corrupted. We start a Java application as a Windows Service, and feed it documents. In some cases (after an index size of 1.7GB, with 30-40 index segment .cfs files), the following is observed.
> The 'segments' file contains only zeros. Its size is 265 bytes - all bytes are zeros.
> The 'deleted' file also contains only zeros. Its size is 85 bytes - all bytes are zeros.
> Before corruption, the segments file and deleted file appear to be correct. After this corruption, the index is corrupted and lost.
> This is a problem observed in Lucene 1.4.3. We are not able to upgrade our customer deployments to 1.9 or a later version, but would be happy to back-port a patch, if the patch is small enough and if this problem is already solved.