[ https://issues.apache.org/jira/browse/LUCENE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-1044:
---------------------------------------

    Attachment: LUCENE-1044.take4.patch

OK, I did a simplistic patch (attached) whereby FSDirectory has a background thread that re-opens, syncs, and closes those files that Lucene has written (I'm using a modified version of the class from Doron's test). This patch is nowhere near ready to commit; I just coded up enough so we could get a rough measure of the performance cost of syncing. (An illustrative sketch of the background-sync idea appears after the issue summary below.) For example, remaining work includes:

* we must prevent deletion of a commit point until a future commit point is fully sync'd to stable storage;
* we must take care not to sync a file that has been deleted before we sync'd it;
* we shouldn't sync until the end when running with autoCommit=false;
* merges, if run by ConcurrentMergeScheduler, should [maybe] sync in the foreground;
* maybe forcefully throttle back updates if syncing falls too far behind;
* etc.

I ran the same alg as in the tests above (index the first 150K docs of Wikipedia), crossing CFS and non-CFS with sync and nosync (4 tests) for each IO system. Time is the fastest of 2 runs:

|| IO System || CFS sync || CFS nosync || CFS % slower || non-CFS sync || non-CFS nosync || non-CFS % slower ||
| ReiserFS 6-drive RAID5 array, Linux (2.6.22.1) | 188 | 157 | 19.7% | 143 | 147 | -2.7% |
| EXT3 single internal drive, Linux (2.6.22.1) | 173 | 157 | 10.2% | 136 | 132 | 3.0% |
| 4-drive RAID0 array, Mac Pro (10.4 Tiger) | 153 | 152 | 0.7% | 150 | 149 | 0.7% |
| Win XP Pro laptop, single drive | 463 | 352 | 31.5% | 343 | 335 | 2.4% |
| Mac Pro, single external drive | 463 | 352 | 31.5% | 343 | 335 | 2.4% |

The good news is that the non-CFS case shows very little cost when we do BG sync'ing! The bad news is that the CFS case still shows a high cost. However, by not sync'ing the files that go into the CFS (and also not committing a new segments_N file until after the CFS is written), I expect that cost to go way down.

One caveat: I'm using an 8 MB RAM buffer for all of these tests. As Yonik pointed out, if you have a smaller buffer, or you add just a few docs and then close your writer, the sync cost as a percentage of net indexing time will be quite a bit higher.

> Behavior on hard power shutdown
> -------------------------------
>
>                 Key: LUCENE-1044
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1044
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>         Environment: Windows Server 2003, Standard Edition, Sun Hotspot Java 1.5
>            Reporter: venkat rangan
>            Assignee: Michael McCandless
>             Fix For: 2.3
>
>         Attachments: FSyncPerfTest.java, LUCENE-1044.patch, LUCENE-1044.take2.patch, LUCENE-1044.take3.patch, LUCENE-1044.take4.patch
>
>
> When indexing a large number of documents, upon a hard power failure (e.g. pulling the power cord), the index seems to get corrupted. We start a Java application as a Windows Service and feed it documents. In some cases (after an index size of 1.7GB, with 30-40 index segment .cfs files), the following is observed:
> The 'segments' file contains only zeros. Its size is 265 bytes - all bytes are zeros.
> The 'deleted' file also contains only zeros. Its size is 85 bytes - all bytes are zeros.
> Before corruption, the segments file and deleted file appear to be correct. After this corruption, the index is corrupted and lost.
> This is a problem observed in Lucene 1.4.3. We are not able to upgrade our customer deployments to 1.9 or a later version, but would be happy to back-port a patch, if the patch is small enough and if this problem is already solved.
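For readers not following the attached patch, here is a minimal, hypothetical sketch of the background-sync idea described above: a single thread drains a queue of files the writer has finished, re-opens each one, forces it to stable storage, and closes it again. This is not the code in LUCENE-1044.take4.patch; the class name BackgroundSyncer and the enqueue()/close() methods are invented for illustration, and it uses plain RandomAccessFile.getFD().sync() to force each file to disk.

{code:java}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch only -- not the attached patch.
public class BackgroundSyncer implements Runnable {

  private final BlockingQueue<File> pending = new LinkedBlockingQueue<File>();
  private volatile boolean stop = false;

  /** Called by the writer after it finishes writing a file. */
  public void enqueue(File f) {
    pending.add(f);
  }

  /** Ask the sync thread to drain its queue and exit. */
  public void close() {
    stop = true;
  }

  public void run() {
    while (!stop || !pending.isEmpty()) {
      File f;
      try {
        f = pending.poll(100, TimeUnit.MILLISECONDS);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        break;
      }
      if (f == null) {
        continue;
      }
      // The file may already have been deleted (e.g. merged away) before we
      // got to it; skip it in that case, per the caveats above.
      if (!f.exists()) {
        continue;
      }
      try {
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        try {
          // Re-open, force the file's contents to stable storage, then close.
          raf.getFD().sync();
        } finally {
          raf.close();
        }
      } catch (IOException ioe) {
        // A real implementation would have to surface this failure before the
        // next segments_N is committed; here we only report it.
        ioe.printStackTrace();
      }
    }
  }
}
{code}

A writer would start this as a daemon thread, call enqueue() each time it closes an output file, and at commit time wait for the queue to drain before writing the new segments_N, so that a commit point never references un-sync'd files.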