[ https://issues.apache.org/jira/browse/LUCENE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless reopened LUCENE-1044:
----------------------------------------

OK I ran sync/nosync tests across various platforms/IO systems. In each case I ran the test once with doSync=true and once with doSync=false, using this alg:

  analyzer=org.apache.lucene.analysis.SimpleAnalyzer
  doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
  docs.file=/lucene/wikifull.txt
  doc.maker.forever=false
  ram.flush.mb = 8
  max.buffered = 0
  directory = FSDirectory
  max.field.length = 2147483647
  doc.term.vector=false
  doc.stored=false
  work.dir = /tmp/lucene
  fsdirectory.dosync = false

  ResetSystemErase
  CreateIndex
  {AddDoc >: 150000
  CloseIndex

  RepSumByName

I.e., the time to index the first 150K docs from Wikipedia.

Results for single hard drive:

  Mac mini (10.5 Leopard), single 4200 RPM "notebook" (2.5") drive -- 2.3% slower:
    sync   - 296.80 sec
    nosync - 290.06 sec

  Mac Pro (10.4 Tiger), single external drive -- 35.5% slower:
    sync   - 259.61 sec
    nosync - 191.53 sec

  Win XP Pro laptop, single drive -- 38.2% slower:
    sync   - 536.00 sec
    nosync - 387.90 sec

  Linux (2.6.22.1), ext3, single drive -- 23% slower:
    sync   - 185.42 sec
    nosync - 150.56 sec

Results for multiple hard drives (RAID arrays):

  Linux (2.6.22.1), reiserfs, 6 drive RAID5 array -- 49% slower (!!):
    sync   - 239.32 sec
    nosync - 160.56 sec

  Mac Pro (10.4 Tiger), 4 drive RAID0 array -- 1% faster:
    sync   - 157.26 sec
    nosync - 158.93 sec

So at this point I'm torn...

The performance cost of the simplest approach (sync() before close()) is very high in many cases (not just laptop IO subsystems). The reiserfs test was rather shocking. Then again, it's oddly very low cost in other cases: I find the Mac mini result amazing.

It's frustrating to lose such performance "out of the box" for the presumably extremely rare event of an OS/machine crash or power cut.

Maybe we should leave the default as false for now?
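For readers less familiar with what "sync() before close()" means at the file level, here is a minimal sketch in plain Java. It is not Lucene's FSDirectory code: the class and method names (SyncCost, writeFile, timeWrites, percentSlower) are hypothetical, and the doSync flag only mirrors the fsdirectory.dosync setting above. It also shows the arithmetic behind the "% slower" figures, i.e. (sync / nosync - 1) * 100. The tiny timing loop is illustrative only and is no substitute for the benchmark alg.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: per-file sync before close, plus the "% slower" arithmetic. */
class SyncCost {

    /** Writes data to path; if doSync is true, forces it to the device before close(). */
    static void writeFile(Path path, byte[] data, boolean doSync) throws IOException {
        try (FileOutputStream out = new FileOutputStream(path.toFile())) {
            out.write(data);
            if (doSync) {
                // Returns only after the OS has written this file's modified data
                // and attributes to the underlying device.
                out.getFD().sync();
            }
        } // close() runs here, after the optional sync
    }

    /** Percent by which the sync run is slower than the nosync run (negative = faster). */
    static double percentSlower(double syncSecs, double nosyncSecs) {
        return (syncSecs / nosyncSecs - 1.0) * 100.0;
    }

    static String fileName(int i, boolean doSync) {
        return "f" + i + (doSync ? ".sync" : ".nosync");
    }

    /** Times writing count files of size bytes each into dir; returns elapsed seconds. */
    static double timeWrites(Path dir, int count, int size, boolean doSync) throws IOException {
        byte[] data = new byte[size];
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            writeFile(dir.resolve(fileName(i, doSync)), data, doSync);
        }
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) throws IOException {
        final int count = 50;
        final int size = 64 * 1024;
        Path dir = Files.createTempDirectory("synccost");
        double sync = timeWrites(dir, count, size, true);
        double nosync = timeWrites(dir, count, size, false);
        System.out.printf("sync %.3f s, nosync %.3f s, %.1f%% slower%n",
                sync, nosync, percentSlower(sync, nosync));

        // Same arithmetic applied to the reported Linux ext3 times (185.42 vs 150.56 sec).
        System.out.printf("ext3: %.1f%% slower%n", percentSlower(185.42, 150.56));

        // Clean up the temporary files and directory.
        for (int i = 0; i < count; i++) {
            Files.deleteIfExists(dir.resolve(fileName(i, true)));
            Files.deleteIfExists(dir.resolve(fileName(i, false)));
        }
        Files.deleteIfExists(dir);
    }
}
```

Note that FileDescriptor.sync() only covers the OS buffers; whether a drive's own write cache is flushed depends on the platform and hardware, which may be part of why the measured cost varies so much between machines.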
> Behavior on hard power shutdown
> -------------------------------
>
>                 Key: LUCENE-1044
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1044
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>         Environment: Windows Server 2003, Standard Edition, Sun Hotspot Java 1.5
>            Reporter: venkat rangan
>            Assignee: Michael McCandless
>             Fix For: 2.3
>
>         Attachments: LUCENE-1044.patch, LUCENE-1044.take2.patch, LUCENE-1044.take3.patch
>
>
> When indexing a large number of documents, upon a hard power failure (e.g.
> pulling the power cord), the index seems to get corrupted. We start a Java
> application as a Windows Service, and feed it documents. In some cases
> (after an index size of 1.7GB, with 30-40 index segment .cfs files), the
> following is observed.
> The 'segments' file contains only zeros. Its size is 265 bytes - all bytes
> are zeros.
> The 'deleted' file also contains only zeros. Its size is 85 bytes - all bytes
> are zeros.
> Before corruption, the segments file and deleted file appear to be correct.
> After this corruption, the index is lost.
> This is a problem observed in Lucene 1.4.3. We are not able to upgrade our
> customer deployments to a 1.9 or later version, but would be happy to back-port
> a patch, if the patch is small enough and if this problem is already solved.

--
This message is automatically generated by JIRA.