Yes, reproduced on the first try. See the attached program - I referenced it against the current trunk.

On Thu, Jun 14, 2012 at 3:54 AM, Itamar Syn-Hershko <[email protected]> wrote:

> Christopher,
>
> I used the IndexBuilder app from here
> https://github.com/synhershko/Talks/tree/master/LuceneNeatThings with an
> 8.5GB wikipedia dump.
>
> After running for 2.5 days I had to forcefully close it (infinite loop in
> the wiki-markdown parser at 92%, go figure), and the 40-something GB index
> I had by then was unusable. I then was able to reproduce this.
>
> Please note I now added a few safe-guards you might want to remove to make
> sure the app really crashes on process kill.
>
> I'll try to come up with a better way to reproduce this - hopefully Mike
> will be able to suggest better ways than manual process kill...
>
> On Thu, Jun 14, 2012 at 1:41 AM, Christopher Currens <
> [email protected]> wrote:
>
>> Mike, The codebase for lucene.net should be almost identical to java's
>> 3.0.3 release, and LUCENE-1044 is included in that.
>>
>> Itamar, are you committing the index regularly? I only ask because I
>> can't reproduce it myself by forcibly terminating the process while it's
>> indexing. I've tried both 3.0.3 and 2.9.4. If I don't commit at all and
>> terminate the process (even with 10,000 4K documents created), there
>> will be no documents in the index when I open it in luke, which I expect.
>> If I commit at 10,000 documents, and terminate it a few thousand after
>> that, the index has the first ten thousand that were committed. I've even
>> terminated it *while* a second commit was taking place, and it still had
>> all of the documents I expected.
>>
>> It may be that I'm not trying to reproduce it correctly. Do you have a
>> minimal amount of code that can reproduce it?
>>
>>
>> Thanks,
>> Christopher
>>
>> On Wed, Jun 13, 2012 at 9:31 AM, Michael McCandless <
>> [email protected]> wrote:
>>
>> > Hi Itamar,
>> >
>> > One quick question: does Lucene.Net include the fixes done for
>> > LUCENE-1044 (to fsync files on commit)? Those are very important for
>> > an index to be intact after OS/JVM crash or power loss.
>> >
>> > More responses below:
>> >
>> > On Tue, Jun 12, 2012 at 8:20 PM, Itamar Syn-Hershko <[email protected]>
>> > wrote:
>> >
>> > > I'm a Lucene.Net committer, and there is a chance we have a bug in our
>> > > FSDirectory implementation that causes indexes to get corrupted when
>> > > indexing is cut while the IW is still open. As it stems from some
>> > > retroactive fixes you made, I'd appreciate your feedback.
>> > >
>> > > Correct me if I'm wrong, but by design Lucene should be able to recover
>> > > rather quickly from power failures or app crashes. Since existing
>> > > segment files are read only, only new segments that are still being
>> > > written can get corrupted. Hence, recovering from worst-case scenarios
>> > > is done by simply removing the write.lock file. The worst that could
>> > > happen then is having the last segment damaged, and that can be fixed
>> > > by removing those files, possibly by running CheckIndex on the index.
>> >
>> > You shouldn't even have to run CheckIndex ... because (as of
>> > LUCENE-1044) we now fsync all segment files before writing the new
>> > segments_N file, and then removing old segments_N files (and any
>> > segments that are no longer referenced).
>> >
>> > You do have to remove the write.lock if you aren't using
>> > NativeFSLockFactory (but this has been the default lock impl for a
>> > while now).
>> >
>> > > Last week I have been playing with rather large indexes and crashed my
>> > > app while it was indexing. I wasn't able to open the index, and Luke
>> > > was even kind enough to wipe the index folder clean even though I
>> > > opened it in read-only mode. I re-ran this, and after another crash
>> > > running CheckIndex revealed nothing - the index was detected to be an
>> > > empty one. I am not entirely sure what could be the cause for this,
>> > > but I suspect it has been corrupted by the crash.
>> >
>> > Had no commit completed (no segments file written)?
>> >
>> > If you don't fsync then all sorts of crazy things are possible...
>> >
>> > > I've been looking at these:
>> > >
>> > > https://issues.apache.org/jira/browse/LUCENE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>> > >
>> > > https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>> >
>> > (And LUCENE-1044 before that ... it was LUCENE-1044 that LUCENE-2328
>> > broke...).
>> >
>> > > And it seems like this is what I was experiencing. Mike and Mark will
>> > > probably be able to tell if this is what they saw or not, but as far
>> > > as I can tell this is not an expected behavior of a Lucene index.
>> >
>> > Definitely not expected behavior: assuming nothing is flipping bits,
>> > then on OS/JVM crash or power loss your index should be fine, just
>> > reverted to the last successful commit.
>> >
>> > > What I'm looking for at the moment is some advice on what FSDirectory
>> > > implementation to use to make sure no corruption can happen. The 3.4
>> > > version (which is where LUCENE-3418 was committed to) seems to handle
>> > > a lot of things the 3.0 doesn't, but on the other hand LUCENE-3418 was
>> > > introduced by changes made to the 3.0 codebase.
>> >
>> > Hopefully it's just that you are missing fsync!
>> >
>> > > Also, is there any test in the suite checking for those scenarios?
>> >
>> > Our test framework has a sneaky MockDirectoryWrapper that, after a
>> > test finishes, goes and corrupts any unsync'd files and then verifies
>> > the index is still OK... it's good because it'll catch any times we
>> > are missing calls to sync, but, it's not low level enough such that if
>> > FSDir is failing to actually call fsync (that was the bug in
>> > LUCENE-3418) then it won't catch that...
>> >
>> > Mike McCandless
>> >
>> > http://blog.mikemccandless.com

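
For anyone following the fsync discussion quoted above, here is a rough sketch of the ordering Mike describes: sync every segment file first, and only then write the segments_N file that makes the commit visible. This is not Lucene or Lucene.Net code. The class name `FsyncCommitSketch`, the helper names, the `_0.dat` / `_1.dat` file names and the plain decimal generation number are all made up for illustration, and the sketch leaves out deleting old segments_N files and everything about locking.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;

// Hypothetical illustration only; not Lucene or Lucene.Net code.
class FsyncCommitSketch {

    // Write the bytes, then force them to stable storage before returning.
    static void writeAndSync(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // the fsync step
        }
    }

    // Commit: sync every segment file first, and only then publish segments_N.
    static void commit(Path dir, long generation, Map<String, byte[]> segments)
            throws IOException {
        for (Map.Entry<String, byte[]> e : segments.entrySet()) {
            writeAndSync(dir.resolve(e.getKey()), e.getValue());
        }
        String listing = String.join("\n", segments.keySet());
        writeAndSync(dir.resolve("segments_" + generation),
                listing.getBytes(StandardCharsets.UTF_8));
    }

    // Reader side: the highest segments_N marks the last successful commit.
    // Returns -1 when no commit has ever completed.
    static long latestCommit(Path dir) throws IOException {
        long latest = -1;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "segments_*")) {
            for (Path p : files) {
                String name = p.getFileName().toString();
                latest = Math.max(latest,
                        Long.parseLong(name.substring("segments_".length())));
            }
        }
        return latest;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commit-sketch");
        commit(dir, 1, Map.of("_0.dat", new byte[] {1, 2, 3}));
        // Simulate a kill mid-indexing: a new segment file exists,
        // but no segments_2 was ever written.
        Files.write(dir.resolve("_1.dat"), new byte[] {4, 5});
        System.out.println("last commit generation: " + latestCommit(dir));
    }
}
```

With this ordering a process kill between the two steps leaves a stray segment file behind, but a reader still lands on generation 1, which matches the "reverted to the last successful commit" behavior described in the thread. If the sync call is skipped or silently does nothing, that guarantee no longer holds.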
Program.cs
Description: Binary data
