Every merge hit the exception, yes. And actually, the exceptions list is not used anywhere besides MT adding the exception to the list. That's why I was curious why it's there.
I still think we should protect this case somehow, because even if it hits a disk-full exception, there's no point continuing to run infinitely? So maybe before spwaning the next thread, check the exceptions list and if it goes over a certain threshold (10?) fail? On Tue, Apr 28, 2009 at 3:23 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Tue, Apr 28, 2009 at 6:09 AM, Shai Erera <ser...@gmail.com> wrote: > > Hi > > > > I think I've hit a bug in ConcurrentMergeScheduler, but I'd like those > who > > are more familiar with the code to review it. I ran > > TestStressSort.testSort() and started to get AIOOB exceptions from > > MergeThread, the CPU spiked to 98-100% and did not end for a couple of > > minutes, until I was able to regain control and kill the process (looks > like > > an infinite loop). > > > > To reproduce it all you need is to add the following line to > > PQ.initialize(): size = maxSize, and then you'll get the aforementioned > > exceptions. I did it acceindentally, but I'm sure there's a way to > reproduce > > it with a JUnit test or something so that it will happen consistently. > > It sounds like this caused every merge to always hit an exception while > merging? > > > When I debugged-trace the test, I noticed that MergeThread are just > spawned > > forever. The reason is this: In CMS.merge(IndexWriter) there's a 'while > > (true)' loop which does 'while (mergeThreadCount() >= maxThreadCount)' > and > > if false just spawns a new MergeThread. On the other hand, in > > MergeThread.run there's a try-finally which executes whatever it needs to > > execute and in the finally block removes this thread from the list of > > threads. That causes CMS to spawn a new thread, which will hit another > > exception, remove itself from the queue and CMS will spawn a new thread. > > That puts the code into an infinite loop. > > Unfortunately, this is tricky to fix correctly, because > IW/MergeScheduler knows so little about what actually failed. > > Right now, if a merge hits an exception, IW simply undoes everything > it did (removes partial files, allows new merges to try merging the > segments again, etc). > > If it was some transient IO error, or say a transient disk full > situation, retrying the merge seems good. But if it's some > [temporary] bug in Lucene, and every merge will always hit an > exception, then retrying is hopeless. Likewise a corrupt index, a > disk full that won't clear up, sudden permission denied errors on > opening new files, etc., retrying is hopeless. > > > That sounds like a bug to me ... I think that if MergeThread hits any > > exception, the merge should fail? Anyway, the exception is added to an > > exceptions List, which is a private member of CMS but is never chceked by > > CMS. Perhaps merge(IndexWriter) should check if the exceptions list is > not > > empty and fail the merge in such case? > > Actually the merge does "fail", and IW "undoes" its changes, but then > MergePolicy is free to pick merges again, and in turn picks the very > same merge. > > The exceptions list is normally only checked during Lucene's unit > tests, but apps could check it as well. > > > Anyway, I'll fix PQ's code now to continue my work, but if you want to > > reproduce it, it's as easy as adding size = maxSize to initialize() and > run > > TestStressSort. > > > > I don't mind to open an issue and fix it (though I'm not sure what should > > the fix be at the moment, but I'll figure it out), but it will have to > wait, > > so if you know the code and can put a patch together quickly, don't wait > up > > for me :) > > I think maybe the best/simplest fix is to simply sleep for a bit (250 > msec?) on hitting an exception while merging? This way CPU won't be > pegged, you won't suddenly see zillions of exceptions streaming by, > etc. > > Mike > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-dev-h...@lucene.apache.org > >