[ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless reopened LUCENE-3054: ---------------------------------------- Reopening so we can discuss things further...: QuickSort is dangerous! Yet, it's definitely faster than MergeSort for some cases (~20% faster when sorting terms for writing segment, in quick test I ran on Wikipedia content). So the core issue is we should not use QS when there's a risk of any ties, because in that case it can run really slowly or hit infinite recursion. And we (well, Otis; thank you!) found one such place today (where MultiPhraseQuery sorts its terms) where we could have many ties and thus run very slowly / hit stack overflow. I appreciate the motivation for the "safety net", but, it makes me nervous... because, say we had done this a few months back... then Otis likely would not have reported the issue? Ie, the MultiPhraseQuery would run slowly... which could evade detection (people may just think it's slow). I prefer brittle fails over silent slowdowns because the brittle fail gets your attention and you get a real fix in. Silent slowdowns evade detection. Sort of like the difference between a virus and spyware... Also, what's preventing us from accidentally using QS somewhere in the future, where we shouldn't? What's going to catch us? Robert's first patch would catch this and protect us going forward? Or, maybe we could strengthen that approach and "assert cmp != 0" inside QS (ie, no ties are allowed to be passed to QS)? Though, using asserts only is risky, because it could be the comparator may return 0, but it's just that none of our test cases tickled it. Maybe instead we could do this in a type-safe way: make a new NoTiesComparator whose compare method can only return LESS_THAN or GREATER_THAN? And then QS would require NoTiesComparator. Could that work? > SorterTemplate.quickSort stack overflows on broken comparators that produce > only few disticnt values in large arrays > -------------------------------------------------------------------------------------------------------------------- > > Key: LUCENE-3054 > URL: https://issues.apache.org/jira/browse/LUCENE-3054 > Project: Lucene - Java > Issue Type: Task > Affects Versions: 3.1 > Reporter: Robert Muir > Assignee: Uwe Schindler > Priority: Critical > Fix For: 3.1.1, 3.2, 4.0 > > Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, > LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, > LUCENE-3054.patch > > > Looking at Otis's sort problem on the mailing list, he said: > {noformat} > * looked for other places where this call is made - found it in > MultiPhraseQuery$MultiPhraseWeight and changed that call from > ArrayUtil.quickSort to ArrayUtil.mergeSort > * now we no longer see SorterTemplate.quickSort in deep recursion when we do a > thread dump > {noformat} > I thought this was interesting because PostingsAndFreq's comparator > looks like it needs a tiebreaker. > I think in our sorts we should add some asserts to try to catch some of these > broken comparators. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org