Hi guys: I did some Big-O math a few times and reached the same conclusion Jake did.
I was not sure about the code-tuning opportunities we could have had with the MergeAtTheEnd method Yonik mentioned, nor about the internal PQ behavior Mike suggested, so I went ahead and implemented the algorithm and compared it against the current Lucene 2.9 implementation.

Setup:

index size: 999992 docs
segment count: 8
sorting on 1 field, no score, i.e. comparing against the "OneComparatorNonScoringCollector" class
TermQuery used on a field with only 2 values (even/odd), returning 499992 docs

I broke the tests into two parts, numHits = 50 and numHits = 100. For each, I ran 100 iterations, took the times from iterations 6-100 (to avoid field cache load time), and averaged them. I did this 3 times and took that average. Here are the numbers (times are measured in nanoseconds):

numHits = 50:

Lucene 2.9/OneComparatorNonScoringCollector:
num string compares: 251
num conversions: 34
time: 15272049
cpu time: 161105
num inserts into PQ: 211

My test sort algorithm:
num string compares: 51
num conversions: 0
time: 14943172
cpu time: 153722
num inserts into PQ: 1500

numHits = 100:

Lucene 2.9/OneComparatorNonScoringCollector:
num string compares: 2285
num conversions: 172
time: 17703866
cpu time: 187073
num inserts into PQ: 982

My test sort algorithm:
num string compares: 210
num conversions: 0
time: 15823473
cpu time: 160737
num inserts into PQ: 6011

Conclusion:

My measurements are consistent with my and Jake's Big-O analysis: the dominating factor is the same for both algorithms. In terms of time/cpu the two are very similar, with the second algorithm consistently performing slightly better than Lucene 2.9. Mike is absolutely correct about the number of PQ insertions, but that win seems to be overshadowed by the cost of the extra string compares and conversions. If you look at the growth from numHits=50 to numHits=100, the time gap widens; that is, as the PQ gets larger (the case where the user navigates to a later page of the search results), the number of string compares grows faster than the size of the PQ. There is also an assumption that the conversion cost is low, which may not hold for custom sort extensions, where it could easily dominate the entire query time. (more analysis below)

Given this analysis, it seems that the second algorithm is at the very least no worse in terms of performance, and I would argue it is actually better.

With the current Lucene 2.9 API that ties into this algorithm (FieldComparator), there is also a subtle contract change: to implement a custom sort, the user has to be able to do this conversion, which involves a reverse mapping from value to ordinal. This changes the contract for writing custom sort comparators. Before, the ordinal->value mapping could be 1-to-many; now, with the requirement for this reverse mapping, it is much more difficult and expensive to implement.

If you agree, I think it is worthwhile to open a Jira ticket for this while the FieldComparator API is still fresh. I'd be happy to provide my implementation, test code and my index. Rough sketches of both pieces follow below.
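Since the discussion has been about the shape of the algorithm, here is a rough, self-contained sketch of the idea (illustrative only, not the code I actually benchmarked; names like PerSegmentTopK, collectSegment and mergeByValue are made up, and real code would hook into Lucene's Collector API):

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class PerSegmentTopK {

  static final class Hit {
    final int doc;    // segment-local docid
    final int ord;    // ord of this doc's sort value within its own segment
    String value;     // resolved once, after the segment is exhausted
    Hit(int doc, int ord) { this.doc = doc; this.ord = ord; }
  }

  // Per segment: keep the best K hits using only int ord comparisons --
  // no string compares, no conversions, at the price of more PQ inserts
  // (which is what the counters above show).
  static List<Hit> collectSegment(int[] ordOfDoc, String[] valueOfOrd, int k) {
    // Max-heap on ord, so the head is the current worst of the top K
    // (ascending sort assumed).
    PriorityQueue<Hit> pq =
        new PriorityQueue<>(k, Comparator.comparingInt((Hit h) -> h.ord).reversed());
    for (int doc = 0; doc < ordOfDoc.length; doc++) {
      if (pq.size() < k) {
        pq.add(new Hit(doc, ordOfDoc[doc]));
      } else if (ordOfDoc[doc] < pq.peek().ord) {
        pq.poll();
        pq.add(new Hit(doc, ordOfDoc[doc]));
      }
    }
    List<Hit> hits = new ArrayList<>(pq);
    for (Hit h : hits) h.value = valueOfOrd[h.ord]; // K conversions per segment
    hits.sort(Comparator.comparing(h -> h.value));  // segment list now sorted by value
    return hits;
  }

  // At the very end: merge the M per-segment lists on values with a small
  // queue of cursors -- O(K log M) string compares, independent of index size.
  static List<Hit> mergeByValue(List<List<Hit>> perSegment, int k) {
    PriorityQueue<int[]> cursors = new PriorityQueue<>((a, b) ->
        perSegment.get(a[0]).get(a[1]).value
            .compareTo(perSegment.get(b[0]).get(b[1]).value));
    for (int seg = 0; seg < perSegment.size(); seg++) {
      if (!perSegment.get(seg).isEmpty()) cursors.add(new int[] { seg, 0 });
    }
    List<Hit> out = new ArrayList<>(k);
    while (out.size() < k && !cursors.isEmpty()) {
      int[] c = cursors.poll();
      out.add(perSegment.get(c[0]).get(c[1]));
      if (c[1] + 1 < perSegment.get(c[0]).size()) {
        cursors.add(new int[] { c[0], c[1] + 1 });
      }
    }
    return out;
  }
}

The point to notice is that all per-doc work is int compares on segment-local ords; strings only enter for the final K candidates per segment, which lines up with the counters above (zero conversions, more PQ inserts).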
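For contrast, the conversion the current single-PQ approach has to do at each segment boundary looks roughly like this (again just a sketch; valueToOrd is a made-up name, not the actual 2.9 API):

public class OrdLookup {
  // Binary-search a segment's sorted, deduped value table to recover an
  // ord for a value carried over from earlier segments. Each conversion
  // costs O(log V) string compares -- the "num conversions" counted above.
  static int valueToOrd(String[] sortedValues, String value) {
    int lo = 0, hi = sortedValues.length - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      int cmp = sortedValues[mid].compareTo(value);
      if (cmp < 0) lo = mid + 1;
      else if (cmp > 0) hi = mid - 1;
      else return mid;    // value present in this segment
    }
    return -(lo + 1);     // absent: encoded insertion point, still usable
                          // for ordering against this segment's ords
  }
}

For a built-in string sort the value table is at hand, so this is merely extra work; for a custom comparator whose ordinal->value mapping is 1-to-many, there may be no cheap way to invert it at all, which is exactly the contract problem described above.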
Thanks

-John

On Thu, Oct 15, 2009 at 10:06 AM, Jake Mannix <jake.man...@gmail.com> wrote:

> On Thu, Oct 15, 2009 at 9:12 AM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>
>> On Thu, Oct 15, 2009 at 11:53 AM, Yonik Seeley
>> <yo...@lucidimagination.com> wrote:
>> > And it seems like a PQ per segment simply delays many of the slow
>> > lookups until the end where the PQs must be merged.
>>
>> Actually, I'm wrong about that part - one can simply merge on
>> values... there will be lots of string comparisons (and a new merge
>> queue) but no binary searches.
>
> The number of string comparisons in the final merge of queues is
> O(K log(M)) if you do as you're describing (have basically a PQ of sorted
> lists, the same way BooleanScorer works), where K is going to be roughly
> what, 10, 100 (numResultsToReturn), and M is 10-50? This is like a couple
> thousand string compares, tops, and independent of the overall size of the
> segments - it's measured in microseconds of CPU time.
>
> -jake
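Plugging numbers into that bound bears this out. A quick back-of-the-envelope (assuming a binary-heap merge queue, i.e. roughly log2(M) string compares per returned result; constants vary by PQ implementation):

public class MergeCompareEstimate {
  public static void main(String[] args) {
    int[] numResults = { 10, 100 };  // K: results to return
    int[] numSegments = { 10, 50 };  // M: sorted per-segment lists to merge
    for (int k : numResults) {
      for (int m : numSegments) {
        double compares = k * (Math.log(m) / Math.log(2)); // ~ K * log2(M)
        System.out.printf("K=%d, M=%d -> ~%.0f string compares%n", k, m, compares);
      }
    }
  }
}

Even the worst case here (K=100, M=50) is under 600 compares; with generous constant factors it stays in the low thousands, consistent with Jake's "couple thousand, tops".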