[ http://issues.apache.org/jira/browse/LUCENE-693?page=comments#action_12444411 ]

Yonik Seeley commented on LUCENE-693:
-------------------------------------
> Well, I'm seeing a good 7% increase over the trunk version.

Yay! Now if only I could get my random synthetic tests to show an improvement too...
Were you testing with -server? My -client showed a speedup and -server showed a slowdown.

I think the difference is in *which* scorers I'm skipping on, even though I'm always skipping to the highest doc yet seen. Skipping on denser scorers will be a waste of time, and if the list is sorted, one is more likely to be skipping on the sparse scorers. My code is optimal when the density of the scorers is similar.

Think of the case of two sparse scorers and a dense scorer... you really want to be skipping on the two sparse scorers until they happen to agree. Until they agree, skipping on the dense scorer is a waste. My code round-robins and throws the dense scorer into the mix.

The question is, what are the real-world use cases like, and what is important to speed up? I'd argue that the case of all dense scorers, while rarer, is more important (sparse scorers will cause the queries to be faster anyway).

> Do the test cases try queries with non-existent terms?

They will.... I was able to reproduce my earlier bug with the new TestScorerPerf.testConjunctions() included in the last patch.

> ConjunctionScorer - more tuneup
> -------------------------------
>
>          Key: LUCENE-693
>          URL: http://issues.apache.org/jira/browse/LUCENE-693
>          Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>          Affects Versions: 2.1
>          Environment: Windows Server 2003 x64, Java 1.6, pretty large index
>          Reporter: Peter Keegan
>          Attachments: conjunction.patch, conjunction.patch
>
>
> (See also: #LUCENE-443)
> I did some profile testing with the new ConjunctionScorer in 2.1 and
> discovered a new bottleneck in ConjunctionScorer.sortScorers. The
> java.util.Arrays.sort method is cloning the Scorers array on every sort,
> which is quite expensive on large indexes because of the size of the 'norms'
> array within, and isn't necessary.
> Here is one possible solution:
>   private void sortScorers() {
>     // squeeze the array down for the sort
>     // if (length != scorers.length) {
>     //   Scorer[] temps = new Scorer[length];
>     //   System.arraycopy(scorers, 0, temps, 0, length);
>     //   scorers = temps;
>     // }
>     insertionSort( scorers,length );
>     // note that this comparator is not consistent with equals!
>     // Arrays.sort(scorers, new Comparator() {         // sort the array
>     //     public int compare(Object o1, Object o2) {
>     //       return ((Scorer)o1).doc() - ((Scorer)o2).doc();
>     //     }
>     //   });
>
>     first = 0;
>     last = length - 1;
>   }
>   private void insertionSort( Scorer[] scores, int len)
>   {
>     for (int i=0; i<len; i++) {
>       for (int j=i; j>0 && scores[j-1].doc() > scores[j].doc();j-- ) {
>         swap (scores, j, j-1);
>       }
>     }
>     return;
>   }
>   private void swap(Object[] x, int a, int b) {
>     Object t = x[a];
>     x[a] = x[b];
>     x[b] = t;
>   }
>
> The squeezing of the array is no longer needed.
> We also initialized the Scorers array to 8 (instead of 2) to avoid having to
> grow the array for common queries, although this probably has less
> performance impact.
> This change added about 3% to query throughput in my testing.
> Peter
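
To make the sparse-versus-dense point in the comment above concrete, here is a rough, self-contained sketch. It is not the code from conjunction.patch or from Lucene: ListScorer, range, roundRobin, sparseFirst and the skipCalls counter are all hypothetical names invented for illustration, and this skipTo is a simplified stand-in that does not claim to match the contract of Lucene's Scorer.skipTo. It only counts how often the dense scorer gets skipped under the two strategies.

```java
import java.util.ArrayList;
import java.util.List;

class ConjunctionSketch {
    /** Hypothetical stand-in for a Scorer: a sorted doc-id list with a cursor. */
    static final class ListScorer {
        final int[] docs;
        int pos = 0;
        int skipCalls = 0;                 // how many times skipTo was invoked
        ListScorer(int... docs) { this.docs = docs; }
        int doc() { return docs[pos]; }    // only valid after skipTo returned true
        /** Move to the first doc >= target; false once the list is exhausted. */
        boolean skipTo(int target) {
            skipCalls++;
            while (pos < docs.length && docs[pos] < target) pos++;
            return pos < docs.length;
        }
    }

    /** A dense scorer matching every doc id in [0, n). */
    static ListScorer range(int n) {
        int[] d = new int[n];
        for (int i = 0; i < n; i++) d[i] = i;
        return new ListScorer(d);
    }

    /** Cycle through ALL scorers, always skipping to the highest doc seen. */
    static List<Integer> roundRobin(ListScorer[] s) {
        List<Integer> hits = new ArrayList<>();
        int target = 0, agreed = 0, i = 0;
        while (true) {
            if (!s[i].skipTo(target)) return hits;
            if (s[i].doc() == target) agreed++;
            else { target = s[i].doc(); agreed = 1; }
            if (agreed == s.length) {      // every scorer sits on target
                hits.add(target);
                target++;
                agreed = 0;
            }
            i = (i + 1) % s.length;
        }
    }

    /** Let the sparse scorers agree first; only then consult the dense one. */
    static List<Integer> sparseFirst(ListScorer[] sparse, ListScorer dense) {
        List<Integer> hits = new ArrayList<>();
        int target = 0, agreed = 0, i = 0;
        while (true) {
            if (!sparse[i].skipTo(target)) return hits;
            if (sparse[i].doc() == target) agreed++;
            else { target = sparse[i].doc(); agreed = 1; }
            if (agreed == sparse.length) {
                if (!dense.skipTo(target)) return hits;
                if (dense.doc() == target) { hits.add(target); target++; }
                else target = dense.doc(); // dense jumped ahead: new candidate
                agreed = 0;
            }
            i = (i + 1) % sparse.length;
        }
    }
}
```

With two sparse lists {10, 500, 900} and {20, 500, 950} plus a dense scorer over docs 0..999, both methods return the single hit 500. In this toy setup roundRobin skips the dense scorer three times and sparseFirst skips it once. That is the waste described above for the mixed case; it says nothing about the all-dense case, where the round-robin approach is argued to be the better fit.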