I opened LUCENE-2006. Is there any MultiSearcher related task/alg in contrib/benchmark?
----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -----Original Message----- > From: Mark Miller [mailto:markrmil...@gmail.com] > Sent: Friday, October 23, 2009 7:53 PM > To: java-dev@lucene.apache.org > Subject: Re: [jira] Issue Comment Edited: (LUCENE-1997) Explore > performance of multi-PQ vs single-PQ sorting API > > Nice! I like it. Even if its not much faster (havn't checked either), I > can't see it being much slower and its cleaner code. > > I'd be happy to do some quick perf tests when I get a chance, but I'm +1 > on it. > > Uwe Schindler wrote: > > Mark, > > > > when removing may comment (as I now understand the whole > > FieldDocSortedHitQueue), I found the following as a optimization of the > > whole hq: > > > > All FieldDoc values are Compareables (also the score or docid, if they > > appear as SortField in a MultiSearcher or ParallelMultiSearcher). The > code > > of lessThan seems very ineffective, as it has a big switch statement on > the > > SortField type, then casts the value to the underlying numeric type > Object, > > calls Number.xxxValue() & co for it and then compares manually. As > > j.l.Number is itself Comparable, I see no reason to do this. Just call > > compareTo on the Comparable interface and we are happy. The big deal is > that > > it prevents casting and the two method calls xxxValue(), as > Number.compareTo > > works more efficient internally. > > > > The only special cases are String sort, where the Locale may be used and > the > > score sorting which is backwards. But these are two if statements > instead of > > the whole switch. > > > > I had not tested it now for performance, but in my opinion it should be > > faster for MultiSearchers. All tests still pass (because they should). > > > > Attached patch applies to (current) trunk. > > > > ----- > > Uwe Schindler > > H.-H.-Meier-Allee 63, D-28213 Bremen > > http://www.thetaphi.de > > eMail: u...@thetaphi.de > > > > > >> -----Original Message----- > >> From: Mark Miller (JIRA) [mailto:j...@apache.org] > >> Sent: Friday, October 23, 2009 3:33 PM > >> To: java-dev@lucene.apache.org > >> Subject: [jira] Issue Comment Edited: (LUCENE-1997) Explore performance > of > >> multi-PQ vs single-PQ sorting API > >> > >> > >> [ https://issues.apache.org/jira/browse/LUCENE- > >> 1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment- > >> tabpanel&focusedCommentId=12769221#action_12769221 ] > >> > >> Mark Miller edited comment on LUCENE-1997 at 10/23/09 1:31 PM: > >> --------------------------------------------------------------- > >> > >> bq. but how does this fit together. > >> > >> Thats what Comparable FieldComparator#value is for - fillFields will > grab > >> all those and load up FieldDoc fields - so the custom FieldComparator > is > >> tied into it - it creates Comparable objects that can be compared by > the > >> native compareTos. (the old API did the same thing) > >> > >> {code} > >> /** > >> * Given a queue Entry, creates a corresponding FieldDoc > >> * that contains the values used to sort the given document. > >> * These values are not the raw values out of the index, but the > >> internal > >> * representation of them. This is so the given search hit can be > >> collated by > >> * a MultiSearcher with other search hits. > >> * > >> * @param entry The Entry used to create a FieldDoc > >> * @return The newly created FieldDoc > >> * @see Searchable#search(Weight,Filter,int,Sort) > >> */ > >> FieldDoc fillFields(final Entry entry) { > >> final int n = comparators.length; > >> final Comparable[] fields = new Comparable[n]; > >> for (int i = 0; i < n; ++i) { > >> fields[i] = comparators[i].value(entry.slot); > >> } > >> //if (maxscore > 1.0f) doc.score /= maxscore; // normalize scores > >> return new FieldDoc(entry.docID, entry.score, fields); > >> } > >> {code} > >> > >> was (Author: markrmil...@gmail.com): > >> bq. but how does this fit together. > >> > >> Thats what Comparable FieldComparator#value is for - fillFields will > grab > >> all those and load up FieldDoc fields - so the custom FieldComparator > is > >> tied into it - it creates Comparable objects that can be compared by > the > >> native compareTos. > >> > >> {code} > >> /** > >> * Given a queue Entry, creates a corresponding FieldDoc > >> * that contains the values used to sort the given document. > >> * These values are not the raw values out of the index, but the > >> internal > >> * representation of them. This is so the given search hit can be > >> collated by > >> * a MultiSearcher with other search hits. > >> * > >> * @param entry The Entry used to create a FieldDoc > >> * @return The newly created FieldDoc > >> * @see Searchable#search(Weight,Filter,int,Sort) > >> */ > >> FieldDoc fillFields(final Entry entry) { > >> final int n = comparators.length; > >> final Comparable[] fields = new Comparable[n]; > >> for (int i = 0; i < n; ++i) { > >> fields[i] = comparators[i].value(entry.slot); > >> } > >> //if (maxscore > 1.0f) doc.score /= maxscore; // normalize scores > >> return new FieldDoc(entry.docID, entry.score, fields); > >> } > >> {code} > >> > >> > >>> Explore performance of multi-PQ vs single-PQ sorting API > >>> -------------------------------------------------------- > >>> > >>> Key: LUCENE-1997 > >>> URL: https://issues.apache.org/jira/browse/LUCENE-1997 > >>> Project: Lucene - Java > >>> Issue Type: Improvement > >>> Components: Search > >>> Affects Versions: 2.9 > >>> Reporter: Michael McCandless > >>> Assignee: Michael McCandless > >>> Attachments: LUCENE-1997.patch, LUCENE-1997.patch > >>> > >>> > >>> Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev, > >>> where a simpler (non-segment-based) comparator API is proposed that > >>> gathers results into multiple PQs (one per segment) and then merges > >>> them in the end. > >>> I started from John's multi-PQ code and worked it into > >>> contrib/benchmark so that we could run perf tests. Then I generified > >>> the Python script I use for running search benchmarks (in > >>> contrib/benchmark/sortBench.py). > >>> The script first creates indexes with 1M docs (based on > >>> SortableSingleDocSource, and based on wikipedia, if available). Then > >>> it runs various combinations: > >>> * Index with 20 balanced segments vs index with the "normal" log > >>> segment size > >>> * Queries with different numbers of hits (only for wikipedia index) > >>> * Different top N > >>> * Different sorts (by title, for wikipedia, and by random string, > >>> random int, and country for the random index) > >>> For each test, 7 search rounds are run and the best QPS is kept. The > >>> script runs singlePQ then multiPQ, and records the resulting best QPS > >>> for each and produces table (in Jira format) as output. > >>> > >> -- > >> This message is automatically generated by JIRA. > >> - > >> You can reply to this email to add a comment to the issue online. > >> > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > >> For additional commands, e-mail: java-dev-h...@lucene.apache.org > >> > > > > > > ------------------------------------------------------------------------ > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-dev-h...@lucene.apache.org > > > -- > - Mark > > http://www.lucidimagination.com > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-dev-h...@lucene.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org