RE: [jira] Issue Comment Edited: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

Uwe Schindler Fri, 23 Oct 2009 13:03:19 -0700

I opened LUCENE-2006.

Is there any MultiSearcher related task/alg in contrib/benchmark?


-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

> -----Original Message-----
> From: Mark Miller [mailto:[email protected]]
> Sent: Friday, October 23, 2009 7:53 PM
> To: [email protected]
> Subject: Re: [jira] Issue Comment Edited: (LUCENE-1997) Explore
> performance of multi-PQ vs single-PQ sorting API
> 
> Nice! I like it. Even if its not much faster (havn't checked either), I
> can't see it being much slower and its cleaner code.
> 
> I'd be happy to do some quick perf tests when I get a chance, but I'm +1
> on it.
> 
> Uwe Schindler wrote:
> > Mark,
> >
> > when removing may comment (as I now understand the whole
> > FieldDocSortedHitQueue), I found the following as a optimization of the
> > whole hq:
> >
> > All FieldDoc values are Compareables (also the score or docid, if they
> > appear as SortField in a MultiSearcher or ParallelMultiSearcher). The
> code
> > of lessThan seems very ineffective, as it has a big switch statement on
> the
> > SortField type, then casts the value to the underlying numeric type
> Object,
> > calls Number.xxxValue() & co for it and then compares manually. As
> > j.l.Number is itself Comparable, I see no reason to do this. Just call
> > compareTo on the Comparable interface and we are happy. The big deal is
> that
> > it prevents casting and the two method calls xxxValue(), as
> Number.compareTo
> > works more efficient internally.
> >
> > The only special cases are String sort, where the Locale may be used and
> the
> > score sorting which is backwards. But these are two if statements
> instead of
> > the whole switch.
> >
> > I had not tested it now for performance, but in my opinion it should be
> > faster for MultiSearchers. All tests still pass (because they should).
> >
> > Attached patch applies to (current) trunk.
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: [email protected]
> >
> >
> >> -----Original Message-----
> >> From: Mark Miller (JIRA) [mailto:[email protected]]
> >> Sent: Friday, October 23, 2009 3:33 PM
> >> To: [email protected]
> >> Subject: [jira] Issue Comment Edited: (LUCENE-1997) Explore performance
> of
> >> multi-PQ vs single-PQ sorting API
> >>
> >>
> >>     [ https://issues.apache.org/jira/browse/LUCENE-
> >> 1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
> >> tabpanel&focusedCommentId=12769221#action_12769221 ]
> >>
> >> Mark Miller edited comment on LUCENE-1997 at 10/23/09 1:31 PM:
> >> ---------------------------------------------------------------
> >>
> >> bq. but how does this fit together.
> >>
> >> Thats what Comparable FieldComparator#value is for - fillFields will
> grab
> >> all those and load up FieldDoc fields - so the custom FieldComparator
> is
> >> tied into it - it creates Comparable objects that can be compared by
> the
> >> native compareTos. (the old API did the same thing)
> >>
> >> {code}
> >>   /**
> >>    * Given a queue Entry, creates a corresponding FieldDoc
> >>    * that contains the values used to sort the given document.
> >>    * These values are not the raw values out of the index, but the
> >> internal
> >>    * representation of them. This is so the given search hit can be
> >> collated by
> >>    * a MultiSearcher with other search hits.
> >>    *
> >>    * @param entry The Entry used to create a FieldDoc
> >>    * @return The newly created FieldDoc
> >>    * @see Searchable#search(Weight,Filter,int,Sort)
> >>    */
> >>   FieldDoc fillFields(final Entry entry) {
> >>     final int n = comparators.length;
> >>     final Comparable[] fields = new Comparable[n];
> >>     for (int i = 0; i < n; ++i) {
> >>       fields[i] = comparators[i].value(entry.slot);
> >>     }
> >>     //if (maxscore > 1.0f) doc.score /= maxscore;   // normalize scores
> >>     return new FieldDoc(entry.docID, entry.score, fields);
> >>   }
> >> {code}
> >>
> >>       was (Author: [email protected]):
> >>     bq. but how does this fit together.
> >>
> >> Thats what Comparable FieldComparator#value is for - fillFields will
> grab
> >> all those and load up FieldDoc fields - so the custom FieldComparator
> is
> >> tied into it - it creates Comparable objects that can be compared by
> the
> >> native compareTos.
> >>
> >> {code}
> >>   /**
> >>    * Given a queue Entry, creates a corresponding FieldDoc
> >>    * that contains the values used to sort the given document.
> >>    * These values are not the raw values out of the index, but the
> >> internal
> >>    * representation of them. This is so the given search hit can be
> >> collated by
> >>    * a MultiSearcher with other search hits.
> >>    *
> >>    * @param entry The Entry used to create a FieldDoc
> >>    * @return The newly created FieldDoc
> >>    * @see Searchable#search(Weight,Filter,int,Sort)
> >>    */
> >>   FieldDoc fillFields(final Entry entry) {
> >>     final int n = comparators.length;
> >>     final Comparable[] fields = new Comparable[n];
> >>     for (int i = 0; i < n; ++i) {
> >>       fields[i] = comparators[i].value(entry.slot);
> >>     }
> >>     //if (maxscore > 1.0f) doc.score /= maxscore;   // normalize scores
> >>     return new FieldDoc(entry.docID, entry.score, fields);
> >>   }
> >> {code}
> >>
> >>
> >>> Explore performance of multi-PQ vs single-PQ sorting API
> >>> --------------------------------------------------------
> >>>
> >>>                 Key: LUCENE-1997
> >>>                 URL: https://issues.apache.org/jira/browse/LUCENE-1997
> >>>             Project: Lucene - Java
> >>>          Issue Type: Improvement
> >>>          Components: Search
> >>>    Affects Versions: 2.9
> >>>            Reporter: Michael McCandless
> >>>            Assignee: Michael McCandless
> >>>         Attachments: LUCENE-1997.patch, LUCENE-1997.patch
> >>>
> >>>
> >>> Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev,
> >>> where a simpler (non-segment-based) comparator API is proposed that
> >>> gathers results into multiple PQs (one per segment) and then merges
> >>> them in the end.
> >>> I started from John's multi-PQ code and worked it into
> >>> contrib/benchmark so that we could run perf tests.  Then I generified
> >>> the Python script I use for running search benchmarks (in
> >>> contrib/benchmark/sortBench.py).
> >>> The script first creates indexes with 1M docs (based on
> >>> SortableSingleDocSource, and based on wikipedia, if available).  Then
> >>> it runs various combinations:
> >>>   * Index with 20 balanced segments vs index with the "normal" log
> >>>     segment size
> >>>   * Queries with different numbers of hits (only for wikipedia index)
> >>>   * Different top N
> >>>   * Different sorts (by title, for wikipedia, and by random string,
> >>>     random int, and country for the random index)
> >>> For each test, 7 search rounds are run and the best QPS is kept.  The
> >>> script runs singlePQ then multiPQ, and records the resulting best QPS
> >>> for each and produces table (in Jira format) as output.
> >>>
> >> --
> >> This message is automatically generated by JIRA.
> >> -
> >> You can reply to this email to add a comment to the issue online.
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
> >
> >
> > ------------------------------------------------------------------------
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> 
> 
> --
> - Mark
> 
> http://www.lucidimagination.com
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

RE: [jira] Issue Comment Edited: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

Reply via email to