[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

Jake Mannix (JIRA) Wed, 28 Oct 2009 07:27:24 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770945#action_12770945
 ]


Jake Mannix commented on LUCENE-1997:
-------------------------------------

bq. search segment 2 into queue B (with possible short circuiting by the 
smallest value in queueA)

Well, we're not doing the short circuit trick on multiPQ right now, are we?  It 
would certainly speed things up, but requires the API have the convert() method 
available, which was the big savings on the API side to multiPQ.  If it was 
available, I think multiPQ (either with N or 2 queues) would perform *strictly* 
better than singlePQ, but I didn't suggest this because it seems to negate the 
cleanliness of the API.

One thing John mentioned offhand is that perhaps the convert() method could be 
optional?  If you don't implement it, you don't get to short-circuit using 
knowledge of previous segments, but if you do, you get maximum performance in 
the cases where multiPQ performs worse (mid-range hitCount, high 
numResultsToReturn, and in the numeric sorting case).

I think maybe combining this idea with 2 queues could be the best of all 
worlds, with best overall speed, only twice the memory of singlePQ, and the 
simplest API with the addition of one new *optional* method?

> Explore performance of multi-PQ vs single-PQ sorting API
> --------------------------------------------------------
>
>                 Key: LUCENE-1997
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1997
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch, 
> LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch, 
> LUCENE-1997.patch
>
>
> Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev,
> where a simpler (non-segment-based) comparator API is proposed that
> gathers results into multiple PQs (one per segment) and then merges
> them in the end.
> I started from John's multi-PQ code and worked it into
> contrib/benchmark so that we could run perf tests.  Then I generified
> the Python script I use for running search benchmarks (in
> contrib/benchmark/sortBench.py).
> The script first creates indexes with 1M docs (based on
> SortableSingleDocSource, and based on wikipedia, if available).  Then
> it runs various combinations:
>   * Index with 20 balanced segments vs index with the "normal" log
>     segment size
>   * Queries with different numbers of hits (only for wikipedia index)
>   * Different top N
>   * Different sorts (by title, for wikipedia, and by random string,
>     random int, and country for the random index)
> For each test, 7 search rounds are run and the best QPS is kept.  The
> script runs singlePQ then multiPQ, and records the resulting best QPS
> for each and produces table (in Jira format) as output.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

Reply via email to