[
https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769860#action_12769860
]
Jake Mannix commented on LUCENE-1997:
-------------------------------------
Mark, you say with the previous numbers, you'd say "-1", but if you look at the
most common use case (top 10), the simpler API is faster in almost all cases,
and in some cases it's 10-20% faster. Top 500, top 1000 are not only just
"not as common", they're probably at the 1% level, or less.
As far as shifting back, API-wise, that really shouldn't be a factor: 2.9
*just* came out, and what, we stick with a slightly *slower* API (for the most
common use case across all Lucene users), which happens to be *more complex*,
and more importantly: just very nonstandard - Comparable is very familiar to
everyone, even if you have to have two forms, one for primitives, one for
Objects - an api which *doesn't* have the whole slew of compare(),
compareBottom(), copy(), setBottom(), value() and setNextReader() has a
tremendous advantage over one which does.
It's "advanced" to implement a custom sort, but it will be *easier* if it's not
complex, and then it doesn't *need* to be "advanced" (shouldn't we be striving
to make there be less APIs which are listed as "advanced", and instead more
features which can *do* complex things but are still listed as things "normal
users" can do).
I think it's *great* precedent to set with users to say, "oops! we found that
this new (just now as of this version) api was unnecessarily clumsy, we're
shifting back to a simpler one which is just like the one you used to have".
Sticking with a worse api because it performs better in only extreme scenarios
because "we already moved on to this new api, shouldn't go back now, don't want
to admit we ever made a mistake!" is what is "ugly".
The main thing to remember is that the entire thinking around making this
different from the old was *only* because it seemed that using a simpler api
would perform much worse than this one, and it does not appear that this is the
case. If that original reasoning turns out to have been incorrect, then the
answer is simple: go with the simpler API *now* before users *do* get used to
using the new one.
If it turns out I'm wrong, and lots of users sort based on field values for the
top 1000 entries often, or that the most recent runs turn out to be flukes and
are not typical performance, only then would I'd change my opinion.
> Explore performance of multi-PQ vs single-PQ sorting API
> --------------------------------------------------------
>
> Key: LUCENE-1997
> URL: https://issues.apache.org/jira/browse/LUCENE-1997
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: 2.9
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Attachments: LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch,
> LUCENE-1997.patch
>
>
> Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev,
> where a simpler (non-segment-based) comparator API is proposed that
> gathers results into multiple PQs (one per segment) and then merges
> them in the end.
> I started from John's multi-PQ code and worked it into
> contrib/benchmark so that we could run perf tests. Then I generified
> the Python script I use for running search benchmarks (in
> contrib/benchmark/sortBench.py).
> The script first creates indexes with 1M docs (based on
> SortableSingleDocSource, and based on wikipedia, if available). Then
> it runs various combinations:
> * Index with 20 balanced segments vs index with the "normal" log
> segment size
> * Queries with different numbers of hits (only for wikipedia index)
> * Different top N
> * Different sorts (by title, for wikipedia, and by random string,
> random int, and country for the random index)
> For each test, 7 search rounds are run and the best QPS is kept. The
> script runs singlePQ then multiPQ, and records the resulting best QPS
> for each and produces table (in Jira format) as output.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]