That's a good question. I don't have a very satisfying answer other than to
say we saw some improvements, and I would have to dig more now to say why.
It may be that in our system we have some additional per-document costs in
a custom collector that were saved by this reduction, and that's why we saw
a latency reduction. However, I also note these luceneutil benchmark
results:

** This is with -topN=500, and a searcher threadpool=8
Report after iter 19:
                 Task    QPS baseline    StdDev    QPS candidate    StdDev    Pct diff
HighTermDayOfYearSort          391.23    (2.4%)           627.92    (4.8%)    60.5% ( 52% -  69%)

** This is with -topN=500, and no searcher threadpool
Report after iter 19:
                 Task    QPS baseline    StdDev    QPS candidate    StdDev    Pct diff
HighTermDayOfYearSort          239.52    (3.3%)           232.70    (3.0%)    -2.8% ( -8% -   3%)

which show a QPS improvement from using threads even in the baseline case
(and then an additional improvement from prorating).
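
For context, the "searcher threadpool" runs correspond (roughly) to constructing
the IndexSearcher with an executor so that leaf slices are searched concurrently.
A minimal sketch of that setup, assuming Lucene 8.x; the index path, sort field,
and query term are placeholders:

    import java.nio.file.Paths;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopFieldDocs;
    import org.apache.lucene.store.FSDirectory;

    public class ThreadedSortedSearch {
      public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8); // "searcher threadpool=8"
        try (DirectoryReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(args[0])))) {
          // Passing an executor makes IndexSearcher search leaf slices concurrently.
          IndexSearcher searcher = new IndexSearcher(reader, pool);
          Sort sort = new Sort(new SortField("dayOfYear", SortField.Type.INT)); // placeholder field
          TopFieldDocs top = searcher.search(new TermQuery(new Term("body", "ref")), 500, sort);
          for (ScoreDoc sd : top.scoreDocs) {
            System.out.println(sd.doc);
          }
        } finally {
          pool.shutdown();
        }
      }
    }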

On Fri, Feb 1, 2019 at 11:28 AM Adrien Grand <jpou...@gmail.com> wrote:

> Something makes me curious: queries that can leverage sorted indices
> should be _very_ fast, for instance in your case they only need to
> look at 500 documents per segment at most (less in practice since we
> stop collecting as soon as a non-competitive hit is found), so why do
> you need to parallelize query execution?
>
> On Fri, Feb 1, 2019 at 3:18 PM Michael Sokolov <msoko...@gmail.com> wrote:
> >
> > I want to propose an optimization to early termination that gives nice
> > speedups for large result sets when searching with multiple threads at
> > the cost of a small (controllable) probability of collecting documents
> > out of order: in benchmarks I see +60-70% QPS for tasks like
> > HighTermDayOfYearSort when topN=500, using 8 threads to search, and in
> > our production system this optimization cut the latency of our slowest
> > queries substantially.
> >
> > In a multi-phase ranking scenario, a typical pattern is to retrieve a
> > largish number of matches in a first pass using the indexed sort,
> > followed by a second pass that re-ranks and selects a smaller top K,
> > using a more expensive ranking. N is chosen to provide sufficient
> > probability of finding the desired top K across the whole index, given
> > that the index sort is some approximation to the desired sort. When
> > ranking by indexed sort, as in TopFieldCollector, we can now
> > early-terminate when a sufficient number of matches have been found, so
> > that we only need to retrieve N documents from each segment. In
> > single-threaded mode we can check against minCompetitiveScore and
> > terminate collection for each segment appropriately, but when using
> > multiple threads to search concurrently there is no such coordination,
> > and we end up collecting N documents *per segment*, which are then
> > merged down to N.
> >
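
(For reference, a minimal sketch of the index-sorted, single-threaded setup
described above, assuming Lucene 8.x APIs; the field name and document counts
are arbitrary:)

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.NumericDocValuesField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.TopFieldCollector;
    import org.apache.lucene.store.ByteBuffersDirectory;
    import org.apache.lucene.store.Directory;

    public class IndexSortEarlyTermination {
      public static void main(String[] args) throws Exception {
        Sort indexSort = new Sort(new SortField("dayOfYear", SortField.Type.INT));
        try (Directory dir = new ByteBuffersDirectory()) {
          IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
          iwc.setIndexSort(indexSort); // segments are written in sort order
          try (IndexWriter writer = new IndexWriter(dir, iwc)) {
            for (int i = 0; i < 100_000; i++) {
              Document doc = new Document();
              doc.add(new NumericDocValuesField("dayOfYear", i % 365));
              writer.addDocument(doc);
            }
          }
          try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // A finite totalHitsThreshold lets the collector stop collecting a segment
            // early, since the query sort matches the index sort.
            TopFieldCollector collector = TopFieldCollector.create(indexSort, 500, 1000);
            searcher.search(new MatchAllDocsQuery(), collector);
            TopDocs top = collector.topDocs();
            System.out.println("approximate total hits: " + top.totalHits);
          }
        }
      }
    }
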
> > We do not need to collect so many documents though. For any given
> > segment, let p=(leaf.maxDoc/topLevel.maxDoc) be the proportion of
> > documents in that segment. Assuming that documents are distributed
> > randomly among segments, we can expect that on average we will find p*N
> > of the top N documents in the given segment. If we only collect p*N
> > documents, we will sometimes miss some documents that we should have
> > collected, collecting some less-competitive documents from one segment,
> > while not collecting all the competitive documents from another. But how
> > many should we collect in order to make this occur only very rarely?
> >
> > The worst case is that all top N documents occur in a single segment.
> > For even small values of N and small numbers of segments S, this
> > probability is vanishingly small (N=10, S=10 -> 10^(1-N) = 1/10^9). More
> > generally, this distribution of documents among segments is a
> > multinomial distribution, and the variance of the number of documents in
> > a single segment is that of a binomial distribution. The binomial
> > variance in this case (p = probability of a document being in the
> > segment, N = number of documents) is p*(1-p)*N; we can use this to
> > compute the number of documents to collect per leaf in order to bound
> > the probability of a ranking error. I'm using a cutoff of 3 standard
> > deviations, i.e. collecting p*N + 3*sqrt(p*(1-p)*N) documents for each
> > segment. For N=500, p=0.2, we can collect 67 documents instead of 500 at
> > the cost of an error that occurs < 3/1000.
> >
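
To make the cutoff above concrete, here is a minimal sketch of the per-leaf
computation (just the formula from the previous paragraph; the method name and
the example numbers are only illustrative, not the actual patch):

    public class LeafThreshold {
      /**
       * How many hits to collect in one segment: the leaf's expected share of the
       * top N (p*N) plus a 3-standard-deviation margin from the binomial variance,
       * i.e. p*N + 3*sqrt(p*(1-p)*N).
       */
      static int leafHitsThreshold(int leafMaxDoc, int topLevelMaxDoc, int topN) {
        double p = (double) leafMaxDoc / topLevelMaxDoc; // proportion of docs in this leaf
        double mean = p * topN;                          // expected number of top-N docs here
        double stdDev = Math.sqrt(p * (1 - p) * topN);   // binomial standard deviation
        return (int) Math.ceil(mean + 3 * stdDev);
      }

      public static void main(String[] args) {
        // Illustrative numbers: a leaf holding 10% of a 1M-doc index, topN=500,
        // gives a per-leaf budget of about 71 instead of 500.
        System.out.println(leafHitsThreshold(100_000, 1_000_000, 500));
      }
    }
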
> > Also note that the kinds of errors we make are typically benign. In most
> > cases we will return the correct top N-1 documents, but instead of
> > returning the Nth-ranked document in position N, we return the N+1st.
> >
> > Implementing this in Lucene requires a small patch to TopFieldCollector
> > to introduce a leafHitsThreshold comparable to the existing
> > totalHitsThreshold. Given the possibility of error, it might be good to
> > have a way to disable this, but my inclination would be to enable it
> > whenever approximate counts are enabled (i.e. by default), and disable it
> > when totalHitsThreshold is MAX_VALUE.
> >
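
One way to experiment with this before patching TopFieldCollector would be to
wrap an existing collector and cut each leaf off at its prorated budget. A
rough, hypothetical prototype of such a wrapper, assuming the Lucene 8.x
collector APIs (this is not the proposed patch, only an illustration of the
mechanism):

    import java.io.IOException;
    import org.apache.lucene.index.LeafReaderContext;
    import org.apache.lucene.search.CollectionTerminatedException;
    import org.apache.lucene.search.Collector;
    import org.apache.lucene.search.FilterLeafCollector;
    import org.apache.lucene.search.LeafCollector;
    import org.apache.lucene.search.ScoreMode;

    /** Hypothetical wrapper: stops collecting a leaf after its prorated share of topN. */
    public class ProratingCollector implements Collector {
      private final Collector in;
      private final int topN;
      private final int topLevelMaxDoc;

      public ProratingCollector(Collector in, int topN, int topLevelMaxDoc) {
        this.in = in;
        this.topN = topN;
        this.topLevelMaxDoc = topLevelMaxDoc;
      }

      @Override
      public ScoreMode scoreMode() {
        return in.scoreMode();
      }

      @Override
      public LeafCollector getLeafCollector(LeafReaderContext ctx) throws IOException {
        double p = (double) ctx.reader().maxDoc() / topLevelMaxDoc;
        // p*N + 3*sqrt(p*(1-p)*N), as above
        final int budget = (int) Math.ceil(p * topN + 3 * Math.sqrt(p * (1 - p) * topN));
        return new FilterLeafCollector(in.getLeafCollector(ctx)) {
          int collected = 0;

          @Override
          public void collect(int doc) throws IOException {
            if (collected >= budget) {
              // With an index-sorted segment the first hits collected are its best
              // ones, so stop this leaf early; the searcher moves on to the next leaf.
              throw new CollectionTerminatedException();
            }
            collected++;
            super.collect(doc);
          }
        };
      }
    }

Wrapping a TopFieldCollector created with the query sort in this way would give
roughly the prorated behavior described above, with the (rare) ordering errors
discussed.
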
> > What do you think? Shall I open an issue?
>
>
>
> --
> Adrien
>
