Daniel Naber wrote:
> After fixing this I can reproduce the problem with a local index that
> contains about 220.000 documents (700MB). Fetching the first document
> takes for example 30ms, fetching the last one takes >100ms. Of course I
> tested this with a query that returns many results (about 50.000).
> Actually it happens even with the default sorting, no need to sort by some
> specific field.

In part this is because Hits first searches for only the top-scoring
100 documents. If you then ask for a hit beyond those, it must
re-query. In part it is also because maintaining a queue of the top
50k hits is more expensive than maintaining a queue of the top 100
hits, so the second query is slower. And in part it could be caused by
other things: for example, the highest-ranking documents might tend to
be cached and so not require disk I/O.
Profiling would show which of these is the largest factor.
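
To make the queue-cost point concrete, here is a minimal sketch of
collecting the top k scores with a bounded min-heap. It is an
illustration only, not Lucene's code: TopKSketch and topK are
hypothetical names, and scores are plain floats rather than real hits.
Each insert costs O(log k), so a queue of 50k does more work per
competitive hit than a queue of 100.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene: keeps the k highest scores seen.
class TopKSketch {
    // Returns the k highest scores in descending order.
    static List<Float> topK(float[] scores, int k) {
        // Min-heap: the smallest retained score sits at the head,
        // so it is the one to evict when a better score arrives.
        PriorityQueue<Float> queue = new PriorityQueue<>(Math.max(1, k));
        for (float score : scores) {
            if (queue.size() < k) {
                queue.add(score);            // O(log k) insert
            } else if (k > 0 && score > queue.peek()) {
                queue.poll();                // drop the current minimum
                queue.add(score);            // O(log k) insert
            }
        }
        List<Float> result = new ArrayList<>(queue);
        result.sort(Collections.reverseOrder());
        return result;
    }
}
```

Calling TopKSketch.topK(scores, 100) and TopKSketch.topK(scores, 50000)
on the same array visits every score either way; only the heap size,
and so the per-insert cost, differs.
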
Of these, only the first is really fixable: if you know you'll need hit
50k, you could tell Hits so up front and have it perform only a single
query. But when the query matches only about 50k documents, keeping a
queue of the top 50k costs roughly as much as collecting all the hits
and sorting them. So, in part, getting hits 49,990 through 50,000 is
inherently slower than getting hits 0 through 10. We can minimize
that, but not eliminate it.
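
The re-query behaviour can be sketched with a small simulation. This
is not Lucene's Hits class: LazyHitsSketch, its constructor, score()
and searchCount are all hypothetical, and the "search" is just a full
sort of an in-memory score array, which stands in for running the
query again.

```java
import java.util.Arrays;

// Hypothetical stand-in for a Hits-like result window; not Lucene's API.
class LazyHitsSketch {
    private final float[] allScores;  // pretend index: one score per match
    private float[] cached = new float[0];
    int searchCount = 0;              // how many times the "query" was run

    LazyHitsSketch(float[] allScores, int initialSize) {
        this.allScores = allScores;
        search(initialSize);
    }

    // Re-runs the whole search and keeps only the top n scores.
    private void search(int n) {
        searchCount++;
        float[] sorted = allScores.clone();
        Arrays.sort(sorted);                            // ascending
        int keep = Math.min(n, sorted.length);
        cached = new float[keep];
        for (int i = 0; i < keep; i++) {
            cached[i] = sorted[sorted.length - 1 - i];  // store descending
        }
    }

    // Score of the i-th best hit; re-queries when i is outside the cache.
    float score(int i) {
        if (i < 0 || i >= allScores.length) {
            throw new IndexOutOfBoundsException("no hit " + i);
        }
        if (i >= cached.length) {
            search(i + 1);
        }
        return cached[i];
    }
}
```

With an initial size of 100, asking for a hit beyond the first 100
runs the search a second time. Constructing the object with the size
you know you will need keeps searchCount at 1, although that one
search still has to rank all the requested hits.
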
Doug