Vitaly / Robert
I wouldn't go so far as to call our pagination naive! Sub-optimal, yes.
Unless I am mistaken, Lucene's pagination mechanism assumes that you will
cache the ScoreDocs for the entire result set. That is not practical when
the result set exceeds 60M documents. As stated earlier, in any case, it
is the first query that is slow.
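For what it is worth, since Lucene 3.5 IndexSearcher.searchAfter allows
cursor-style deep paging that keeps only the bottom ScoreDoc of the
previous page rather than ScoreDocs for the whole result set. A rough
sketch (searcher, query, PAGE_SIZE and pagesWanted are placeholders, not
our actual code):

ScoreDoc after = null; // null cursor means "start at the first page"
for (int page = 0; page < pagesWanted; page++) {
    // searchAfter with a null cursor behaves like a plain search
    TopDocs hits = searcher.searchAfter(after, query, PAGE_SIZE);
    if (hits.scoreDocs.length == 0) break; // ran out of results
    // ... render hits.scoreDocs for this page ...
    after = hits.scoreDocs[hits.scoreDocs.length - 1];
}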
We do open index readers, since we are using NRT search: documents are
being added to the indexes continuously, and when the user clicks the
Search button they expect to see the latest result set. With regards to
NRT search, my understanding is that we do need to open the index
readers on each search operation to see the latest changes.
Thus, on each search, we open a reader from each corresponding writer
and combine the readers into a MultiReader:
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import org.apache.lucene.index.*;

protected IndexReader initIndexReader() throws IOException {
    List<IndexReader> readers = new LinkedList<>();
    for (IndexWriter writer : writers) {
        // NRT reader from the writer; 'true' applies all deletes
        readers.add(DirectoryReader.open(writer, true));
    }
    return new MultiReader(readers.toArray(new IndexReader[0]), true);
}
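Since the full DirectoryReader.open on every search may itself be part
of the first-query cost, DirectoryReader.openIfChanged is worth a look:
it reuses the unchanged segments of a previous reader and returns null
when nothing has changed. A rough sketch, assuming we kept the previous
reader for each writer around (refreshReader is a hypothetical helper):

protected DirectoryReader refreshReader(DirectoryReader current,
        IndexWriter writer) throws IOException {
    // Reopens only new/changed segments; null means nothing changed
    DirectoryReader newer = DirectoryReader.openIfChanged(current, writer, true);
    if (newer == null) {
        return current; // index unchanged: keep the existing reader
    }
    current.close(); // assumes no concurrent search still holds it
    return newer;
}

SearcherManager wraps the same reopen logic with safe reference
counting if multiple threads search concurrently.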
Thank you for your ideas/suggestions.
Regards
Jamie
On 2014/06/03, 12:29 PM, Vitaly Funstein wrote:
Jamie,
What if you were to forget for a moment the whole pagination idea, and
always capped your search at 1000 results for testing purposes only? This
is just to try and pinpoint the bottleneck here; if, regardless of the
query parameters, the search latency stays roughly the same and well below
5 min, you now have the answer - the problem is your naive implementation
of pagination, which results in snowballing result numbers and search times
the closer you get to the end of the results range. Otherwise, I would
focus on your query and filter next.