Hi all,
I have the following issue (I am giving a quantified example so we can
talk more concretely).
My documents have a docId field, stored as a keyword field.
I am executing searches that return between 2000 and 10000 documents and
sorting the results by relevance (or sometimes alphabetically).
In every query, I need to discard some of the results based on their
docId. I have a list of the docIds that need to be discarded in an
array. The size of the list is usually about 100.
I generally perform paging, so I don't need to display all the results,
but only those of the current page (e.g. results 100 to 120).
Currently, after getting the Hits object from the Searcher, I loop over
the documents, retrieve their docId, and see if it is in the discard
list. If not, I put the document in a validResult collection. When I
have read enough valid results to be able to show the current page, I
stop. E.g., in the above example, I would read until I had 120 valid
results.
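To make the loop concrete, here is a minimal sketch of it in plain Java.
It is not my actual code: the ranked hits are modeled as a simple list
of docIds (standing in for reading the docId from hits.doc(i)), and the
class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PagedDiscardFilter {

    /**
     * Returns the docIds on the page [pageStart, pageEnd), where positions
     * are counted over valid (non-discarded) results only.
     * rankedDocIds is a stand-in for the docIds read from the hits in order.
     */
    public static List<String> page(List<String> rankedDocIds,
                                    Set<String> discard,
                                    int pageStart, int pageEnd) {
        List<String> pageResults = new ArrayList<>();
        int validSeen = 0; // number of non-discarded results read so far
        for (String docId : rankedDocIds) {
            if (discard.contains(docId)) {
                continue; // discarded results do not count towards paging
            }
            if (validSeen >= pageStart) {
                pageResults.add(docId);
            }
            validSeen++;
            if (validSeen >= pageEnd) {
                break; // enough valid results to show the current page
            }
        }
        return pageResults;
    }

    public static void main(String[] args) {
        List<String> ranked = Arrays.asList("a", "b", "c", "d", "e", "f");
        Set<String> discard = new HashSet<>(Arrays.asList("b", "e"));
        // Valid results in order are a, c, d, f; positions 2 and 3 are d, f.
        System.out.println(page(ranked, discard, 2, 4)); // prints [d, f]
    }
}
```

Keeping the discard list in a HashSet makes each membership check cheap,
so the cost is dominated by reading the docId of every hit up to the end
of the requested page.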
The problem with this implementation is that I have to read the docId
for the results that precede the current page, in order to determine if
they are in the invalid list. So, when showing the pages near the last
page, I have to read the entire result list. I don't like this because
this means that the computation required to display a page is not
constant but depends on the page's position. Anyway, what do the experts
think? Is this implementation prohibitively expensive?
(As a side note, when I call hits.doc(i), does Lucene retrieve the whole
document, or just a pointer to it, fetching the data only when I call
hits.doc(i).getField...?)
An alternative would be to extend the query to exclude the ids in the
discard list. How would adding 100 exclusion clauses to the query impact
the query's performance? Are there any studies on search speed in
relation to the number of query clauses?
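What I have in mind for this alternative is roughly the following
sketch, which builds a query string with one prohibited clause per
discarded id. The class and method names are hypothetical, and I am
assuming that the query parser's analyzer leaves the keyword docId
values untouched; adding prohibited TermQuery clauses to a BooleanQuery
programmatically would avoid that assumption.

```java
import java.util.Arrays;
import java.util.List;

public class ExclusionQueryBuilder {

    /**
     * Wraps the user's query as a required clause and appends one
     * prohibited clause (-docId:"...") for every id in the discard list.
     */
    public static String withExclusions(String userQuery,
                                        List<String> discardIds) {
        StringBuilder sb = new StringBuilder();
        sb.append("+(").append(userQuery).append(")");
        for (String id : discardIds) {
            // Escape backslashes and quotes so the id stays one quoted term.
            String escaped = id.replace("\\", "\\\\").replace("\"", "\\\"");
            sb.append(" -docId:\"").append(escaped).append("\"");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String q = withExclusions("title:lucene", Arrays.asList("17", "42"));
        System.out.println(q); // prints +(title:lucene) -docId:"17" -docId:"42"
    }
}
```

With this approach the hits would already exclude the discarded
documents, so a page could be read directly by position.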
Is there another way to handle this issue?
Many thanks in advance,
Marios Skounakis