restricting hits to a subset of "id"s

Donna L Gresh Wed, 30 May 2007 12:05:19 -0700

I'm relatively new to Lucene. I think I know the answer to this, but
I'd like to check in case I'm wrong.


In our application we are looking for hits in a large corpus of resumes.
The documents are rather simple: just a person id and a text field. The
query works something like a "more like this" application, and this is
working very well right out of the box. However, my user would like to
input a subset of all the person ids and get back only those hits that
are among that list. This input list is likely to contain many thousands
of people. The people in the list won't fall into any obvious categories
that could have been tagged at indexing time and then handled by an
appropriate query to the index. It will be essentially a "random" list
of person ids.

My guess is that my user would really like the scoring to be done
considering only that subset of person ids as well, but we haven't
explicitly discussed it. I'm pretty sure that the scoring is based on
information in the entire index and can't be changed on the fly. Is
that correct?

In any case, it seems to me that the "natural" way to return only people
who are in the original input list is to simply use Lucene as it is, get
all the hits I need, and then have the application return only those
that are on the original input list. Does this seem appropriate?
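
A minimal sketch of the post-filtering idea described above, in plain
Java with no Lucene calls. The names HitFilter, Hit, and restrictTo are
hypothetical, and the ranked list is assumed to have already been
produced by the search. The allowed ids go into a HashSet so that each
membership check takes constant time on average, even when the list
holds many thousands of ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HitFilter {

    /** Hypothetical stand-in for one search hit: a person id plus its score. */
    static final class Hit {
        public final String personId;
        public final float score;

        public Hit(String personId, float score) {
            this.personId = personId;
            this.score = score;
        }
    }

    /**
     * Keeps only the hits whose person id is in allowedIds, preserving the
     * incoming rank order, and stops once maxResults hits have been kept.
     */
    static List<Hit> restrictTo(List<Hit> rankedHits, Set<String> allowedIds, int maxResults) {
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : rankedHits) {
            if (kept.size() >= maxResults) {
                break;
            }
            if (allowedIds.contains(hit.personId)) {
                kept.add(hit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up hits, already sorted by descending score.
        List<Hit> ranked = Arrays.asList(
                new Hit("p17", 0.92f),
                new Hit("p03", 0.81f),
                new Hit("p42", 0.77f),
                new Hit("p08", 0.50f));

        // The user's "random" list of person ids.
        Set<String> allowed = new HashSet<>(Arrays.asList("p03", "p08", "p99"));

        for (Hit hit : restrictTo(ranked, allowed, 10)) {
            System.out.println(hit.personId);
        }
    }
}
```

Note that the scores are passed through unchanged in this sketch, so
they still reflect whatever the search computed over the whole index.
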
Thanks in advance for any pointers.

Donna L. Gresh
Services Research, Mathematical Sciences Department
IBM T.J. Watson Research Center
(914) 945-2472
http://www.research.ibm.com/people/g/donnagresh
[EMAIL PROTECTED]
