I'm relatively new to Lucene. I think I know the answer to this, but in case I'm wrong I'd like to know.
In our application we are looking for hits in a large corpus of resumes. The documents are rather simple: just a person id and a text field. The query is something like a "more like this" search, and it works very well right out of the box.

However, my user would like to input a subset of all the person ids and have only the hits that are in that list returned. This input list is likely to contain many thousands of people. The people in the list won't fall into any obvious categories that could have been tagged at indexing time and then selected with an appropriate query to the index. It will be essentially a "random" list of person ids.

My guess is that my user would really like the scoring to consider only that subset of person ids as well, but we haven't explicitly discussed it. I'm pretty sure that the scoring is based on information in the entire index and can't be changed on the fly, correct?

In any case, it seems to me that the "natural" way to return only people who are in the original input list is to use Lucene as it is, get all the hits I need, and then have the application return only those that are on the input list. Does this seem appropriate?

Thanks in advance for any pointers.

Donna L. Gresh
Services Research, Mathematical Sciences Department
IBM T.J. Watson Research Center
(914) 945-2472
http://www.research.ibm.com/people/g/donnagresh
[EMAIL PROTECTED]
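
P.S. To make the post-filtering idea concrete, here is a minimal sketch in plain Java. It deliberately uses no Lucene classes: `Hit`, `filterByAllowedIds`, and the sample ids are hypothetical names made up for illustration, and the ranked list simply stands in for whatever the search returns. The scores are passed through untouched, so they still reflect the whole index rather than the subset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PostFilterSketch {

    /** Hypothetical stand-in for one search hit: a person id plus its score. */
    static final class Hit {
        final String personId;
        final float score;

        Hit(String personId, float score) {
            this.personId = personId;
            this.score = score;
        }
    }

    /**
     * Keeps only the hits whose person id is in allowedIds, preserving the
     * original ranking order and stopping once maxResults hits are kept.
     */
    static List<Hit> filterByAllowedIds(List<Hit> rankedHits,
                                        Set<String> allowedIds,
                                        int maxResults) {
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : rankedHits) {
            if (kept.size() >= maxResults) {
                break;
            }
            // HashSet lookup is constant time on average, so a list of many
            // thousands of ids adds little cost per hit.
            if (allowedIds.contains(hit.personId)) {
                kept.add(hit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up hits, already sorted by descending score.
        List<Hit> ranked = Arrays.asList(
                new Hit("p17", 0.91f),
                new Hit("p03", 0.84f),
                new Hit("p42", 0.77f),
                new Hit("p08", 0.60f));

        // The user's "random" list of person ids.
        Set<String> allowed = new HashSet<>(Arrays.asList("p03", "p08", "p99"));

        for (Hit hit : filterByAllowedIds(ranked, allowed, 10)) {
            System.out.println(hit.personId + " " + hit.score);
        }
    }
}
```

One catch with this approach: if the search only fetches the top N hits, filtering afterwards can leave fewer results than wanted, so the application would need to fetch more hits than it finally intends to show.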