On Mon, Apr 23, 2012 at 1:25 PM, Jong Kim <jong.luc...@gmail.com> wrote:
> Thanks for the reply.
>
> Our metadata is not stored in a single field, but is rather a collection of
> fields. So, it requires a boolean search that spans multiple fields. My
> understanding is that it is not possible to iterate over the matching
> documents efficiently using termDocs() when the search involves multiple
> terms and/or multiple fields, right?
>
> /Jong
You can do this by defining your own hits Collector that simply pulls the matching ID out of each result. Since searching the second index returns fewer results, you could do something like this:

Two indexes:
  LightWeight - stores metadata fields and document ID
  HeavyWeight - stores static data and document ID

Search query:
1. Metadata portion: query LightWeight and retrieve all matching IDs (NOT Lucene IDs, but your own stored document IDs) into a gnu.trove TIntSet.

Some queries won't even need to hit the second index, and at that point you already have your full match. If you need to match against the second (HeavyWeight) index as well:

2. Pass the TIntSet as an argument to another Collector.
3. For each match in the HeavyWeight index, if its ID is also in the TIntSet, add it to the final TIntSet result set. Otherwise, ignore it.
4. After every match has been passed to the collector, the final result set is your hits.

You now have the set of document IDs for the complete match. Because this uses primitives and lightweight objects, it isn't much worse than letting Lucene do the collection.

Of course, this approach only works if the intersection between the metadata and the big data is an AND relationship. If you need other logic, step 3 above obviously changes.

Another caveat: if you rely on Lucene to store and return the full document for each query, this approach isn't the best way to fetch information out of Lucene. We use a standard relational database to store our data, use Lucene to query for sets of document IDs, and then fetch the remaining document fields from our DB (or, in some cases, from S3, etc.).
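Steps 1-4 can be sketched in plain Java as below. This is a minimal illustration under stated assumptions, not a drop-in Lucene implementation. It has no Lucene or Trove dependency, so it runs on its own. StoredIdCollector, MetadataIdCollector, IntersectingIdCollector and feed are hypothetical names. java.util.BitSet is swapped in for gnu.trove's TIntSet: it also avoids boxing, but its memory grows with the largest ID, so a primitive hash set like TIntSet is the better fit for sparse IDs. The two searches are simulated with arrays of stored IDs. A real Lucene Collector's collect(int) receives Lucene's internal doc number, which you would still have to map to your own stored ID (for example via Lucene's FieldCache); this sketch skips that step.

```java
import java.util.BitSet;

/** Hypothetical stand-in for a Lucene hits Collector. It receives the stored,
 *  application-level document ID of each match, NOT the Lucene doc number. */
interface StoredIdCollector {
    void collect(int storedId);
}

/** Step 1: record every stored ID matched in the LightWeight (metadata) index.
 *  BitSet is used here in place of gnu.trove's TIntSet. */
final class MetadataIdCollector implements StoredIdCollector {
    final BitSet ids = new BitSet();

    @Override
    public void collect(int storedId) {
        ids.set(storedId);
    }
}

/** Steps 2-4: keep a HeavyWeight match only if its ID was also a metadata
 *  match, i.e. AND semantics between the two indexes. */
final class IntersectingIdCollector implements StoredIdCollector {
    private final BitSet metadataIds; // result of step 1, passed in (step 2)
    final BitSet result = new BitSet();

    IntersectingIdCollector(BitSet metadataIds) {
        this.metadataIds = metadataIds;
    }

    @Override
    public void collect(int storedId) {
        if (metadataIds.get(storedId)) {
            result.set(storedId); // step 3: in both indexes, keep it
        }
        // otherwise the match is ignored
    }
}

final class TwoPhaseDemo {
    /** Hypothetical helper that simulates a search by feeding each matching
     *  stored ID to the collector, the way a searcher would call collect(). */
    static void feed(int[] matches, StoredIdCollector collector) {
        for (int id : matches) {
            collector.collect(id);
        }
    }

    public static void main(String[] args) {
        int[] lightWeightMatches = {3, 7, 42, 100};
        int[] heavyWeightMatches = {7, 8, 42, 99};

        MetadataIdCollector phase1 = new MetadataIdCollector();
        feed(lightWeightMatches, phase1);

        IntersectingIdCollector phase2 = new IntersectingIdCollector(phase1.ids);
        feed(heavyWeightMatches, phase2);

        System.out.println(phase2.result); // prints {7, 42}
    }
}
```

Only the membership check in IntersectingIdCollector.collect encodes the AND relationship, so that check is the one place that would change if you need other logic, as the caveat about step 3 says.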