On Sat, Feb 11, 2012 at 11:27:10PM +0100, Nick Wellnhofer wrote:
> Thanks for pointing me to RequiredOptionalQuery. It looks very useful.
>
> I can't model the query to identify the subset directly in Lucy. The
> subset is computed by some other code, so I think I'll end up with an
> ORQuery with about 100 terms matching a StringType field containing an
> external document id.

OK, that sounds like the right way to go.  Really big ORQueries can bog down,
but 100 terms, all of them rare, is not so bad.

>>> Is there a better way than to simply retrieve all the results, apply the
>>> boost factor manually to the scores and sort the results again?
>>
>> I hope you don't have to resort to post-search filtering.  That's slow to
>> begin with and it doesn't scale very well because of the costs of retrieving
>> so many documents.  You also have to resort to non-idiomatic sorting code
>> (using a priority queue rather than the Perl sort() function) if you don't
>> want memory usage to balloon.
>
> It wouldn't be too bad in my use case because the number of results is
> limited. But I'm curious what the most scalable solution would look like.

I only mean that post-search sorting doesn't scale nearly as well as sorting
during the main search -- in terms of CPU cycles, I/O, or memory.  Sorting
during the main search uses a priority queue, and the sort caches we build at
index time are extremely efficient.

Marvin Humphrey
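
To illustrate the priority-queue point above, here is a minimal sketch of
post-search re-ranking that keeps only the best k hits in memory instead of
sorting the whole result list.  It is written in Python rather than Perl and
is not Lucy code: rerank_top_k, boosted_ids, boost and the (doc_id, score)
pairs are all hypothetical names made up for the example.

```python
import heapq


def rerank_top_k(hits, boosted_ids, boost, k):
    """Return the k best (doc_id, score) pairs after boosting a subset.

    hits        -- iterable of (doc_id, score) pairs (hypothetical input shape)
    boosted_ids -- set of external document ids whose score gets multiplied
    boost       -- multiplier applied to hits in boosted_ids
    k           -- number of hits to keep
    """
    heap = []  # min-heap of (score, doc_id); heap[0] is the weakest kept hit
    for doc_id, score in hits:
        if doc_id in boosted_ids:
            score *= boost
        entry = (score, doc_id)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # Evict the weakest kept hit; memory stays bounded by k.
            heapq.heapreplace(heap, entry)
    # Only the k survivors are sorted, best score first.
    return [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]
```

Because hits can be any iterable, results can be streamed through the
function one at a time, so memory use depends on k rather than on the total
number of hits.  The cost of fetching every hit in the first place remains.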
