On 11/02/2012 22:18, Marvin Humphrey wrote:
On Sat, Feb 11, 2012 at 10:03:37PM +0100, Nick Wellnhofer wrote:
What's the best way to apply a boost factor dynamically to a (small)
subset of documents?

I would suggest using a RequiredOptionalQuery.  Let the required_query decide
which documents match, and use the optional_query to apply the boost.

    # The user's query alone decides which documents match.
    my $parsed_query = $query_parser->parse($user_query_string);

    # Documents that also match this user_id receive a higher score.
    my $user_id_boost_query = Lucy::Search::TermQuery->new(
        field => 'user_id',
        term  => $user_id,
    );
    $user_id_boost_query->set_boost($arbitrary_boost);
    my $req_opt_query = Lucy::Search::RequiredOptionalQuery->new(
        required_query => $parsed_query,
        optional_query => $user_id_boost_query,
    );

If the query to identify the subset of documents is very expensive, you might
look into using LucyX::Search::Filter to cache the results (but note that
Filter does not cache in a clustered environment).

Thanks for pointing me to RequiredOptionalQuery. It looks very useful.

I can't express the query that identifies the subset directly in Lucy. The subset is computed by some other code, so I think I'll end up with an ORQuery of about 100 terms, each matching an external document id stored in a StringType field.
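A minimal sketch of that plan, combining the ORQuery with the RequiredOptionalQuery suggested above. The field names `content` and `ext_id`, the generated ids, and the boost value are hypothetical. A plain TermQuery stands in for the QueryParser output so the snippet runs without an index.

```perl
use strict;
use warnings;
use Lucy;

# Stand-in for $query_parser->parse($user_query_string); field is hypothetical.
my $parsed_query = Lucy::Search::TermQuery->new(
    field => 'content',
    term  => 'lucy',
);

# Hypothetical external ids, as computed by code outside Lucy (~100 of them).
my @external_ids = map { "ext-$_" } 1 .. 100;

# One TermQuery per id against the (hypothetical) StringType field 'ext_id'.
my @id_queries = map {
    Lucy::Search::TermQuery->new( field => 'ext_id', term => $_ )
} @external_ids;

# Any id match counts; the boost value is arbitrary and needs tuning.
my $subset_query = Lucy::Search::ORQuery->new( children => \@id_queries );
$subset_query->set_boost(5);

# Matching is still decided by $parsed_query alone; documents whose
# ext_id is in the subset score higher.
my $req_opt_query = Lucy::Search::RequiredOptionalQuery->new(
    required_query => $parsed_query,
    optional_query => $subset_query,
);
```

As I understand it, the optional clause's score is added to the required clause's score rather than multiplying it. In that case the boost raises subset documents without being a strict multiplicative factor, so its value may need some experimentation.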

Is there a better way than simply retrieving all the results, applying the
boost factor to the scores manually, and sorting the results again?

I hope you don't have to resort to post-search filtering.  It is slow to
begin with, and it scales poorly because of the cost of retrieving so many
documents.  If you don't want memory usage to balloon, you also need
non-idiomatic sorting code (a priority queue rather than Perl's sort()
function).
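For the post-search route, the priority-queue approach described above might look roughly like this pure-Perl sketch. It makes several assumptions. Hits arrive one at a time from a callback that returns `[id, score]` pairs or undef. With a real Lucy Hits object you could wrap `$hits->next` and `$hit->get_score` in that callback. The names `top_n_boosted`, `%boosted` and the sample data are all hypothetical. Only the best `$n` hits are held in a min-heap, so memory stays proportional to `$n` instead of to the total number of hits.

```perl
use strict;
use warnings;

# Keep the $n best hits after boosting. $next is a callback returning
# [id, score] or undef; $boosted is a hashref of ids to boost by $factor.
sub top_n_boosted {
    my ($next, $boosted, $factor, $n) = @_;
    return () if $n <= 0;
    my @heap;    # min-heap of [id, score]; $heap[0] is the weakest kept hit
    while (my $hit = $next->()) {
        my ($id, $score) = @$hit;
        $score *= $factor if $boosted->{$id};
        if (@heap < $n) {
            push @heap, [ $id, $score ];
            _sift_up(\@heap, $#heap);
        }
        elsif ($score > $heap[0][1]) {
            # Evict the weakest hit and restore the heap property.
            $heap[0] = [ $id, $score ];
            _sift_down(\@heap, 0);
        }
    }
    # Only the $n survivors are sorted, highest score first.
    return sort { $b->[1] <=> $a->[1] } @heap;
}

sub _sift_up {
    my ($heap, $i) = @_;
    while ($i > 0) {
        my $parent = int(($i - 1) / 2);
        last if $heap->[$parent][1] <= $heap->[$i][1];
        @$heap[$parent, $i] = @$heap[$i, $parent];
        $i = $parent;
    }
}

sub _sift_down {
    my ($heap, $i) = @_;
    my $size = @$heap;
    while (1) {
        my ($l, $r, $min) = (2 * $i + 1, 2 * $i + 2, $i);
        $min = $l if $l < $size && $heap->[$l][1] < $heap->[$min][1];
        $min = $r if $r < $size && $heap->[$r][1] < $heap->[$min][1];
        last if $min == $i;
        @$heap[$min, $i] = @$heap[$i, $min];
        $i = $min;
    }
}

# Usage with made-up data: doc2 and doc4 are in the boosted subset.
my @raw     = ( [ 'doc1', 5.0 ], [ 'doc2', 1.0 ], [ 'doc3', 3.0 ], [ 'doc4', 2.0 ] );
my %boosted = ( doc2 => 1, doc4 => 1 );
my $next    = do { my $i = 0; sub { $i < @raw ? $raw[ $i++ ] : undef } };
my @top     = top_n_boosted($next, \%boosted, 4.0, 2);
print join(", ", map { "$_->[0]=$_->[1]" } @top), "\n";    # prints "doc4=8, doc1=5"
```

Even so, every hit still has to be fetched and scored, which is why the in-query boost above scales better.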

It wouldn't be too bad in my use case because the number of results is limited. But I'm curious what the most scalable solution would look like.

Nick
