Take a look at the org.apache.lucene.search.function package in SVN. It provides an API that lets you define "function" classes that compute a score for each document using whatever means you want. The overall FunctionQuery can then be wrapped in a BooleanQuery along with whatever other search criteria you have.
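To make the idea concrete, here is a rough standalone sketch in plain Java. It does not use the actual classes from that package: DocFunction, field(), product() and rank() are hypothetical names invented for illustration, and plain arrays stand in for per-document field values that would normally come from something like the FieldCache. It only shows the shape of the approach: small per-document functions composed together, then multiplied into the search score as in the finalScore formula from the quoted message.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: none of these names are Lucene APIs.
public class FunctionScoreSketch {

    /** Computes a value for a document, given its document number. */
    public interface DocFunction {
        float value(int doc);
    }

    /** Reads a per-document value from an array (a stand-in for a field cache lookup). */
    public static DocFunction field(float[] valuesByDoc) {
        return doc -> valuesByDoc[doc];
    }

    /** Composes two functions by multiplying their values. */
    public static DocFunction product(DocFunction a, DocFunction b) {
        return doc -> a.value(doc) * b.value(doc);
    }

    /** finalScore = searchScore * boost(doc) */
    public static float finalScore(float searchScore, DocFunction boost, int doc) {
        return searchScore * boost.value(doc);
    }

    /** Returns document numbers ordered by final score, highest first. */
    public static Integer[] rank(float[] searchScores, DocFunction boost) {
        Integer[] docs = new Integer[searchScores.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Comparator<Integer> byFinalScore =
            Comparator.comparingDouble(d -> finalScore(searchScores[d], boost, d));
        Arrays.sort(docs, byFinalScore.reversed());
        return docs;
    }

    public static void main(String[] args) {
        // Made-up sample data for three documents (doc numbers 0, 1, 2).
        float[] searchScores = {0.9f, 0.5f, 0.7f};
        float[] popularity   = {1.0f, 3.0f, 2.0f};
        float[] userRating   = {1.0f, 1.0f, 0.5f};

        DocFunction boost = product(field(popularity), field(userRating));
        Integer[] ranked = rank(searchScores, boost);

        // Final scores are 0.9, 1.5 and 0.7, so this prints [1, 0, 2]
        System.out.println(Arrays.toString(ranked));
    }
}
```

In a real index the multiplication would happen inside the scorer as hits are collected, rather than in a separate sorting pass like this one; the sketch sorts in memory only to keep the example self-contained.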
Five basic functions have been included that can be composed in all sorts of interesting ways to compute scores based on document values for a particular field (or the relative ordinal positions in the FieldCache for that field).

: Date: Tue, 24 Jan 2006 17:42:06 -0000
: From: Nick Vincent <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Sorting by calculated custom score at search time
:
: Hi,
:
: I am trying to find a way to create scores with a custom formula based
: on the initial score from Lucene and field values from each document,
: e.g. for each document:
:
: finalScore = searchScore * (popularity) * (userRating)
:
: The customer requires this functionality as I have to replace an
: existing system that works like this. User rating and popularity are
: already available and will be stored in Lucene. I've looked through LIA
: and the approaches there don't seem to fit the requirement:
:
: 5.1.6 Sorting by multiple fields: only sorts by one field, then the
: next, I need to combine the scores
: 6.1 Using a custom sort method: does not take into account the
: document's original score
:
: From an earlier thread discussing a calculated score based on the hit
: score and the age of document I gather that TSS regenerate their indexes
: to alter the document boost based on date. I need to be able to sort by
: either relevance or "popularity rated relevance" depending on user
: input, so I don't think adding a precalculated document boost at index
: time is an option.
:
: In the worst case scenario I'll need to iterate through the hits and
: then sort them in memory myself, but I'm looking to be indexing around
: 500,000 documents, and in this particular application there are a lot of
: common keywords, so a large number of hits for a basic query is common.
: I'm trying to avoid this as it's an untidy solution which is likely to
: be (relatively) slow.
:
: I notice Erik has commented that "I've not come across a really clean
: way to do this sort of age-based
: boosting other than how TSS does it". I was wondering if anyone has any
: experience with dirtier approaches they could share with me?
:
: Any help is really appreciated,
:
: Thanks,
:
: Nick
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: [EMAIL PROTECTED]
: For additional commands, e-mail: [EMAIL PROTECTED]

-Hoss