I think this sounds like a great idea, too.
It would be nice to modify PhraseScorer and TermScorer to enable
strict ordering. I had to make the score() methods of these classes
essentially ignore the base Lucene score and use only the decoded norm
from Similarity.
To enable this, Similarity could have a method like:
    float applyNorm(float baseScore)
which could optionally ignore baseScore. The scorer classes would then do:
    score = applyNorm(score)
instead of the current:
    score *= Similarity.decodeNorm(norms[d])
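
A minimal sketch of the applyNorm idea, assuming stand-in classes rather
than Lucene's real ones. SketchSimilarity, StrictOrderSimilarity and
finishScore are hypothetical names, and passing the decoded norm as a
second argument is an assumption made only to keep the example
self-contained:

```java
// Hypothetical sketch: none of these classes are Lucene's real ones.
class ApplyNormSketch {

    /** Stand-in for Similarity. The default keeps today's behaviour: score *= norm. */
    static class SketchSimilarity {
        float applyNorm(float baseScore, float decodedNorm) {
            return baseScore * decodedNorm;
        }
    }

    /** Override that ignores the base score, so the norm alone decides the order. */
    static class StrictOrderSimilarity extends SketchSimilarity {
        @Override
        float applyNorm(float baseScore, float decodedNorm) {
            return decodedNorm;
        }
    }

    /** Stand-in for the last step of score() in TermScorer or PhraseScorer. */
    static float finishScore(SketchSimilarity sim, float baseScore, float decodedNorm) {
        return sim.applyNorm(baseScore, decodedNorm);
    }

    public static void main(String[] args) {
        System.out.println(finishScore(new SketchSimilarity(), 0.5f, 4.0f));      // prints 2.0
        System.out.println(finishScore(new StrictOrderSimilarity(), 0.5f, 4.0f)); // prints 4.0
    }
}
```

With the default the scorers behave as before; only a Similarity that
overrides applyNorm changes the ordering.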
I'd be happy to contribute in this area if it would be helpful.
DaveB
Otis Gospodnetic wrote:
>This sounds good to me, as it would lead us to pluggable similarity
>computation...mmmm.
>I can refactor some of this tonight.
>
>Otis
>
>
>--- Doug Cutting <[EMAIL PROTECTED]> wrote:
>
>
>>This looks like a good approach. When I get a chance, I'd like to make
>>Similarity an interface or an abstract class, whose default
>>implementation would do what the current class does, but whose methods
>>can be overridden. Then I'd add methods like:
>>
>> public static void Similarity.setDefaultSimilarity(Similarity sim);
>> public void IndexWriter.setSimilarity(Similarity sim);
>> public void Searcher.setSimilarity(Similarity sim);
>>
>>So to override Similarity methods you'd define a subclass of the
>>standard implementation, then either install yours globally via
>>setDefaultSimilarity, or set it in your IndexWriter before adding
>>documents and in your Searcher before searching. Does that sound
>>reasonable?
>>
>>This would let you do what you describe below without changing Lucene's
>>sources. However I'm very short on time right now and don't know how
>>soon I'll get to this.
>>
>>Doug
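
A rough sketch of the arrangement Doug describes above: a global default
that can be replaced, plus a per-object override. All class and method
names here are hypothetical stand-ins, not Lucene's API, and the
1/sqrt(numTerms) length normalization is only an assumed placeholder for
whatever the default implementation does:

```java
// Hypothetical sketch of a pluggable Similarity; not Lucene's real classes.
class PluggableSimilaritySketch {

    /** Stand-in for the standard implementation, with overridable methods. */
    static class SketchSimilarity {
        private static SketchSimilarity defaultImpl = new SketchSimilarity();

        static void setDefaultSimilarity(SketchSimilarity sim) { defaultImpl = sim; }
        static SketchSimilarity getDefaultSimilarity() { return defaultImpl; }

        /** Placeholder default: shorter fields get a larger norm. */
        float lengthNorm(int numTerms) {
            return (float) (1.0 / Math.sqrt(numTerms));
        }
    }

    /** A subclass that switches off length normalization. */
    static class NoLengthNormSimilarity extends SketchSimilarity {
        @Override
        float lengthNorm(int numTerms) { return 1.0f; }
    }

    /** Stand-in for IndexWriter or Searcher: own Similarity if set, else the global one. */
    static class Component {
        private SketchSimilarity similarity; // null means "use the global default"

        void setSimilarity(SketchSimilarity sim) { this.similarity = sim; }

        SketchSimilarity getSimilarity() {
            return similarity != null ? similarity : SketchSimilarity.getDefaultSimilarity();
        }
    }

    public static void main(String[] args) {
        Component writer = new Component();
        System.out.println(writer.getSimilarity().lengthNorm(4)); // prints 0.5
        writer.setSimilarity(new NoLengthNormSimilarity());
        System.out.println(writer.getSimilarity().lengthNorm(4)); // prints 1.0
    }
}
```

The same Similarity would have to be set at index time and at search
time for the two sides to agree.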
>>
>>David Birtwell wrote:
>>
>>>Hi Dmitry,
>>>
>>>I was faced with a similar problem. We wanted to have a numeric rank
>>>field in each document influence the order in which the documents were
>>>returned by lucene. While investigating a solution for this, I wanted
>>>to see if I could implement strict sorting based on this numeric value.
>>>I was able to accomplish this using document boosting, but not without
>>>modifying the lucene source. Our "ranking" field is an integer value
>>>from one to one hundred. I'm not sure if this will help you, but I'll
>>>include a summary of what I did.
>>>
>>>In DocumentWriter remove the normalization by field length:
>>> float norm = fieldBoosts[n] * Similarity.normalizeLength(fieldLengths[n]);
>>>to
>>> float norm = fieldBoosts[n];
>>>
>>>In TermScorer and PhraseScorer, modify the score() method to ignore the
>>>lucene base score:
>>> score *= Similarity.decodeNorm(norms[d]);
>>>to
>>> score = Similarity.decodeNorm(norms[d]);
>>>
>>>In Similarity.java, make byteToFloat() public.
>>>
>>>At index time, use Similarity.byteToFloat() to determine your boost
>>>value as in the following pseudocode:
>>> Document d = new Document();
>>> ... add your fields ...
>>> // rank can range from 0 to 255; assumes RANK is stored as a decimal string
>>> int rank = Integer.parseInt(d.get("RANK"));
>>> float sortVal = Similarity.byteToFloat((byte) rank);
>>> d.setBoost(sortVal);
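
One likely reason for going through byteToFloat() is that the norm is
stored in a single byte, so an arbitrary float boost gets rounded and
nearby ranks can collapse into the same stored value. The sketch below
illustrates this with an assumed 8-bit layout (5-bit exponent, 3-bit
mantissa); the real Similarity code may differ, and NormByteSketch is a
hypothetical name:

```java
// Hypothetical sketch of a one-byte float encoding; the layout is an assumption.
class NormByteSketch {

    /** Decode an 8-bit value (5-bit exponent, 3-bit mantissa) into a float. */
    static float byteToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = ((b & 0xff) << 21) + (48 << 24);
        return Float.intBitsToFloat(bits);
    }

    /** Lossy encode: truncates extra mantissa bits and clamps out-of-range values. */
    static byte floatToByte(float f) {
        if (f <= 0.0f) return 0;
        int bits = Float.floatToIntBits(f);
        int mantissa = (bits & 0xffffff) >> 21;
        int exponent = ((bits >> 24) & 0x7f) - 48;
        if (exponent > 31) { exponent = 31; mantissa = 7; }
        if (exponent < 0) { exponent = 0; mantissa = 1; }
        return (byte) ((exponent << 3) | mantissa);
    }

    public static void main(String[] args) {
        // Using the raw rank as the boost loses information: 99 and 100 share a byte.
        System.out.println(floatToByte(99f) == floatToByte(100f)); // prints true

        // Using byteToFloat(rank) as the boost survives the round trip for every rank.
        boolean exact = true;
        for (int rank = 1; rank <= 100; rank++) {
            float boost = byteToFloat((byte) rank);
            if ((floatToByte(boost) & 0xff) != rank) exact = false;
        }
        System.out.println(exact); // prints true
    }
}
```

Under this layout the decoded values also increase with the byte value,
so sorting by the decoded norm reproduces the rank order.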
>>>
>>>If you'd like the reasoning behind any or all of these items, let
>>>me know.
>>>
>>>DaveB
>>>
>>>Dmitry Serebrennikov wrote:
>>>
>>>>Greetings Everyone,
>>>>
>>>>I'm thinking of trying to build something that manipulates a query
>>>>score in order to achieve a sort order other than the default
>>>>relevance sort. The idea is to create a new type of query:
>>>>SortingQuery( Query query, String sortByField )
>>>>
>>>>It would run the sub-query and return results in the order of the
>>>>values found in the "sortByField" for those documents. Now, I've
>>>>looked at all of the sorting discussion prior to this, and the best
>>>>approach (recommended by Doug among others) is to provide some sort of
>>>>fast access to the field values inside the HitCollector. Reading
>>>>documents at search time is too slow, so people access the data
>>>>elsewhere or build an in-memory index of that data (such as is done in
>>>>the SearchBean's SortField).
>>>>
>>>>My idea is different. I want to try to do the following:
>>>>- compose a query that consists of the original sub-query followed by
>>>>  a special "sorting query"
>>>>- "boost" the score of the original sub-query to 0
>>>>- compute the score of the sorting query such that it would reflect
>>>>  the desired sort order
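
The three steps above can be sketched as plain arithmetic on scores,
assuming the combined score is simply boost * relevance + sort score.
Hit, combinedScore and rank are hypothetical names for illustration, not
Lucene classes, and real query scoring involves more factors than this:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of "boost the sub-query to 0, let the sort score decide".
class SortingQuerySketch {

    /** One matching document: its relevance score and a score derived from the sort field. */
    static final class Hit {
        final int doc;
        final float relevance;
        final float sortScore;

        Hit(int doc, float relevance, float sortScore) {
            this.doc = doc;
            this.relevance = relevance;
            this.sortScore = sortScore;
        }
    }

    /** Assumed combination rule: boosted sub-query score plus sorting-query score. */
    static float combinedScore(Hit h, float subQueryBoost) {
        return subQueryBoost * h.relevance + h.sortScore;
    }

    /** Returns document ids ordered by combined score, highest first. */
    static List<Integer> rank(List<Hit> hits, float subQueryBoost) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> combinedScore(h, subQueryBoost)).reversed());
        List<Integer> docs = new ArrayList<>();
        for (Hit h : sorted) docs.add(h.doc);
        return docs;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(1, 0.9f, 0.1f));
        hits.add(new Hit(2, 0.2f, 0.5f));
        hits.add(new Hit(3, 0.5f, 0.3f));
        System.out.println(rank(hits, 0.0f)); // prints [2, 3, 1]: sort field alone decides
        System.out.println(rank(hits, 1.0f)); // prints [1, 3, 2]: relevance still influences the order
    }
}
```

With a boost of 0 the sub-query only selects which documents match; the
order comes entirely from the sort score.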
>>>>
>>>>Has anyone tried to do something like this?
>>>>Would this work?
>>>>Is this worth doing?
>>>>If it would, would I then have to do something at indexing time to
>>>>set normalization / scoring factors for that field to something or
>>>>other?
>>>>
>>>>Thanks.
>>>>Dmitry.
>>>>
>>>>
>>>>
>>>>--
>>>>To unsubscribe, e-mail:
>>>><mailto:[EMAIL PROTECTED]>
>>>>For additional commands, e-mail:
>>>><mailto:[EMAIL PROTECTED]>