> Also, I don't understand why the encode/decode functions have a range of
> 7x10^9 to 2x10^-9, when it seems to me the most common values (with boosts set to 1.0) are somewhere between 0 and 1.0. When would somebody have a monster huge value like 7x10^9? Even with a huge index-time boost of 20.0 or so, why would encode/decode need a range as huge as the current implementation's?
I have often asked myself the same thing; I have just tried to avoid depending on the field norms where possible. For instance, if you keep your own array of how long each document's field is, you can boost documents however you want in your HitCollector by looking up the value in that array using the docId. That is the approach we have generally taken in our application.

You can find out how many terms are in each field by creating an array of length maxDoc and then iterating over all of the TermPositions for that field, remembering the maximum position you saw for each document (see the first sketch below).

This array is also useful for implementing exact phrase matching. Suppose someone wants documents that match *exactly* "Nissan Altima": you would do a phrase search for "Nissan Altima" and then just ignore any result that does not have exactly two terms in that field. For example, "Nissan Altima Standard" would match that query, but your array would tell you it has 3 terms, when you only care about results that have 2.

To do this you have to implement your own HitCollector object and use it instead of the "Hits" interface. To get an idea of how, you can look at the HitCollector that the Hits object itself uses (see the second sketch below).
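To make the term-counting idea concrete, here is a rough, untested sketch against the old TermEnum/TermPositions API; the class name FieldLengths and the method name build are just mine, not anything in Lucene:

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.index.TermPositions;

public class FieldLengths {

    // Returns an array indexed by docId holding the number of terms in the
    // given field for each document, computed as (largest position seen + 1).
    public static int[] build(IndexReader reader, String field) throws IOException {
        int[] lengths = new int[reader.maxDoc()];
        TermEnum terms = reader.terms(new Term(field, ""));
        TermPositions positions = reader.termPositions();
        try {
            // Walk every term of the field and every posting of each term.
            while (terms.term() != null && terms.term().field().equals(field)) {
                positions.seek(terms.term());
                while (positions.next()) {
                    int doc = positions.doc();
                    for (int i = 0; i < positions.freq(); i++) {
                        int pos = positions.nextPosition();
                        if (pos + 1 > lengths[doc]) {
                            lengths[doc] = pos + 1; // positions are 0-based
                        }
                    }
                }
                if (!terms.next()) {
                    break;
                }
            }
        } finally {
            positions.close();
            terms.close();
        }
        return lengths;
    }
}

You only need to build this array once per index (or once per reopen), so the cost of the full scan is paid up front rather than at query time.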
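And here is roughly what the exact-match filtering could look like with a custom HitCollector instead of Hits. Again just a sketch: FieldLengths is the helper above, and the field name "model" and the lowercased terms are made-up examples that assume your analyzer lowercases at index time.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;

public class ExactPhraseSearch {

    public static List findExactNissanAltima(IndexReader reader) throws IOException {
        // Per-document term counts from the earlier sketch.
        final int[] lengths = FieldLengths.build(reader, "model");
        final List exactDocs = new ArrayList();

        // Ordinary phrase query; it will also match longer fields.
        PhraseQuery query = new PhraseQuery();
        query.add(new Term("model", "nissan"));
        query.add(new Term("model", "altima"));

        IndexSearcher searcher = new IndexSearcher(reader);
        searcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                // "Nissan Altima Standard" matches the phrase but has 3 terms;
                // keep only documents whose field holds exactly the 2 phrase terms.
                if (lengths[doc] == 2) {
                    exactDocs.add(new Integer(doc));
                }
            }
        });
        searcher.close();
        return exactDocs;
    }
}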