Re: custom similarity based on tf but greater than 1.0

Vagelis Kotsonis Tue, 23 Jan 2007 14:42:55 -0800

So the normalization was made through Hits. That was something I didn't
understand. 
If I was alone I would search in Scorer and query classes.


Thank you for this.

Finally I used the following:

final HitQueue hq = new HitQueue(results.length());
            searcher.search(qr, new HitCollector() {
                public void collect(int doc, float score) {
                    hq.insert(new ScoreDoc(doc, score));
                }
            });
            ScoreDoc[] scoreDocs = new ScoreDoc[hq.size()];
            for (int i = hq.size()-1; i >= 0; i--)        // put docs in array
                scoreDocs[i] = (ScoreDoc)hq.pop();

(the HitQueue extend PriorityQueue of lucene library)

I didn't understand how to use the TopDocs, so I followed the above example
I found in the following link:
http://www.devdaily.com/java/jwarehouse/lucene-1.3-final/src/java/org/apache/lucene/search/IndexSearcher.java.shtml
http://www.devdaily.com/java/jwarehouse/lucene-1.3-final/src/java/org/apache/lucene/search/IndexSearcher.java.shtml
 

The only problem I have is the case when the HitsQueue size must be
predefined...I don't know what to do then.
Currently I submit the same query twice, one for getting the size of the
results and one to use with the above code.

Thank you very much for your help!
Vagelis


markrmiller wrote:
> 
> Ahhhh...I brushed over your example too fast...looked like normal 
> counting to me...I see now what you mean. So OMIT_NORMS probably did 
> work. Are you getting the results through hits? Hits will normalize. Use 
> topdocs or a hitcollector.
> 
> - Mark
> 
> Vagelis Kotsonis wrote:
>> But i don't want to get the frequency of each term in the doc.
>>
>> what I want is 1 if the term exists in the doc and 0 if it doesn't. After
>> this, I want all thes 1s and 0s to be summed and give me a number to use
>> as
>> a score.
>>
>> If I set the TF value as 1 or 0, as I described above, I get the right
>> number, but this number is normalized to 1.0 and smaller numbers.
>>
>> It is the normalization that I want to avoid.
>>
>> Thanks again!
>> Vagelis
>>
>>
>> markrmiller wrote:
>>   
>>> Dont return 1 for tf...just return the tf straight with no 
>>> changes...return freq. For everything else return 1. After that 
>>> OMIT_NORMS should work. If you want to try a custom reader:
>>>
>>> public class FakeNormsIndexReader extends FilterIndexReader {
>>>     byte[] ones = SegmentReader.createFakeNorms(maxDoc());
>>>
>>>     public FakeNormsIndexReader(IndexReader in) {
>>>         super(in);
>>>     }
>>>     public synchronized byte[] norms(String field) throws IOException {
>>>           System.out.println("returning fake norms...");
>>>         return ones;
>>>     }
>>>
>>>     public synchronized void norms(String field, byte[] result, int 
>>> offset) {
>>>           System.out.println("writing fake norms...");
>>>         System.arraycopy(ones, 0, result, offset, maxDoc());
>>>     }
>>> }
>>>
>>> The beauty of this reader is that you can flip between it and your 
>>> custom similarity and Lucene's default implementations live on the same 
>>> index.
>>>
>>> - Mark
>>>
>>>
>>>     
>>
>>   
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/custom-similarity-based-on-tf-but-greater-than-1.0-tf3037071.html#a8550786
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: custom similarity based on tf but greater than 1.0

Reply via email to