Hello,
 
I want to use Lucene for information retrieval over documents that contain 
probabilistically weighted words (or fields, in Lucene terms). Below I give a 
simple model as an example to illustrate my problem. Any information 
or advice would be very valuable to me. Thank you very much.
 
A document containing two words would look something like this:
-------------------------------------------------------------------------------------
 
cherry(0.83)/chary(0.17)  know(0.76)/now(0.24)
--------------------------------------------------------------------------------------
 
When computing the vector of the document (according to the vector space model), 
I want the term frequency of the term “cherry” to be 0.83 (instead of 1) and 
that of “chary” to be 0.17.
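
To make the intended weighting concrete, here is a minimal sketch in plain Python (it does not use Lucene itself). The function name and the regular expression are my own assumptions for illustration; it only shows how each alternative would contribute its weight, rather than 1, to the term frequency.

```python
import re
from collections import defaultdict

# Hypothetical pattern for the word(weight) notation used in this post.
# It accepts a decimal comma or a decimal point in the weight.
TOKEN = re.compile(r"(\w+)\((\d+)[.,](\d+)\)")

def weighted_term_frequencies(text):
    """Sum the weight of every alternative, per term, instead of counting 1."""
    tf = defaultdict(float)
    for word, whole, frac in TOKEN.findall(text):
        tf[word] += float(f"{whole}.{frac}")
    return dict(tf)

doc = "cherry(0.83)/chary(0.17)  know(0.76)/now(0.24)"
tf = weighted_term_frequencies(doc)
# tf now maps each of the four alternatives to its fractional frequency.
```

If the same term appears as an alternative at several positions, its weights simply add up, which is the behaviour I would like the index to reproduce.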
 
This would have the following consequence:
 
When “cherry” is entered as a query, a document containing many occurrences of 
“chary” can score higher than a document containing only a few occurrences of 
“cherry”.
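
A small numeric illustration of this consequence, again as a plain Python sketch rather than Lucene code. It assumes that each position indexed mainly as “chary” still keeps “cherry” as a lower-weighted alternative; the weights 0.60 and 0.40 and the raw term-frequency score (no idf, no length normalisation) are hypothetical simplifications.

```python
def score(query_term, tf):
    # Raw weighted term frequency; ignores idf and length normalisation.
    return tf.get(query_term, 0.0)

# Document A: five positions, each chary(0.60)/cherry(0.40) (hypothetical weights).
doc_a = {"chary": 5 * 0.60, "cherry": 5 * 0.40}

# Document B: a single position, cherry(0.83)/chary(0.17).
doc_b = {"cherry": 0.83, "chary": 0.17}

# Document A accumulates more "cherry" weight than document B.
a_outranks_b = score("cherry", doc_a) > score("cherry", doc_b)
```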
 
Thank you very much again.


      
