Hello,

I want to use Lucene for information retrieval over documents that contain probabilistically weighted words (or fields, in Lucene terms). Below I give a small template as an example to illustrate my problem. Any information or advice would be very valuable to me. Thank you very much.

A document containing two words would look something like this:

-------------------------------------------------------------------------------------
cherry(0,83)/chary(0,17) know(0,76)/now(0,24)
-------------------------------------------------------------------------------------

When the vector of the document is calculated (according to the vector space model), I want the term frequency of the term “cherry” to be 0.83 (instead of 1) and that of “chary” to be 0.17.

This would have the following consequence: when we enter “cherry” as the query, a document containing many occurrences of “chary” can score higher than documents containing only a few occurrences of “cherry”.

Thank you very much again.
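
To make the intended weighting concrete, here is a minimal sketch in plain Python. It is not Lucene code and uses no Lucene API. It assumes the notation from the example above (alternatives separated by "/", probability in parentheses, decimal comma). The names weighted_tf and raw_score are hypothetical, and the score is only an unnormalised sum of fractional term frequencies, without idf or length normalisation.

```python
import re
from collections import defaultdict

# Matches one alternative such as "cherry(0,83)"; accepts "," or "." as the
# decimal separator. The notation itself is taken from the example document.
ALTERNATIVE_RE = re.compile(r"([^\s/()]+)\((\d+)[.,](\d+)\)")


def weighted_tf(text):
    """Return fractional term frequencies: each alternative adds its probability."""
    tf = defaultdict(float)
    for term, whole, frac in ALTERNATIVE_RE.findall(text):
        tf[term] += float(f"{whole}.{frac}")
    return dict(tf)


def raw_score(query_terms, tf):
    """Unnormalised score: sum of the weighted tf of each distinct query term.

    Assumption: no idf and no length normalisation, so this only illustrates
    the effect of fractional term frequencies, not a full ranking formula.
    """
    return sum(tf.get(term, 0.0) for term in set(query_terms))


# The two-word document from the example.
example_doc = "cherry(0,83)/chary(0,17) know(0,76)/now(0,24)"
example_tf = weighted_tf(example_doc)

# A hypothetical document with ten occurrences read mostly as "chary" ...
many_chary = " ".join(["chary(0,83)/cherry(0,17)"] * 10)
# ... and one with a single occurrence read mostly as "cherry".
few_cherry = "cherry(0,83)/chary(0,17)"

score_many_chary = raw_score(["cherry"], weighted_tf(many_chary))
score_few_cherry = raw_score(["cherry"], weighted_tf(few_cherry))
```

With these assumptions, the query “cherry” gives the ten-occurrence “chary” document a larger raw score than the single-occurrence “cherry” document, which is the consequence described above. A length-normalised score (for example cosine similarity) would not necessarily preserve that ordering.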
