Hello everyone, I am Dwaipayan, a research scholar at the Indian Statistical Institute, Kolkata, working in the field of Information Retrieval. For my research, I use Lucene (4.10.4).
I have a question about how to boost query terms in Lucene at search time. Specifically, I am implementing a paper on query expansion (Relevance-Based Language Models, Victor Lavrenko and Bruce Croft, SIGIR 2001). In the paper, the expanded query is formed from terms taken from the initially retrieved documents. The expansion terms are selected and weighted according to a probability, so the weights of the expansion terms are probability values normalized to sum to one.

As a result, the term weights are small fractional values: in most cases they are close to 0.1 when 10 expansion terms are added, and they keep shrinking as more expansion terms are considered. When I use these fractional values as the expansion term weights in a Lucene BooleanQuery, I do not get the expected results. I think the problem is with the weight that is applied through setBoost() on the queries in the BooleanQuery. Following the paper exactly, I am setting these weights to the normalized probability values.

Could anyone please help me with this problem?

Thanks,
Dwaipayan Roy
Research Scholar
Indian Statistical Institute
Kolkata, India
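
P.S. To make the weighting concrete, here is a minimal sketch in plain Java (no Lucene dependency) of the normalization step described above. The class name, method name, terms, and raw scores are made up for illustration and are not taken from the paper or from my actual code. Each resulting weight is the value that would then be passed to setBoost() on the corresponding term query before it is added to the BooleanQuery.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: only illustrates how the expansion weights are normalized.
public class ExpansionWeights {

    /** Divides each raw score by the total so that the returned weights sum to one. */
    static Map<String, Double> normalize(Map<String, Double> rawScores) {
        double total = 0.0;
        for (double score : rawScores.values()) {
            total += score;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("raw scores must have a positive sum");
        }
        // LinkedHashMap keeps the terms in their original order.
        Map<String, Double> weights = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : rawScores.entrySet()) {
            weights.put(entry.getKey(), entry.getValue() / total);
        }
        return weights;
    }

    public static void main(String[] args) {
        // Made-up expansion terms and unnormalized scores, for illustration only.
        Map<String, Double> raw = new LinkedHashMap<>();
        raw.put("retrieval", 4.0);
        raw.put("query", 3.0);
        raw.put("expansion", 2.0);
        raw.put("feedback", 1.0);

        // Each printed weight is the float that would be given to setBoost().
        for (Map.Entry<String, Double> entry : normalize(raw).entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

With 10 terms of roughly equal score, each weight comes out near 0.1, which is the range of boost values I mentioned above.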