Hi,
> In our use case, we want to perform learning to rank and train a decision
> tree using BM25 scores as one of our features. Decision trees require
> normalised features to be able to properly split the data. Since BM25 scores
> for different queries vary considerably, the decision tree cannot find a
> suitable threshold to split.

The "old Lucene" query normalization has nothing to do with BM25. That
normalization was computed from the query alone, just to keep scores around 1
(for historical reasons: in the early days of Lucene, huge scores led to
rounding problems). It was removed in Lucene 7, together with the TF-IDF based
"coordination factors" in boolean queries. The removal is actually an
improvement, because the normalization scaled the scores by a factor that
depended on the query, making them impossible to compare across queries.
 
> What was the normalisation in Lucene 6? We are using Lucene 6.4.2 but could
> not find any way to normalise BM25 scores other than hacking into the code.

In Lucene 7 the scores are no longer normalized, so they are much easier to
compare between queries of similar structure and across different indexes, but
still with no guarantees (of course, comparing queries with different numbers
of words or a completely different structure is still not easily possible).
Plain word-based queries ("match" query in Elasticsearch) should be fine if you
add your own normalization on the number of terms in the query (e.g., divide
the final score by the number of terms). For LTR purposes this should be fine.
I'd try the Lucene 7 master version to check whether this helps for your use
case.
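
To make the "divide by number of terms" idea concrete, here is a rough sketch
in plain Java. It does not touch any Lucene API: the class name
TermCountNormalizer and both methods are made up for illustration, and it
assumes you already have the raw BM25 scores and know how many terms the
analyzed query contains. Treat it as a starting point, not as the one right
way to build the feature.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Lucene): scales raw scores by query length. */
final class TermCountNormalizer {

    private TermCountNormalizer() {}

    /** Divides one raw score by the number of terms in the query. */
    static float normalize(float rawScore, int numQueryTerms) {
        if (numQueryTerms <= 0) {
            throw new IllegalArgumentException("numQueryTerms must be positive");
        }
        return rawScore / numQueryTerms;
    }

    /** Applies the same per-term scaling to every hit of a single query. */
    static float[] normalizeAll(float[] rawScores, int numQueryTerms) {
        float[] normalized = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            normalized[i] = normalize(rawScores[i], numQueryTerms);
        }
        return normalized;
    }

    public static void main(String[] args) {
        // Made-up raw scores for a two-term and a four-term query.
        float[] twoTermHits = {7.2f, 3.6f};
        float[] fourTermHits = {14.4f, 7.2f};
        System.out.println(Arrays.toString(normalizeAll(twoTermHits, 2)));
        System.out.println(Arrays.toString(normalizeAll(fourTermHits, 4)));
    }
}
```

The normalized value is what you would feed to the decision tree as the BM25
feature. The term count should be taken after analysis, so that it matches
the terms that actually contributed to the score.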

Uwe

> --
> View this message in context: http://lucene.472066.n3.nabble.com/Is-it-
> possible-to-normalise-BM25-scores-in-the-query-level-
> tp4342991p4343048.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org

