Hi David,

It may not matter for your use case, but in case you are interested in
"real" BM25F: configuring k1 and b separately for each field in Solr is not
the same as a true BM25F implementation. The difference comes from Solr's
model of fields as mini-documents (i.e. each field has its own length, idf
and tf). See the discussion in
https://issues.apache.org/jira/browse/LUCENE-2959, particularly these
comments by Robert Muir:

"Actually as far as BM25f, this one presents a few challenges (some already
discussed on LUCENE-2091
<https://issues.apache.org/jira/browse/LUCENE-2091>).

To summarize:

   - for any field, Lucene has a per-field terms dictionary that contains
   that term's docFreq. To compute BM25f's IDF method would be challenging,
   because it wants a docFreq "across all the fields". (its not clear to me at
   a glance either from the original paper, if this should be across only the
   fields in the query, across all the fields in the document, and if a
   "static" schema is implied in this scoring system (in lucene document 1 can
   have 3 fields and document 2 can have 40 different ones, even with
   different properties).
   - the same issue applies to length normalization, lucene has a "field
   length" but really no concept of document length."
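
To make the difference concrete, here is a rough Python sketch (not Lucene or
Solr code) that scores one term both ways. The toy corpus, the function names
and the parameter values are all made up for illustration, and the BM25F
variant shown is the simple form from the literature, where weighted,
length-normalized term frequencies are summed across fields before a single
saturation step and the idf uses a document-level docFreq. Other BM25F
formulations exist, so treat this as one possible reading rather than the
definitive one.

```python
import math

# Hypothetical toy corpus: each document maps field name -> list of tokens.
DOCS = [
    {"title": ["solr", "ranking"],
     "body": ["bm25", "ranking", "in", "solr", "explained"]},
    {"title": ["lucene"], "body": ["solr", "builds", "on", "lucene"]},
    {"title": ["cooking"], "body": ["pasta", "recipes"]},
]
FIELDS = ["title", "body"]


def idf(df, n):
    """BM25-style idf for a term found in df out of n documents."""
    return math.log(1 + (n - df + 0.5) / (df + 0.5))


def avg_len(field):
    """Average length of one field across the corpus."""
    return sum(len(d[field]) for d in DOCS) / len(DOCS)


def per_field_bm25(term, doc, params):
    """Per-field model: every field is scored as its own mini-document
    (own tf, own docFreq, own length) and the scores are summed.
    params maps field -> (k1, b)."""
    n = len(DOCS)
    score = 0.0
    for f in FIELDS:
        k1, b = params[f]
        tf = doc[f].count(term)
        df = sum(1 for d in DOCS if term in d[f])  # per-field docFreq
        norm = 1 - b + b * len(doc[f]) / avg_len(f)
        score += idf(df, n) * tf * (k1 + 1) / (tf + k1 * norm)
    return score


def bm25f(term, doc, weights, b, k1):
    """Simple BM25F: combine weighted, length-normalized tf across fields,
    then saturate once, using a docFreq counted across all fields."""
    n = len(DOCS)
    df = sum(1 for d in DOCS if any(term in d[f] for f in FIELDS))
    pseudo_tf = 0.0
    for f in FIELDS:
        norm = 1 - b[f] + b[f] * len(doc[f]) / avg_len(f)
        pseudo_tf += weights[f] * doc[f].count(term) / norm
    return idf(df, n) * pseudo_tf / (k1 + pseudo_tf)


if __name__ == "__main__":
    params = {"title": (1.2, 0.75), "body": (1.2, 0.75)}
    weights = {"title": 2.0, "body": 1.0}
    b = {"title": 0.75, "body": 0.75}
    for i, doc in enumerate(DOCS):
        print(i, per_field_bm25("solr", doc, params),
              bm25f("solr", doc, weights, b, k1=1.2))
```

In the per-field version, a match in each additional field adds a separately
saturated score with its own idf. In the BM25F version the fields share one
saturation curve and one idf, which is why it needs the cross-field docFreq
and length statistics that the comments above describe as hard to get from
Lucene's per-field model.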

Tom

On Thu, Apr 14, 2016 at 12:41 PM, David Cawley <david.cawl...@mail.dcu.ie>
wrote:

> Hello,
> I am developing an enterprise search engine for a project and I was hoping
> to implement BM25F ranking algorithm to configure the tuning parameters on
> a per field basis. I understand BM25 similarity is now supported in Solr
> but I was hoping to be able to configure k1 and b for different fields such
> as title, description, anchor etc, as they are structured documents.
> I am fairly new to Solr so any help would be appreciated. If this is
> possible or any steps as to how I can go about implementing this it would
> be greatly appreciated.
>
> Regards,
>
> David
>
> Current Solr Version 5.4.1
>
