Hi folks,
We recently started using the default implementation of MLT
(org.apache.solr.handler.MoreLikeThisHandler) and found that it lacks a
few things:
1. Searching for terms in the same field as the original document:
- The current implementation picks the field in which to search for an
interesting term based on docFreq. This can give bad results: if, say,
the original product has brand:"RED Valentino", we can end up searching
for "red" in the color field.
2. Phrase boosts:
- If the product name is "business cards", it makes sense to give a
phrase boost to products which are also business cards.
3. Support for bq, bf, fq and multiplicative boosts:
- You might want to filter out out_of_stock products, or give a
multiplicative boost to a product based on its price similarity or
launch date.
4. Support for explainOther.
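
To make points 1 and 2 concrete, here is a minimal sketch in plain Java. It is not the actual parser: the class and method names are hypothetical, it only builds a Lucene-syntax query string, and lowercasing plus whitespace splitting stand in for real field analysis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only, not the actual MLTQueryParser.
final class MltQuerySketch {

    // Builds a Lucene-syntax query string from the source document's fields.
    // Assumption: lowercasing and whitespace splitting stand in for analysis,
    // and query-syntax special characters are not escaped.
    static String buildQuery(Map<String, String> sourceDoc,
                             Map<String, Float> phraseBoosts) {
        List<String> clauses = new ArrayList<>();
        for (Map.Entry<String, String> entry : sourceDoc.entrySet()) {
            String field = entry.getKey();
            String value = entry.getValue().trim().toLowerCase(Locale.ROOT);
            if (value.isEmpty()) {
                continue;
            }
            String[] terms = value.split("\\s+");
            // Point 1: each term is searched only in the field it came from.
            for (String term : terms) {
                clauses.add(field + ":" + term);
            }
            // Point 2: add the whole multi-word value as a boosted phrase.
            Float boost = phraseBoosts.get(field);
            if (boost != null && terms.length > 1) {
                clauses.add(field + ":\"" + String.join(" ", terms) + "\"^" + boost);
            }
        }
        return String.join(" ", clauses);
    }
}
```

For a document with name "Business Cards" and brand "RED Valentino", and a phrase boost configured on name, the clauses include brand:red and a boosted name:"business cards" phrase, but never color:red.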
We had a use case for each of these, and I ended up writing my own
MLTQueryParser, which builds the MLT query for a given document. It also
introduces a new concept called childDocs. You can think of some documents
as products, and a collection of products can be thought of as a category
page. You could then search for similar documents based on the products a
category page contains.
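
One way to picture the childDocs idea is the rough sketch below. It is an assumption about how such aggregation could work, not the code in my parser: the class name, the minCount threshold, and the use of the term count as a boost are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of building one MLT query from several child docs.
final class ChildDocsSketch {

    // Counts, per field, how many times each term occurs across the child docs.
    static Map<String, Map<String, Integer>> countTerms(List<Map<String, String>> childDocs) {
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        for (Map<String, String> doc : childDocs) {
            for (Map.Entry<String, String> entry : doc.entrySet()) {
                Map<String, Integer> perField =
                        counts.computeIfAbsent(entry.getKey(), k -> new LinkedHashMap<>());
                String value = entry.getValue().trim().toLowerCase(Locale.ROOT);
                for (String term : value.split("\\s+")) {
                    if (!term.isEmpty()) {
                        perField.merge(term, 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    // Keeps terms seen at least minCount times and boosts each by its count.
    static String buildQuery(List<Map<String, String>> childDocs, int minCount) {
        List<String> clauses = new ArrayList<>();
        countTerms(childDocs).forEach((field, terms) ->
                terms.forEach((term, count) -> {
                    if (count >= minCount) {
                        clauses.add(field + ":" + term + "^" + count);
                    }
                }));
        return String.join(" ", clauses);
    }
}
```

Here a category page is just the list of its products' field maps, and terms shared by several products dominate the resulting query.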
I was wondering whether you would be interested in an alternative
implementation of MLT that supports all the knobs that Solr search does. I
could post a patch file, maybe?
Thanks
Gagan