Yehosef, this sounds very similar to some title search work I've done. Title fields are odd because TF is often meaningless (a term rarely appears more than once in a title), and IDF can also be quite skewed. If only a few titles contain "how", IDF treats it as a rare and therefore important term, and you'll get very odd results.
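
To make the IDF point concrete, here is a rough sketch in plain Python (not Elasticsearch). The five titles are made up, tokenization is naive whitespace splitting, and the IDF formula is the one Lucene's BM25 similarity uses as far as I know. Real index statistics, especially per shard, will look different.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """IDF in the form used by Lucene's BM25 similarity (assumed here)."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def doc_freqs(titles):
    """Count how many titles contain each term (lowercased whitespace tokens)."""
    freqs = {}
    for title in titles:
        # set() so a term counts once per title, which is what IDF needs
        for term in set(title.lower().split()):
            freqs[term] = freqs.get(term, 0) + 1
    return freqs


# Hypothetical toy corpus of titles, not real data.
titles = [
    "how to paint my kitchen",
    "tips to paint my kitchen",
    "should I wallpaper or paint my kitchen",
    "paint my bathroom",
    "paint colors for a small kitchen",
]

freqs = doc_freqs(titles)
idf = {term: bm25_idf(df, len(titles)) for term, df in freqs.items()}

# "how" appears in one title only, so it gets the largest IDF of the three,
# even though it says the least about what the user wants.
for term in ("how", "kitchen", "paint"):
    print(f"{term:8s} df={freqs[term]}  idf={idf[term]:.3f}")
```

In this toy corpus "how" ends up weighted well above "kitchen" and "paint", which is the opposite of what you want for a generic phrase like "how to".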
Read more here: http://opensourceconnections.com/blog/2014/12/08/title-search-when-relevancy-is-only-skin-deep/

On Sunday, April 12, 2015, Yehosef Shapiro <[email protected]> wrote:

> Often people using our search type "how to <something>", e.g. "how to paint
> my kitchen". This might return results for "tips to paint my kitchen"
> or "how to paint my bathroom". The phrase "how to" is a generic phrase and
> I would like to minimize its significance. I don't want to remove it
> completely because I still would like a post called "how to paint my
> kitchen cabinets" to match higher than "should I wallpaper or paint my
> kitchen".
>
> I don't want it to be a stopword because it still has value (as in the
> example).
>
> The Common Terms query might work, but I don't necessarily want to apply
> the rules to all other common phrases (it might be a good idea, but this
> is a specific common search term that I know people search for and I would
> like to solve it specifically for this case if possible).
>
> I don't think the negative boost is what I want because I don't want those
> documents to get penalized for containing the words "how to", just that
> they should get a much smaller boost.
>
> Any suggestions how to approach this? For the record, I'm using the BM25
> similarity algorithm.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/acd86fb2-ae69-40be-a772-c65d008f2415%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
Doug Turnbull | Search Relevance Consultant | OpenSource Connections, LLC | 240.476.9983 | http://www.opensourceconnections.com
Author: Taming Search <http://manning.com/turnbull> from Manning Publications
