Yehosef, this sounds very similar to some title search work I've done.
Title fields are odd because TF is often meaningless (a term rarely occurs
more than once in a title), and IDF can also be quite skewed. If only a few
titles have "how" in the text, then "how" scores as a rare, highly
significant term and you'll get very odd results.
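
To make the IDF point concrete, here is a rough sketch using the IDF formula
from Lucene's BM25 similarity, ln(1 + (N - n + 0.5) / (n + 0.5)). The title
count and document frequencies are invented for illustration; they are not
taken from any real index.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """IDF as in Lucene's BM25 similarity: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical title index: all numbers below are made up for illustration.
TITLE_COUNT = 10_000
doc_freqs = {"how": 5, "to": 40, "paint": 2_000, "kitchen": 1_500}

idf_by_term = {term: bm25_idf(df, TITLE_COUNT) for term, df in doc_freqs.items()}

# Print terms from highest to lowest IDF.
for term, idf in sorted(idf_by_term.items(), key=lambda kv: -kv[1]):
    print(f"{term:8s} df={doc_freqs[term]:5d} idf={idf:.2f}")
```

With frequencies like these, "how" gets a much larger IDF than "paint" or
"kitchen", so a title that merely contains "how" can outrank one that matches
the terms the user actually cares about.
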

Read more here:
http://opensourceconnections.com/blog/2014/12/08/title-search-when-relevancy-is-only-skin-deep/
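
For the specific "how to" case in the question quoted below, one possible
approach (a sketch, not something I have tested against your data) is to
split the generic phrase out of the user's query and give it its own small
boost. The helper name build_title_query, the "title" field, and the 0.2
boost are all assumptions to adapt and tune; the output uses the standard
bool, match, and match_phrase queries.

```python
import json

GENERIC_PHRASE = "how to"  # the generic phrase to down-weight


def build_title_query(user_query: str, field: str = "title",
                      generic_boost: float = 0.2) -> dict:
    """Build a query body that requires the specific terms to match and
    treats a leading "how to" as an optional, low-boost phrase clause."""
    text = " ".join(user_query.lower().split())  # normalize case and spacing
    if not text.startswith(GENERIC_PHRASE + " "):
        # No leading generic phrase: fall back to a plain match query.
        return {"query": {"match": {field: {"query": text}}}}
    remainder = text[len(GENERIC_PHRASE):].strip()
    return {
        "query": {
            "bool": {
                # The specific terms carry the main score.
                "must": [{"match": {field: {"query": remainder}}}],
                # "how to" still helps, but with a hypothetical small boost.
                "should": [{
                    "match_phrase": {
                        field: {"query": GENERIC_PHRASE, "boost": generic_boost}
                    }
                }],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_title_query("How to paint my kitchen"), indent=2))
```

Because the phrase sits in a should clause, titles containing "how to" are
not penalized; they just get a smaller bump than they would if "how" and
"to" were scored like any other query term. Note that a boost only scales
the clause's score, so the skewed IDF still feeds into it.
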

On Sunday, April 12, 2015, Yehosef Shapiro <[email protected]> wrote:

> Often people using our search type "how to <something>", e.g. "how to
> paint my kitchen".  This might return results for "tips to paint my
> kitchen" or "how to paint my bathroom".  The phrase "how to" is a generic
> phrase and I would like to minimize its significance.  I don't want to
> remove it completely because I still would like a post called "how to
> paint my kitchen cabinets" to match higher than "should I wallpaper or
> paint my kitchen".
>
> I don't want it to be a stopword because it still has value (as in the
> example).
>
> The Common Terms query might work - but I don't necessarily want to apply
> the rules to all other common phrases (it might be a good idea - but this
> is a specific common search term that I know people search for and I would
> like to solve it specifically for this case if possible.)
>
> I don't think the negative boost is what I want because I don't want those
> documents to get penalized for containing the words "how to" - just that
> they should get a much smaller boost.
>
> Any suggestions how to approach this?  For the record, I'm using the BM25
> similarity algorithm.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected]
> .
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/acd86fb2-ae69-40be-a772-c65d008f2415%40googlegroups.com
> .
> For more options, visit https://groups.google.com/d/optout.
>


-- 
Doug Turnbull | Search Relevance Consultant | OpenSource Connections,
LLC | 240.476.9983 | http://www.opensourceconnections.com
Author: Taming Search <http://manning.com/turnbull> from Manning
Publications

