You cannot penalize terms; you can only reward them. The trick is to
reward the important terms, so that all other (unwanted or unknown) terms
are effectively penalized. One method is to analyze sentences grammatically
(part-of-speech tagging), reward nouns or other keywords with boost values,
and use an extended similarity algorithm.
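
As a rough illustration of the "reward the important terms" idea, here is a
minimal Python sketch that builds a bool query with a per-term boost. It
makes several assumptions: the GENERIC_TERMS set is a hand-written stand-in
for a real POS tagger, the field name "title" and the boost values 3.0 and
0.2 are made up for illustration, and the function name is hypothetical.

```python
# Hypothetical stand-in for a POS tagger: words treated as low-value.
# A real tagger would decide this per token (e.g. noun vs. function word).
GENERIC_TERMS = {"how", "to"}


def build_boosted_query(text, field="title", keyword_boost=3.0, generic_boost=0.2):
    """Build a bool query whose should clauses carry a boost per term.

    Generic terms still match, but contribute far less to the score
    than the remaining terms.
    """
    should = []
    for token in text.lower().split():
        boost = generic_boost if token in GENERIC_TERMS else keyword_boost
        should.append({"match": {field: {"query": token, "boost": boost}}})
    return {"query": {"bool": {"should": should}}}


# The resulting dict can be sent as the body of a search request.
query = build_boosted_query("how to paint my kitchen")
```

With this, "how" and "to" are never negative; they just add much less than
"paint" or "kitchen". Tuning the two boost values is left to experiment.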

You can use UIMA, OpenNLP, or Stanford NLP for POS tagging and try to
implement payload-based scoring, something like this demo code:

https://github.com/jprante/elasticsearch-payload

My demo code does not work; I am not sure where I made a mistake.
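
For the payload side, one possible way to get weights into the index is the
delimited payload token filter, which reads "token|weight" pairs at analysis
time. The sketch below is not taken from the demo repository above; it only
shows, in Python, how tagged tokens might be encoded for such a filter. The
helper name, the filter and analyzer names, and the weights are assumptions.
The filter type was called "delimited_payload_filter" in older releases and
"delimited_payload" in newer ones, so check the version in use.

```python
def to_delimited_payloads(weighted_tokens, delimiter="|"):
    """Encode (token, weight) pairs as 'token|weight', separated by spaces."""
    parts = []
    for token, weight in weighted_tokens:
        if delimiter in token or " " in token:
            raise ValueError("token must not contain the delimiter or spaces")
        parts.append(f"{token}{delimiter}{float(weight)}")
    return " ".join(parts)


# Index settings sketch: a whitespace tokenizer plus the payload filter.
# "weights" and "payload_analyzer" are hypothetical names.
ANALYSIS_SETTINGS = {
    "analysis": {
        "filter": {
            "weights": {
                "type": "delimited_payload_filter",
                "delimiter": "|",
                "encoding": "float",
            }
        },
        "analyzer": {
            "payload_analyzer": {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["weights"],
            }
        },
    }
}

# Weights would normally come from the POS tagger; these are made up.
field_value = to_delimited_payloads([("how", 0.2), ("to", 0.2), ("paint", 3.0)])
```

Storing the payloads is only half of it: they affect the score only if the
similarity or a scoring plugin actually reads them at query time.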

Jörg

On Sun, Apr 12, 2015 at 12:34 PM, Yehosef Shapiro <[email protected]> wrote:

> Often people using our search type "how to <something>", e.g. "how to paint
> my kitchen". This might return results for "tips to paint my kitchen"
> or "how to paint my bathroom". The phrase "how to" is a generic phrase and
> I would like to minimize its significance. I don't want to remove it
> completely because I would still like a post called "how to paint my
> kitchen cabinets" to rank higher than "should I wallpaper or paint my
> kitchen".
>
> I don't want it to be a stopword because it still has value (as in the
> example).
>
> The Common Terms query might work - but I don't necessarily want to apply
> the rules to all other common phrases (it might be a good idea - but this
> is a specific common search term that I know people search for and I would
> like to solve it specifically for this case if possible.)
>
> I don't think the negative boost is what I want because I don't want those
> documents to get penalized for containing the words "how to" - just that
> they should get a much smaller boost.
>
> Any suggestions how to approach this?  For the record, I'm using the BM25
> similarity algorithm.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/acd86fb2-ae69-40be-a772-c65d008f2415%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/acd86fb2-ae69-40be-a772-c65d008f2415%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
