You cannot penalize terms; you can only reward them. The trick is to reward the important terms, so that all other (unwanted and unknown) terms are penalized in relative terms. One method is to analyze sentences for grammar (part-of-speech tagging), reward nouns or other keywords with boost values, and use an extended similarity algorithm.
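A minimal sketch of the "reward only" idea, assuming a plain bool query with one match clause per term instead of payload-based scoring. The field name `title`, the boost values, and the hand-written `IMPORTANT_TERMS` table (a stand-in for what a POS tagger would select) are all hypothetical:

```python
import json

# Hypothetical stand-in for POS tagger output: terms judged important
# (e.g. nouns and verbs) and the boost each one should receive.
IMPORTANT_TERMS = {"paint": 3.0, "kitchen": 3.0}

# Every other term keeps the neutral boost, so it still counts,
# just relatively less than the rewarded terms.
DEFAULT_BOOST = 1.0


def build_query(text, field="title"):
    """Build a bool query with one boosted match clause per term."""
    should = []
    for term in text.lower().split():
        boost = IMPORTANT_TERMS.get(term, DEFAULT_BOOST)
        should.append({"match": {field: {"query": term, "boost": boost}}})
    return {"query": {"bool": {"should": should}}}


if __name__ == "__main__":
    print(json.dumps(build_query("how to paint my kitchen"), indent=2))
```

With this, "how" and "to" still contribute to the score, so "how to paint my kitchen cabinets" can still rank above "should I wallpaper or paint my kitchen", but the rewarded terms dominate.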
You can use UIMA, OpenNLP, or Stanford NLP for POS tagging, and try to implement payload-based scoring, something like this demo code:

https://github.com/jprante/elasticsearch-payload

My demo code does not work; I am not sure where I made a mistake.

Jörg

On Sun, Apr 12, 2015 at 12:34 PM, Yehosef Shapiro <[email protected]> wrote:

> Often people using our search type "how to <something>", e.g. "how to paint
> my kitchen". This might return results for "tips to paint my kitchen"
> or "how to paint my bathroom". The phrase "how to" is a generic phrase and
> I would like to minimize its significance. I don't want to remove it
> completely because I still would like a post called "how to paint my
> kitchen cabinets" to match higher than "should I wallpaper or paint my
> kitchen".
>
> I don't want it to be a stopword because it still has value (as in the
> example).
>
> The Common Terms query might work, but I don't necessarily want to apply
> the rules to all other common phrases (it might be a good idea, but this
> is a specific common search term that I know people search for and I would
> like to solve it specifically for this case if possible).
>
> I don't think the negative boost is what I want because I don't want those
> documents to get penalized for containing the words "how to", just that
> they should get a much smaller boost.
>
> Any suggestions on how to approach this? For the record, I'm using the BM25
> similarity algorithm.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/acd86fb2-ae69-40be-a772-c65d008f2415%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
