Thanks for this - so I could basically strip out the unwanted terms. Then I could run the search with two clauses: one with the original search phrase at a lower weight, and another with the "cleaned" search phrase at a higher weight.
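A rough sketch of that two-clause idea, to make it concrete. It only builds the request body as a plain dict, using a `bool` query with two `should` `match` clauses. The field name `title`, the boost values 1.0 and 3.0, and the helper names `clean_phrase` and `build_query` are assumptions for illustration, not anything settled in this thread; the only generic phrase taken from the discussion is "how to".

```python
import re

# Generic phrases to down-weight. "how to" comes from the thread;
# anything else added here would be an assumption.
GENERIC_PHRASES = ["how to"]


def clean_phrase(phrase, generic_phrases=GENERIC_PHRASES):
    """Return the phrase with the generic phrases stripped out."""
    cleaned = phrase
    for generic in generic_phrases:
        # Whole-word, case-insensitive match so "show to" is left alone.
        pattern = r"\b" + re.escape(generic) + r"\b"
        cleaned = re.sub(pattern, " ", cleaned, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by the removals.
    return " ".join(cleaned.split())


def build_query(phrase, field="title", original_boost=1.0, cleaned_boost=3.0):
    """Build a bool query: original phrase (low boost) + cleaned phrase (high boost)."""
    cleaned = clean_phrase(phrase)
    should = [
        {"match": {field: {"query": phrase, "boost": original_boost}}},
    ]
    # Only add the second clause when cleaning actually changed something.
    if cleaned and cleaned.lower() != phrase.lower():
        should.append(
            {"match": {field: {"query": cleaned, "boost": cleaned_boost}}}
        )
    return {"query": {"bool": {"should": should}}}


if __name__ == "__main__":
    import json
    print(json.dumps(build_query("how to paint my kitchen"), indent=2))
```

Because both clauses sit under `should`, a document matching "how to" still gets some credit from the first clause, while the terms that survive cleaning are counted twice and dominate the score. The ratio between the two boosts would need tuning against real queries.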

On Monday, April 13, 2015 at 12:05:44 AM UTC+3, Jörg Prante wrote:
>
> You can not penalize terms, you can only reward terms. The trick is to
> reward important terms and so all other (unwanted and unknown) terms get
> penalized. One method is to analyze sentences for grammar (part-of-speech
> tagging) and reward nouns or other keywords with boosting values, and use
> an extended similarity algorithm.
>
> You can use UIMA or OpenNLP or Stanford NLP for POS tagging, and try to
> implement payload-based scoring, something like this demo code
>
> https://github.com/jprante/elasticsearch-payload
>
> My demo code does not work, not sure where I made a mistake.
>
> Jörg
>
> On Sun, Apr 12, 2015 at 12:34 PM, Yehosef Shapiro <[email protected]> wrote:
>
>> Often people using our search type "how to <something>" eg "how to
>> paint my kitchen". This might result in results for "tips to paint my
>> kitchen" or "how to paint my bathroom". the phrase "how to" is a generic
>> phrase and I would like to minimize its significance. I don't want to
>> remove it completely because I still would like a post called "how to paint
>> my kitchen cabinets" to match higher than "should I wallpaper or paint my
>> kitchen".
>>
>> I don't want it to be a stopword because it still has value (as in the
>> example).
>>
>> The Common Terms query might work - but I don't necessarily want to apply
>> the rules to all other common phrases (it might be a good idea - but this
>> is a specific common search term that I know people search for and I would
>> like to solve it specifically for this case if possible.)
>>
>> I don't think the negative boost is what I want because I don't want
>> those documents to get penalized for containing the words "how to" - just
>> that they should get a much smaller boost.
>>
>> Any suggestions how to approach this? For the record, I'm using the BM25
>> similarity algorithm.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/acd86fb2-ae69-40be-a772-c65d008f2415%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
