Thanks for this - so I could basically strip out the unwanted terms.
Then I could run the search with two clauses: one with the original search
phrase at a lower weight, and another with the "cleaned" search phrase
at a higher weight.
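
A minimal sketch of that two-clause idea, written as a Python function that builds the query body. The field name `title`, the boost values, and the `GENERIC_PHRASES` list are assumptions for illustration and do not come from this thread. It only relies on the standard `bool`/`should` and `match` query syntax with a per-clause `boost`.

```python
import json
import re

# Hypothetical list of generic phrases to de-emphasize; "how to" is the case discussed here.
GENERIC_PHRASES = ("how to",)


def clean_phrase(phrase, generic=GENERIC_PHRASES):
    """Strip generic phrases (case-insensitive, whole words) and collapse whitespace."""
    cleaned = phrase
    for g in generic:
        cleaned = re.sub(r"\b" + re.escape(g) + r"\b", " ", cleaned, flags=re.IGNORECASE)
    return " ".join(cleaned.split())


def build_query(phrase, field="title", original_boost=1.0, cleaned_boost=3.0):
    """Build a bool query: original phrase at low weight, cleaned phrase at high weight."""
    normalized = " ".join(phrase.split())
    cleaned = clean_phrase(phrase)
    should = [{"match": {field: {"query": normalized, "boost": original_boost}}}]
    # Only add the second clause when cleaning actually removed something.
    if cleaned and cleaned != normalized:
        should.append({"match": {field: {"query": cleaned, "boost": cleaned_boost}}})
    return {"query": {"bool": {"should": should}}}


if __name__ == "__main__":
    print(json.dumps(build_query("how to paint my kitchen"), indent=2))
```

With this shape, a document matching "how to" still picks up a small score from the first clause, while the terms in the cleaned phrase contribute to both clauses. The boost values would need tuning against real results.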

On Monday, April 13, 2015 at 12:05:44 AM UTC+3, Jörg Prante wrote:
>
> You can not penalize terms, you can only reward terms. The trick is to 
> reward important terms and so all other (unwanted and unknown) terms get 
> penalized. One method is to analyze sentences for grammar (part-of-speech 
> tagging) and reward nouns or other keywords with boosting values, and use 
> an extended similarity algorithm.
>
> You can use UIMA or OpenNLP or Stanford NLP for POS tagging, and try to 
> implement payload-based scoring, something like this demo code
>
> https://github.com/jprante/elasticsearch-payload
>
> My demo code does not work,  not sure where I made a mistake.
>
> Jörg
>
> On Sun, Apr 12, 2015 at 12:34 PM, Yehosef Shapiro <[email protected]> wrote:
>
>> People using our search often type "how to <something>", e.g. "how to 
>> paint my kitchen".  This might return results for "tips to paint my 
>> kitchen" or "how to paint my bathroom".  The phrase "how to" is a generic 
>> phrase and I would like to minimize its significance.  I don't want to 
>> remove it completely because I would still like a post called "how to paint 
>> my kitchen cabinets" to rank higher than "should I wallpaper or paint my 
>> kitchen".
>>
>> I don't want it to be a stopword because it still has value (as in the 
>> example).  
>>
>> The Common Terms query might work - but I don't necessarily want to apply 
>> the rules to all other common phrases (it might be a good idea - but this 
>> is a specific common search term that I know people search for and I would 
>> like to solve it specifically for this case if possible.)
>>
>> I don't think the negative boost is what I want because I don't want 
>> those documents to get penalized for containing the words "how to" - just 
>> that they should get a much smaller boost.
>>
>> Any suggestions how to approach this?  For the record, I'm using the BM25 
>> similarity algorithm.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/acd86fb2-ae69-40be-a772-c65d008f2415%40googlegroups.com
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

