Because we're using BM25, I think this is less of a concern in general (there
is a good chart in
http://www.elastic.co/guide/en/elasticsearch/guide/master/pluggable-similarites.html).

We also disable norms on title fields
(http://stackoverflow.com/questions/20222652/elasticsearch-when-to-set-omit-norms-option-as-false),
FWIW.
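
For anyone following along, here is a rough sketch of what a title field
mapping with BM25 and norms disabled might look like, written as a Python dict
that would be sent as the mapping body. The field name "title" and the helper
name are assumptions for illustration, and the syntax is the 1.x-era form as I
understand it (newer versions use "norms": false and a "text" type), so check
it against the docs for your version.

```python
import json


def title_field_mapping(field="title"):
    """Build a mapping fragment for a title field (hypothetical helper).

    Assumes Elasticsearch 1.x-style mapping syntax; the field name is
    illustrative and not taken from the thread.
    """
    return {
        "properties": {
            field: {
                "type": "string",
                # Use BM25 instead of the default TF/IDF similarity.
                "similarity": "BM25",
                # Drop field-length normalization for short title fields.
                "norms": {"enabled": False},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(title_field_mapping(), indent=2))
```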

Thanks for the link, good info.  I'm leaning toward something like the 
keepWordFilter you recommend, but applied at query time instead of at 
index time.  I don't see a need to spend memory storing both 
"Socrates and Plato on Metaphysics" and "Socrates Plato Metaphysics". 
It seems better to make the distinction at query time, and the performance 
should be the same because I need two search clauses either way.
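
To make the query-time idea concrete, here is a minimal sketch of how the two
clauses could be built. It is only an illustration under my own assumptions:
the function name, the "title" field, the phrase list, and the 0.2 boost are
all made up, and it is not the keepWordFilter approach from the blog post. The
generic phrase is stripped from the user's query, the remaining words go into
a required match clause, and the generic phrase goes into an optional
match_phrase clause with a small boost.

```python
import re

# Hypothetical list of generic phrases to down-weight; only "how to"
# comes from the thread, anything else would be added by hand.
GENERIC_PHRASES = ("how to",)


def build_title_query(user_query, field="title", generic_boost=0.2):
    """Split a query into content words and generic phrases at query time."""
    text = " ".join(user_query.lower().split())
    found = []
    for phrase in GENERIC_PHRASES:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text):
            found.append(phrase)
            # Remove the generic phrase and tidy up whitespace.
            text = " ".join(re.sub(pattern, " ", text).split())

    if not text:
        # The query was nothing but generic phrases: fall back to a plain match.
        return {"query": {"match": {field: user_query}}}

    # Clause 1: the content words must match.
    bool_query = {"must": [{"match": {field: {"query": text}}}]}
    if found:
        # Clause 2: the generic phrase only adds a small bonus when present.
        bool_query["should"] = [
            {"match_phrase": {field: {"query": p, "boost": generic_boost}}}
            for p in found
        ]
    return {"query": {"bool": bool_query}}
```

With this shape, a document without "how to" is not penalized; it just misses
the small bonus, so nothing extra has to be stored in the index.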


On Monday, April 13, 2015 at 12:15:14 AM UTC+3, Doug Turnbull wrote:
>
> Yehosef, this sounds very similar to some title search work I've done. 
> Title fields are odd because TF is often meaningless, and IDF can also
> be quite skewed. If only a few titles have "how" in the text, then you'll 
> get very odd results. 
>
> Read more here:
>
> http://opensourceconnections.com/blog/2014/12/08/title-search-when-relevancy-is-only-skin-deep/
>
> On Sunday, April 12, 2015, Yehosef Shapiro wrote:
>
>> People using our search often type "how to <something>", e.g. "how to 
>> paint my kitchen".  This might return results for "tips to paint my 
>> kitchen" or "how to paint my bathroom".  The phrase "how to" is generic, 
>> and I would like to minimize its significance.  I don't want to 
>> remove it completely because I would still like a post called "how to paint 
>> my kitchen cabinets" to rank higher than "should I wallpaper or paint my 
>> kitchen".
>>
>> I don't want it to be a stopword because it still has value (as in the 
>> example).  
>>
>> The Common Terms query might work - but I don't necessarily want to apply 
>> the rules to all other common phrases (it might be a good idea - but this 
>> is a specific common search term that I know people search for and I would 
>> like to solve it specifically for this case if possible.)
>>
>> I don't think the negative boost is what I want because I don't want 
>> those documents to get penalized for containing the words "how to" - just 
>> that they should get a much smaller boost.
>>
>> Any suggestions how to approach this?  For the record, I'm using the BM25 
>> similarity algorithm.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/acd86fb2-ae69-40be-a772-c65d008f2415%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/elasticsearch/acd86fb2-ae69-40be-a772-c65d008f2415%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
> -- 
> Doug Turnbull | Search Relevance Consultant | OpenSource Connections, 
> LLC | 240.476.9983 | http://www.opensourceconnections.com 
> Author: Taming Search <http://manning.com/turnbull> from Manning 
> Publications 
> This e-mail and all contents, including attachments, is considered to be 
> Company Confidential unless explicitly stated otherwise, regardless 
> of whether attachments are marked as such.
>
>

