Thanks a lot Ivan, great answer. 

Suppose my script uses my own formula for tf (with 
_index[field][term].tf()) and I set boost_mode to "replace". Does 
elasticsearch then calculate the tf twice or only once? In other words, is 
it computationally efficient to calculate my own tf, or should I turn off the 
default calculations made by es somewhere to avoid doing the work twice?
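
To make the question concrete, here is a minimal sketch of such a request. 
Python is used only to build the request body; the script string itself is 
what elasticsearch would evaluate. The field name, the term, the max_tf 
parameter and the helper names are hypothetical, and the request layout 
assumes the 1.x function_score / script_score syntax, so check it against 
the version you run.

```python
import json
import math


def capped_tf(freq, max_tf):
    """Reference version of the capped weight: min(max_tf, sqrt(freq))."""
    return min(max_tf, math.sqrt(freq))


def build_query(field, term, max_tf):
    """Build a function_score body whose final score is the capped tf only.

    Assumption: `term` is the exact indexed form of the word, because
    _index[field][term] looks the term up without analyzing it.
    """
    # Script text sent to elasticsearch. max_tf is passed as a script
    # parameter (assumed to be visible as a variable inside the script).
    script = "Math.min(max_tf, Math.sqrt(_index['%s']['%s'].tf()))" % (field, term)
    return {
        "query": {
            "function_score": {
                "query": {"match": {field: term}},
                "script_score": {"script": script, "params": {"max_tf": max_tf}},
                # "replace" discards the query score and keeps the script result.
                "boost_mode": "replace",
            }
        }
    }


if __name__ == "__main__":
    print(capped_tf(100, 5.0))
    print(json.dumps(build_query("body", "elasticsearch", 5.0), indent=2))
```

capped_tf is only there to check the arithmetic locally; the same 
expression appears in the script string that build_query returns.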

Cheers,
Patrick

On Thursday, March 20, 2014 at 5:44:53 PM UTC-4, Ivan Brusic wrote:
>
> You can provide your own similarity to be used at the field level, but 
> recent versions of elasticsearch also let you access the tf-idf values in 
> scripts in order to do custom scoring [1]. Also look at Britta's recent 
> talk on the subject [2].
>
> That said, either your custom similarity or custom scoring would need 
> to know exactly which terms are repeated many times. Have you looked 
> into omitting term frequencies? That would bypass term frequencies 
> completely, which might be overkill in your case. Look into the 
> index options [3].
>
> Finally, perhaps the common terms query can help [4].
>
> [1] 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html
>
> [2] https://speakerdeck.com/elasticsearch/scoring-for-human-beings
>
> [3] 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#string
>
> [4] 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-common-terms-query.html
>
> Cheers,
>
> Ivan
>
>
> On Thu, Mar 20, 2014 at 8:08 AM, geantbrun <agin.p...@gmail.com> wrote:
>
>> Hi,
>> If I understand correctly, the formula used for the term frequency part 
>> in the default similarity module is the square root of the actual 
>> frequency. Is it possible to modify that formula to include something like 
>> min(my_max_value,sqrt(frequency))? I would like to avoid huge tf values for 
>> documents that have the same term repeated many times. It seems that the 
>> BM25 similarity has a parameter to control saturation, but I would prefer 
>> to stick with the simple tf/idf similarity module.
>> Thank you for your help
>> Patrick
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/9a12b611-d08d-41f9-8fd4-b74ad75a6a5c%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

