Term frequencies are stored in the Lucene index, so reading one is a lookup
in the data structure rather than a computation. You can disable term
frequencies and then compute your own in the script, but it would be easier
to calculate the value at index time so that you can access it within your
custom score without having to iterate through all the terms yourself. Britta
has posted on the mailing list in the past, so hopefully she will reply
with some more authoritative answers, especially regarding performance.
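
To make the index-time suggestion concrete, here is a minimal Python sketch
(not Elasticsearch code) that computes Patrick's min(my_max_value,
sqrt(frequency)) once per term for a document. It assumes a naive
lowercase/whitespace tokenizer and a hypothetical cap of 2.0. The resulting
values would have to be stored in a field of your choosing so that the custom
score can read them instead of recomputing tf.

```python
import math
from collections import Counter

# Hypothetical cap on the tf contribution (Patrick's "my_max_value").
MAX_TF = 2.0

def capped_term_weights(text, max_tf=MAX_TF):
    """Return {term: min(max_tf, sqrt(freq))} for a single document.

    Uses a naive lowercase/whitespace split as a stand-in for a real
    analyzer.
    """
    counts = Counter(text.lower().split())
    return {term: min(max_tf, math.sqrt(freq)) for term, freq in counts.items()}

doc = " ".join(["spam"] * 9 + ["eggs"])
print(capped_term_weights(doc))
```

With nine occurrences of "spam", sqrt(9) = 3 is clipped to the cap of 2.0,
while the single "eggs" keeps a weight of 1.0.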

-- 
Ivan


On Fri, Mar 21, 2014 at 11:54 AM, geantbrun <agin.patr...@gmail.com> wrote:

> Thanks a lot Ivan, great answer.
>
> Suppose I use my own formula for tf in my script (with
> _index[field][term].tf()) and set boost_mode to "replace". Does
> elasticsearch calculate the tf twice or only once? In other words, is
> it computationally efficient to calculate my own tf? Should I turn off other
> calculations made by es somewhere else to avoid calculating things twice?
>
> Cheers,
> Patrick
>
> On Thursday, March 20, 2014 5:44:53 PM UTC-4, Ivan Brusic wrote:
>>
>> You can provide your own similarity to be used at the field level, but
>> recent versions of elasticsearch also allow you to access the tf-idf values
>> in order to do custom scoring [1]. Also look at Britta's recent talk on the
>> subject [2].
>>
>> That said, either your custom similarity or your custom scoring would need
>> to know exactly which terms are repeated many times. Have you looked into
>> omitting term frequencies? That would bypass term frequencies entirely,
>> which might be overkill in your case. Look into the index options [3].
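
For reference, a field that omits term frequencies would be mapped roughly as
below. This is a sketch assuming Elasticsearch 1.x string fields; "my_type"
and "body" are hypothetical names, and the Python only builds and prints the
JSON body you would send with the put-mapping API.

```python
import json

# Hypothetical type and field names. "index_options": "docs" asks
# Elasticsearch 1.x to index only document numbers for this string field,
# so neither term frequencies nor positions are stored.
mapping = {
    "my_type": {
        "properties": {
            "body": {
                "type": "string",
                "index_options": "docs",
            }
        }
    }
}

# JSON body for the put-mapping request (send it with curl or a client).
print(json.dumps(mapping, indent=2))
```

With "docs", as far as I know a matching term contributes as if it occurred
once, and because positions are dropped too, phrase queries will not work on
that field.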
>>
>> Finally, perhaps the common terms query can help [4].
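
The idea behind the common terms query can be sketched in a few lines of
Python. This is an illustration, not Elasticsearch's implementation: query
terms whose document frequency exceeds cutoff_frequency (a fraction of all
documents when below 1.0, an absolute count otherwise) go into a
high-frequency group, and the rest into a low-frequency group. The term
statistics in the example are made up.

```python
import math

def split_by_cutoff(query_terms, doc_freq, num_docs, cutoff_frequency):
    """Split query terms into (low_freq, high_freq) lists.

    A cutoff below 1.0 is read as a fraction of num_docs; 1.0 or more is
    read as an absolute document count. Terms missing from doc_freq are
    treated as never seen (df = 0).
    """
    if cutoff_frequency < 1.0:
        threshold = math.ceil(cutoff_frequency * num_docs)
    else:
        threshold = cutoff_frequency
    low, high = [], []
    for term in query_terms:
        if doc_freq.get(term, 0) > threshold:
            high.append(term)
        else:
            low.append(term)
    return low, high

# Made-up statistics for a 1,000-document index.
stats = {"the": 950, "quick": 12, "fox": 30}
low, high = split_by_cutoff(["the", "quick", "fox"], stats,
                            num_docs=1000, cutoff_frequency=0.1)
print("low:", low, "high:", high)
```

Roughly, per [4], documents are selected using the low-frequency terms, and
the high-frequency terms only contribute to the scores of those documents.
Note that this works on document frequency across the index, not on how
often a term repeats inside one document.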
>>
>> [1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html
>>
>> [2] https://speakerdeck.com/elasticsearch/scoring-for-human-beings
>>
>> [3] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#string
>>
>> [4] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-common-terms-query.html
>>
>> Cheers,
>>
>> Ivan
>>
>>
>> On Thu, Mar 20, 2014 at 8:08 AM, geantbrun <agin.p...@gmail.com> wrote:
>>
>>> Hi,
>>> If I understand correctly, the formula used for the term frequency part
>>> in the default similarity module is the square root of the actual
>>> frequency. Is it possible to modify that formula to use something like
>>> min(my_max_value, sqrt(frequency))? I would like to avoid huge tf values
>>> for documents that have the same term repeated many times. It seems that
>>> the BM25 similarity has a parameter to control saturation, but I would
>>> prefer to stick with the simple tf/idf similarity module.
>>> Thank you for your help
>>> Patrick
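
To compare the saturation behaviour Patrick describes, here is a small Python
sketch of three tf curves: the classic sqrt(freq), the proposed
min(my_max_value, sqrt(freq)), and the tf part of BM25 with Lucene's default
k1 = 1.2 and b = 0.75. Document-length normalization is held neutral
(doc_len equal to avg_doc_len), the cap of 3.0 is a hypothetical value, and
idf and the other scoring factors are left out.

```python
import math

def classic_tf(freq):
    # Default TF-IDF similarity: tf factor is sqrt(freq).
    return math.sqrt(freq)

def capped_tf(freq, max_value):
    # Proposed variant: min(my_max_value, sqrt(freq)).
    return min(max_value, math.sqrt(freq))

def bm25_tf(freq, k1=1.2, b=0.75, doc_len=1.0, avg_doc_len=1.0):
    # BM25 tf saturation: grows toward k1 + 1 as freq increases.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return freq * (k1 + 1) / (freq + norm)

for freq in (1, 4, 16, 100):
    print(freq, classic_tf(freq), capped_tf(freq, 3.0), round(bm25_tf(freq), 3))
```

The capped version is flat once sqrt(freq) reaches the cap, whereas BM25 keeps
growing slowly toward k1 + 1 = 2.2, so the two approaches differ mainly in
whether very repetitive documents still gain a little.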
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>>
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/9a12b611-d08d-41f9-8fd4-b74ad75a6a5c%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>