Yes, I saw Britta's slides, but I find it difficult to implement my own 
scoring for complex queries (e.g., with AND and OR).
Do you have a concrete example, or a link, that explains the override 
alternative in more detail?
Thanks again, Ivan,
Patrick

On Tuesday, March 25, 2014 12:04:26 UTC-4, Ivan Brusic wrote:
>
> Did you see Britta's slides? She has a slide called "Cosine similarity as 
> script" which reproduces Lucene's scoring as a script. You can replace the 
> call to _index[field][word].tf() with your own implementation. For better 
> performance, you can deploy the script as a native Java script (note: not 
> JavaScript).
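A simplified Java sketch of that idea, assuming Lucene's default tf-idf shape: each matching query term contributes tf(freq) * idf^2, with idf = 1 + ln(numDocs / (docFreq + 1)) as in Lucene's DefaultSimilarity. This is not Britta's script: coord, queryNorm, boosts and field-length norms are left out, and the class name, term counts and the cap of 10 are made up for illustration. Passing a different tf function plays the role of replacing the _index[field][word].tf() call.

```java
import java.util.List;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Simplified tf-idf scoring with a pluggable tf function (hypothetical class).
// Assumptions: no coord, queryNorm, boosts or length norms; idf follows
// the DefaultSimilarity formula 1 + ln(numDocs / (docFreq + 1)).
class TfIdfSketch {

    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Sum over query terms of tf(freq) * idf^2 for one document.
    static double score(List<String> queryTerms, Map<String, Integer> termFreqs,
                        Map<String, Long> docFreqs, long numDocs, DoubleUnaryOperator tf) {
        double total = 0.0;
        for (String term : queryTerms) {
            int freq = termFreqs.getOrDefault(term, 0);
            if (freq == 0) {
                continue; // term does not occur in this document
            }
            double idf = idf(docFreqs.getOrDefault(term, 0L), numDocs);
            total += tf.applyAsDouble(freq) * idf * idf;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> query = List.of("elasticsearch", "scoring");
        Map<String, Integer> doc = Map.of("elasticsearch", 400, "scoring", 1);
        Map<String, Long> df = Map.of("elasticsearch", 9L, "scoring", 9L);
        DoubleUnaryOperator defaultTf = Math::sqrt;
        DoubleUnaryOperator cappedTf = f -> Math.min(10.0, Math.sqrt(f)); // hypothetical cap
        System.out.println(score(query, doc, df, 100, defaultTf));
        System.out.println(score(query, doc, df, 100, cappedTf));
    }
}
```

With freq = 400, sqrt gives 20 while the capped version gives 10, so one heavily repeated term no longer dominates the sum.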
>
> I find it easier to just change the Similarity. Simply extend 
> DefaultSimilarity, override "public float tf(float freq)", and then 
> reference this similarity in your field mapping.
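A minimal Java sketch of the tf override. To keep it compilable without Lucene on the classpath, the capped formula is a standalone method; the leading comment shows where it would go in a DefaultSimilarity subclass, assuming Lucene 4.x, where DefaultSimilarity.tf(float freq) returns sqrt(freq). The class names and the cap of 10 are hypothetical, and registering the similarity with elasticsearch so that a field mapping can reference it is not shown.

```java
// Standalone sketch of a capped tf; no Lucene dependency.
//
// Assuming Lucene 4.x, the same logic would live in a subclass such as:
//
//   import org.apache.lucene.search.similarities.DefaultSimilarity;
//
//   public class CappedTfSimilarity extends DefaultSimilarity { // hypothetical name
//       @Override
//       public float tf(float freq) {
//           return Math.min(10f, (float) Math.sqrt(freq));
//       }
//   }
class CappedTf {
    static final float MAX_TF = 10f; // hypothetical cap, Patrick's example value

    // What DefaultSimilarity.tf(float) computes: sqrt(freq).
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Capped variant: min(MAX_TF, sqrt(freq)).
    static float cappedTf(float freq) {
        return Math.min(MAX_TF, defaultTf(freq));
    }

    public static void main(String[] args) {
        for (float freq : new float[] {1f, 4f, 100f, 400f}) {
            System.out.printf("freq=%.0f default=%.2f capped=%.2f%n",
                    freq, defaultTf(freq), cappedTf(freq));
        }
    }
}
```

Frequencies up to 100 score exactly as before; beyond that the tf factor stays at 10.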
>
> -- 
> Ivan
>
>
> On Tue, Mar 25, 2014 at 6:57 AM, geantbrun <agin.p...@gmail.com> wrote:
>
>> Thanks again for the answer, Ivan. Would it be simpler to modify the way 
>> tf is calculated directly in the source code? I mean replacing something 
>> like tf = sqrt(n) with tf = min(10, sqrt(n)).
>> Cheers,
>> Patrick
>>
>> On Friday, March 21, 2014 18:01:51 UTC-4, Ivan Brusic wrote:
>>>
>>> Term frequencies are stored within Lucene, so the value is not 
>>> calculated, just looked up in the data structure. You can disable term 
>>> frequencies and then compute your own in the script, but it would be 
>>> easier to calculate that value at index time so that you can access it 
>>> within your custom score without having to iterate through all the terms 
>>> yourself. Britta has posted on the mailing list in the past, so hopefully 
>>> she will reply with more authoritative answers, especially regarding 
>>> performance.
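A minimal Java sketch of the index-time alternative. It assumes a document is a plain token list, and the returned map stands in for a hypothetical extra field stored with the document; the class name and the cap of 10 are also hypothetical. The per-term value (here a capped sqrt(freq)) is computed once in a single pass, so query-time scoring only needs a lookup.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Precompute a per-term value at index time; look it up at query time.
class IndexTimeTf {
    static final double MAX_TF = 10.0; // hypothetical cap

    // Index time: count each token once, then store min(MAX_TF, sqrt(freq)) per term.
    static Map<String, Double> precompute(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        Map<String, Double> stored = new HashMap<>();
        counts.forEach((term, freq) -> stored.put(term, Math.min(MAX_TF, Math.sqrt(freq))));
        return stored;
    }

    // Query time: a map lookup instead of re-scanning the tokens; 0 if absent.
    static double lookup(Map<String, Double> stored, String term) {
        return stored.getOrDefault(term, 0.0);
    }

    public static void main(String[] args) {
        Map<String, Double> stored = precompute(List.of("tf", "idf", "tf", "tf", "tf"));
        System.out.println(lookup(stored, "tf"));
        System.out.println(lookup(stored, "bm25"));
    }
}
```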
>>>
>>> -- 
>>> Ivan
>>>
>>>
>>> On Fri, Mar 21, 2014 at 11:54 AM, geantbrun <agin.p...@gmail.com> wrote:
>>>
>>>> Thanks a lot, Ivan. Great answer.
>>>>
>>>> Suppose I use my own formula for tf in my script (with 
>>>> _index[field][term].tf()) and set boost_mode to "replace". Does 
>>>> elasticsearch calculate the tf twice or only once? In other words, is it 
>>>> computationally efficient to calculate my own tf? Should I turn off 
>>>> other calculations made by es elsewhere to avoid duplicate work?
>>>>
>>>> Cheers,
>>>> Patrick
>>>>
>>>> On Thursday, March 20, 2014 17:44:53 UTC-4, Ivan Brusic wrote:
>>>>>
>>>>> You can provide your own similarity to be used at the field level, but 
>>>>> recent versions of elasticsearch also let you access the tf-idf values in 
>>>>> order to do custom scoring [1]. Also look at Britta's recent talk on the 
>>>>> subject [2].
>>>>>
>>>>> That said, either your custom similarity or your custom scoring would 
>>>>> need to know exactly which terms are repeated many times. Have you looked 
>>>>> into omitting term frequencies? That would bypass term frequencies 
>>>>> entirely, which might be overkill in your case. Look into the index 
>>>>> options [3].
>>>>>
>>>>> Finally, perhaps the common terms query can help [4].
>>>>>
>>>>> [1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html
>>>>>
>>>>> [2] https://speakerdeck.com/elasticsearch/scoring-for-human-beings
>>>>>
>>>>> [3] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#string
>>>>>
>>>>> [4] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-common-terms-query.html
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Ivan
>>>>>
>>>>>
>>>>> On Thu, Mar 20, 2014 at 8:08 AM, geantbrun <agin.p...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> If I understand correctly, the formula used for the term frequency part 
>>>>>> of the default similarity module is the square root of the actual 
>>>>>> frequency. Is it possible to modify that formula to include something 
>>>>>> like min(my_max_value, sqrt(frequency))? I would like to avoid huge tf 
>>>>>> values for documents that have the same term repeated many times. It 
>>>>>> seems that the BM25 similarity has a parameter to control saturation, 
>>>>>> but I would prefer to stick with the simple tf/idf similarity module.
>>>>>> Thank you for your help
>>>>>> Patrick
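To compare the two options in this question, here is a small Java sketch contrasting the hard cap min(maxTf, sqrt(freq)) with BM25's saturating tf factor. It assumes BM25 length normalization is switched off (b = 0), which reduces that factor to freq * (k1 + 1) / (freq + k1); k1 = 1.2 is Lucene's default, while the cap of 10 and the class name are hypothetical.

```java
// Hard cap on sqrt(freq) versus BM25-style tf saturation (hypothetical class).
// Assumption: BM25 length normalization is off (b = 0).
class TfSaturation {

    // Default tf-idf term frequency with a hard ceiling: min(maxTf, sqrt(freq)).
    static double cappedSqrtTf(double freq, double maxTf) {
        return Math.min(maxTf, Math.sqrt(freq));
    }

    // BM25 tf factor with b = 0; approaches k1 + 1 as freq grows.
    static double bm25Tf(double freq, double k1) {
        return freq * (k1 + 1) / (freq + k1);
    }

    public static void main(String[] args) {
        double maxTf = 10.0; // hypothetical cap
        double k1 = 1.2;     // Lucene's default BM25 k1
        for (double freq : new double[] {1, 4, 25, 100, 400, 10000}) {
            System.out.printf("freq=%6.0f  capped sqrt=%5.2f  bm25(b=0)=%4.2f%n",
                    freq, cappedSqrtTf(freq, maxTf), bm25Tf(freq, k1));
        }
    }
}
```

The capped sqrt grows unchanged until freq = 100 and is flat afterwards, while the BM25 factor bends smoothly toward k1 + 1 = 2.2; a larger k1 delays that saturation.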
>>>>>>
>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "elasticsearch" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to elasticsearc...@googlegroups.com.
>>>>>>
>>>>>> To view this discussion on the web visit https://groups.google.com/d/
>>>>>> msgid/elasticsearch/9a12b611-d08d-41f9-8fd4-b74ad75a6a5c%40goo
>>>>>> glegroups.com<https://groups.google.com/d/msgid/elasticsearch/9a12b611-d08d-41f9-8fd4-b74ad75a6a5c%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
