Re: how to modify term frequency formula?

geantbrun Mon, 31 Mar 2014 10:37:34 -0700

I realize that I probably have to set the similarity property of my 
field to "my_similarity" (not "tfCappedSimilarity") and define 
my_similarity in the settings with type tfCappedSimilarity.
When I do that, I get the following error when creating the index and mapping:


{"error":"IndexCreationException[[exbd] failed to create index]; nested: 
NoClassSettingsException[Failed to load class setting [type] with value 
[tfCappedSimilarity]]; nested: 
ClassNotFoundException[org.elasticsearch.index.similarity.tfcappedsimilarity.tfCappedSimilaritySimilarityProvider];
 
","status":500}]

Note that the error refers to the provider as 
tfCappedSimilaritySimilarityProvider 
("Similarity" appears twice). Is that normal?
Patrick
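
A small sketch of how the class name in the error above could have been assembled. It is inferred only from the error text, not from the Elasticsearch source: the `type` value appears to be prefixed with the `org.elasticsearch.index.similarity.` package plus a lower-cased sub-package, and suffixed with `SimilarityProvider`. The class `NameResolutionSketch` and the method `candidate` are hypothetical names used for illustration.

```java
import java.util.Locale;

public class NameResolutionSketch {
    // Both constants are assumptions read off the error message,
    // not values taken from Elasticsearch itself.
    static final String PREFIX = "org.elasticsearch.index.similarity.";
    static final String SUFFIX = "SimilarityProvider";

    // Rebuilds the class-name pattern seen in the ClassNotFoundException:
    // prefix + lower-cased type + "." + type + suffix.
    static String candidate(String type) {
        return PREFIX + type.toLowerCase(Locale.ROOT) + "." + type + SUFFIX;
    }

    public static void main(String[] args) {
        // Type value used in the settings from this message.
        System.out.println(candidate("tfCappedSimilarity"));
    }
}
```

If this reading is right, the doubled "Similarity" is not a typo in the settings: the suffix is appended to a type value that already ends in "Similarity".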

On Monday, March 31, 2014 at 13:06:00 UTC-4, geantbrun wrote:
>
> Hi Ivan,
> I followed your instructions, but it does not seem to work, so I must have 
> made a mistake somewhere. I created the jar file from the following two 
> Java files. Could you tell me whether they are OK?
>
> tfCappedSimilarity.java
> ***************************
> package org.elasticsearch.index.similarity;
>
> import org.apache.lucene.search.similarities.DefaultSimilarity;
> import org.elasticsearch.common.logging.ESLogger;
> import org.elasticsearch.common.logging.Loggers;
>
> public class tfCappedSimilarity extends DefaultSimilarity {
>
>         private ESLogger logger;
>
>         public tfCappedSimilarity() {
>                 logger = Loggers.getLogger(getClass());
>         }
>
>         /**
>          * Capped tf value
>          */
>         @Override
>         public float tf(float freq) {
>                 return (float)Math.sqrt(Math.min(9, freq));
>         }
> }
>
> tfCappedSimilarityProvider.java
> *************************************
> package org.elasticsearch.index.similarity;
>
> import org.elasticsearch.common.inject.Inject;
> import org.elasticsearch.common.inject.assistedinject.Assisted;
> import org.elasticsearch.common.settings.Settings;
>
> public class tfCappedSimilarityProvider extends AbstractSimilarityProvider {
>
>         private tfCappedSimilarity similarity;
>
>         @Inject
>         public tfCappedSimilarityProvider(@Assisted String name, @Assisted Settings settings) {
>                 super(name);
>                 this.similarity = new tfCappedSimilarity();
>         }
>
>         /**
>          * {@inheritDoc}
>          */
>         @Override
>         public tfCappedSimilarity get() {
>                 return similarity;
>         }
> }
>
>
> In my mapping, I set the similarity property of my field to 
> tfCappedSimilarity. Is that OK?
>
> Here is why I think it does not work: I index a doc with a word 
> repeated 16 times in my field. When I search for that word, the 
> result shows a tf of 4 (square root of 16) and not 3 (square root of 9, 
> the cap) as I expected. Is there a way to tell whether the similarity was 
> loaded (maybe in a log file)?
>
> Cheers,
> Patrick
>
> On Wednesday, March 26, 2014 at 17:16:36 UTC-4, Ivan Brusic wrote:
>>
>> I updated my gist to illustrate the SimilarityProvider that goes along 
>> with it. Similarities are easier to add to Elasticsearch than most plugins. 
>> You just need to compile the two files into a jar and then add that jar 
>> into Elasticsearch's classpath ($ES_HOME/lib most likely). The code will 
>> scan for every SimilarityProvider defined and load it.
>>
>> You then map the similarity to a field: 
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#_configuring_similarity_per_field
>>
>> Note that you cannot change the similarity of a field dynamically.
>>
>> Ivan
>>
>>
>> On Wed, Mar 26, 2014 at 12:49 PM, geantbrun <agin.p...@gmail.com> wrote:
>>
>>> Britta is looping over words that are passed as parameters. It's easy to 
>>> implement her script for a simple query, but what about Boolean queries? 
>>> As I understand it (I could be wrong, of course), I would have to parse 
>>> the query and call the script with each sub-clause. Is that right?
>>>
>>> I prefer your custom similarity alternative. Again, sorry for the silly 
>>> question (newbie!) but where do you put your java file? Is it the only 
>>> thing that is needed (except for the modification in the mapping)?
>>> cheers,
>>> Patrick
>>>
>>> On Wednesday, March 26, 2014 at 11:58:52 UTC-4, Ivan Brusic wrote:
>>>>
>>>> I am still on a version of Elasticsearch that does not have access to 
>>>> the new scoring capabilities, so I cannot test any scripts. The 
>>>> non-normalized term frequency should be the line:
>>>> tf = _index[field][word].tf()
>>>>
>>>> If that is the case, you could substitute that line with something like:
>>>> tf = Math.min(10, _index[field][word].tf())
>>>>
>>>> As I stated before, I am used to using Similarities, so I find that 
>>>> approach easier. Here is a custom similarity that I used in Elasticsearch 
>>>> (it removes any norms that are indexed):
>>>> https://gist.github.com/brusic/9786587
>>>>
>>>> The second part would be the tf() method, which you would need to 
>>>> implement instead of the decodeNormValue I used.
>>>>
>>>> Cheers,
>>>>
>>>> Ivan
>>>>
>>>
>>
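
For reference, the arithmetic behind the expected tf of 3 versus the observed tf of 4 can be checked without Elasticsearch or Lucene on the classpath. This is a minimal sketch: `CappedTfSketch`, `uncappedTf`, and `cappedTf` are hypothetical names, and the two formulas are copied from the thread (square root of the frequency, and the posted `tf` override with its cap of 9).

```java
public class CappedTfSketch {
    // Uncapped tf as described in the thread: square root of the raw frequency.
    static float uncappedTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Capped tf, same expression as the posted tfCappedSimilarity.tf():
    // the frequency is limited to 9 before the square root is taken.
    static float cappedTf(float freq) {
        return (float) Math.sqrt(Math.min(9, freq));
    }

    public static void main(String[] args) {
        System.out.println(uncappedTf(16f)); // prints 4.0
        System.out.println(cappedTf(16f));   // prints 3.0
        System.out.println(cappedTf(4f));    // prints 2.0 (below the cap, unchanged)
    }
}
```

A tf of 4 for a term repeated 16 times therefore matches the uncapped formula, which is consistent with the custom similarity not being applied to the field.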

