I added a simple Maven pom to the gist:
https://gist.github.com/brusic/9786587#file-pom-xml

The easiest thing to do is download Maven (if you do not have it) and let it
take care of the dependencies. It will build a jar if you simply execute:
mvn package

Since Elasticsearch already comes bundled with the correct jars, you can
also add those to your classpath instead. I think you only need Lucene
core, which is in $ES_HOME/lib/lucene-core-4.?.?.jar. Substitute the
question marks with the correct version. I am not on Elasticsearch, so I do
not know offhand which version of Lucene is packaged.
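
If you want to confirm that a given classpath actually contains the classes
the compiler is complaining about, a small check like the one below can help.
It is only a sketch: ClasspathCheck and isLoadable are names I made up for
illustration, they are not part of Elasticsearch or Lucene.

```java
// Hypothetical helper for illustration; not part of Elasticsearch or Lucene.
final class ClasspathCheck {

    /** Returns true if the named class can be located on the current classpath. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class itself or something it depends on is missing
            return false;
        }
    }

    public static void main(String[] args) {
        // Each command-line argument is treated as a fully qualified class name
        for (String name : args) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "NOT found"));
        }
    }
}
```

Run it with the same classpath you intend to pass to javac, giving the class
name from the compiler error (for example
org.apache.lucene.search.similarities.DefaultSimilarity) as the argument. If
it prints "NOT found", the Lucene jar is missing from that classpath.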

-- 
Ivan


On Thu, Apr 3, 2014 at 7:44 AM, geantbrun <agin.patr...@gmail.com> wrote:

> Ivan,
> Sorry, but I realize (I know nothing about Java) that I skipped the Java
> compile step (I simply put the .java files in a jar file with jar cf). The
> problem now is that executing:
>
> javac NormRemovalSimilarity.java -classpath ./elasticsearch-1.1.0.jar
>
> generates errors, the first one being:
>
> package org.apache.lucene.search.similarities does not exist
>
> Googled it but found nothing. Any idea?
> Patrick
>
> P.S. I installed elasticsearch the easy way
> (https://gist.github.com/wingdspur/2026107): dpkg the deb file.
>
> On Thursday, April 3, 2014 at 09:16:02 UTC-4, geantbrun wrote:
>
>> Thanks again for your great help, Ivan. It does not work for me. When I
>> replace NormRemovalSimilarityProvider with BM25SimilarityProvider (or
>> simply with BM25), it works. Is it possible that I put my jar file in the
>> wrong directory (/usr/share/elasticsearch/lib)? Is it necessary to
>> *register* the new classes I define somewhere before restarting the service?
>> Cheers,
>> Patrick
>>
>> On Wednesday, April 2, 2014 at 17:47:46 UTC-4, Ivan Brusic wrote:
>>>
>>> Are you using a full class name? I have no problems with
>>>
>>> curl -XPOST 'http://localhost:9200/sim/' -d '
>>> {
>>>  "settings" : {
>>>    "similarity" : {
>>>     "my_similarity" : {
>>>      "type" : "org.elasticsearch.index.similarity.NormRemovalSimilarityProvider"
>>>     }
>>>   }
>>>  },
>>>  "mappings" : {
>>>   "post" : {
>>>    "properties" : {
>>>     "id" : { "type" : "long", "store" : "yes", "precision_step" : "0" },
>>>     "name" : { "type" : "string", "store" : "yes", "index" : "analyzed"},
>>>     "contents" : { "type" : "string", "store" : "no", "index" : "analyzed", "similarity" : "my_similarity"}
>>>    }
>>>   }
>>>  }
>>> }
>>> '
>>>
>>>
>>>
>>> On Wed, Apr 2, 2014 at 12:03 PM, geantbrun <agin.p...@gmail.com> wrote:
>>>
>>>> To better understand the error, I copied your
>>>> NormRemovalSimilarity and NormRemovalSimilarityProvider code snippets into
>>>> /usr/share/elasticsearch/lib. I put these 2 files in a jar named
>>>> NormRemovalSimilarity.jar. After restarting the elasticsearch service, I
>>>> tried to create the index with the same mapping as before (except that I
>>>> put "type" : "NormRemoval" in the settings of my_similarity).
>>>>
>>>> The result is the same:
>>>> {"error":"IndexCreationException[[exbd] failed to create index];
>>>> nested: NoClassSettingsException[Failed to load class setting [type]
>>>> with value [NormRemoval]]; nested:
>>>> ClassNotFoundException[org.elasticsearch.index.similarity.normremoval.NormRemovalSimilarityProvider];
>>>> ","status":500}]
>>>>
>>>> I deleted the jar file just to see if the error is the same: yes it is.
>>>> It's like the new similarity is never found or loaded. Is it still working
>>>> without modifications on your side?
>>>> Cheers,
>>>> Patrick
>>>>
>>>>
>>>> On Wednesday, April 2, 2014 at 00:31:44 UTC-4, Ivan Brusic wrote:
>>>>>
>>>>> It has been a while since I used a custom similarity, but what you
>>>>> have looks right. Can you try a full class name instead?
>>>>> Use org.elasticsearch.index.similarity.tfCappedSimilarityProvider.
>>>>> According to the error, it is looking for
>>>>> org.elasticsearch.index.similarity.tfcappedsimilarity.tfCappedSimilaritySimilarityProvider.
>>>>>
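
The two ClassNotFoundException messages quoted in this thread follow the same
naming pattern for a short type name. The sketch below only reproduces that
observed pattern; it is inferred from the error text, it is not Elasticsearch
code, and the real lookup may try other candidates first. ProviderNameGuess
and reportedClassName are hypothetical names.

```java
import java.util.Locale;

// Hypothetical illustration inferred from the error messages; not Elasticsearch code.
final class ProviderNameGuess {

    // Package prefix and class suffix as they appear in the reported errors
    static final String PREFIX = "org.elasticsearch.index.similarity.";
    static final String SUFFIX = "SimilarityProvider";

    /** Builds the class name that the reported exception mentions for a short "type" value. */
    static String reportedClassName(String type) {
        // lower-cased type as a sub-package, then the type itself plus the suffix
        return PREFIX + type.toLowerCase(Locale.ROOT) + "." + type + SUFFIX;
    }

    public static void main(String[] args) {
        System.out.println(reportedClassName("tfCappedSimilarity"));
        System.out.println(reportedClassName("NormRemoval"));
    }
}
```

This is why "Similarity" shows up twice when the type already ends in
"Similarity", and why a full class name avoids the guesswork.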
>>>>> --
>>>>> Ivan
>>>>>
>>>>>
>>>>> On Tue, Apr 1, 2014 at 7:00 AM, geantbrun <agin.p...@gmail.com> wrote:
>>>>>
>>>>>> Sure.
>>>>>>
>>>>>> {
>>>>>>  "settings" : {
>>>>>>   "index" : {
>>>>>>    "similarity" : {
>>>>>>     "my_similarity" : {
>>>>>>      "type" : "tfCappedSimilarity"
>>>>>>     }
>>>>>>    }
>>>>>>   }
>>>>>>  },
>>>>>>  "mappings" : {
>>>>>>   "post" : {
>>>>>>    "properties" : {
>>>>>>     "id" : { "type" : "long", "store" : "yes", "precision_step" : "0" },
>>>>>>     "name" : { "type" : "string", "store" : "yes", "index" : "analyzed"},
>>>>>>     "contents" : { "type" : "string", "store" : "no", "index" : "analyzed", "similarity" : "my_similarity"}
>>>>>>    }
>>>>>>   }
>>>>>>  }
>>>>>> }
>>>>>>
>>>>>> If I replace tfCappedSimilarity with tfCapped in the mapping, the
>>>>>> error is the same, except that the provider is referred to as
>>>>>> tfCappedSimilarityProvider and not as
>>>>>> tfCappedSimilaritySimilarityProvider.
>>>>>> Cheers,
>>>>>> Patrick
>>>>>>
>>>>>>
>>>>>> On Monday, March 31, 2014 at 17:13:24 UTC-4, Ivan Brusic wrote:
>>>>>>>
>>>>>>> Can you also post your mapping where you defined the similarity?
>>>>>>>
>>>>>>> --
>>>>>>> Ivan
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 31, 2014 at 10:36 AM, geantbrun <agin.p...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I realize that I probably have to define the similarity property of
>>>>>>>> my field as "my_similarity" (and not as "tfCappedSimilarity") and
>>>>>>>> define my_similarity in the settings as being of type tfCappedSimilarity.
>>>>>>>> When I do that, I get the following error at index/mapping
>>>>>>>> creation:
>>>>>>>>
>>>>>>>> {"error":"IndexCreationException[[exbd] failed to create index];
>>>>>>>> nested: NoClassSettingsException[Failed to load class setting
>>>>>>>> [type] with value [tfCappedSimilarity]]; nested:
>>>>>>>> ClassNotFoundException[org.elasticsearch.index.similarity.tfcappedsimilarity.tfCappedSimilaritySimilarityProvider];
>>>>>>>> ","status":500}]
>>>>>>>>
>>>>>>>> Note that the provider is referred to in the error as
>>>>>>>> tfCappedSimilaritySimilarityProvider ("Similarity" appears
>>>>>>>> twice). Is that normal?
>>>>>>>> Patrick
>>>>>>>>
>>>>>>>> On Monday, March 31, 2014 at 13:06:00 UTC-4, geantbrun wrote:
>>>>>>>>
>>>>>>>>> Hi Ivan,
>>>>>>>>> I followed your instructions but it does not seem to work, I must
>>>>>>>>> be wrong somewhere. I created the jar file from the following two java
>>>>>>>>> files, could you tell me if they are ok?
>>>>>>>>>
>>>>>>>>> tfCappedSimilarity.java
>>>>>>>>> ***************************
>>>>>>>>> package org.elasticsearch.index.similarity;
>>>>>>>>>
>>>>>>>>> import org.apache.lucene.search.similarities.DefaultSimilarity;
>>>>>>>>> import org.elasticsearch.common.logging.ESLogger;
>>>>>>>>> import org.elasticsearch.common.logging.Loggers;
>>>>>>>>>
>>>>>>>>> public class tfCappedSimilarity extends DefaultSimilarity {
>>>>>>>>>
>>>>>>>>>         private ESLogger logger;
>>>>>>>>>
>>>>>>>>>         public tfCappedSimilarity() {
>>>>>>>>>                 logger = Loggers.getLogger(getClass());
>>>>>>>>>         }
>>>>>>>>>
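>>>>>>>>>         /**
>>>>>>>>>          * Term frequency capped at 9 occurrences before the square
>>>>>>>>>          * root is taken, so the returned value never exceeds 3.
>>>>>>>>>          */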
>>>>>>>>>         @Override
>>>>>>>>>         public float tf(float freq) {
>>>>>>>>>                 return (float)Math.sqrt(Math.min(9, freq));
>>>>>>>>>         }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> tfCappedSimilarityProvider.java
>>>>>>>>> *************************************
>>>>>>>>> package org.elasticsearch.index.similarity;
>>>>>>>>>
>>>>>>>>> import org.elasticsearch.common.inject.Inject;
>>>>>>>>> import org.elasticsearch.common.inject.assistedinject.Assisted;
>>>>>>>>> import org.elasticsearch.common.settings.Settings;
>>>>>>>>>
>>>>>>>>> public class tfCappedSimilarityProvider extends
>>>>>>>>> AbstractSimilarityProvider {
>>>>>>>>>
>>>>>>>>>         private tfCappedSimilarity similarity;
>>>>>>>>>
>>>>>>>>>         @Inject
>>>>>>>>>         public tfCappedSimilarityProvider(@Assisted String name,
>>>>>>>>>                         @Assisted Settings settings) {
>>>>>>>>>                 super(name);
>>>>>>>>>                 this.similarity = new tfCappedSimilarity();
>>>>>>>>>         }
>>>>>>>>>
>>>>>>>>>         /**
>>>>>>>>>          * {@inheritDoc}
>>>>>>>>>          */
>>>>>>>>>         @Override
>>>>>>>>>         public tfCappedSimilarity get() {
>>>>>>>>>                 return similarity;
>>>>>>>>>         }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In my mapping, I define the similarity property of my field as
>>>>>>>>> tfCappedSimilarity. Is that ok?
>>>>>>>>>
>>>>>>>>> What makes me say that it does not work: I inserted a doc with a
>>>>>>>>> word repeated 16 times in my field. When I search for that word,
>>>>>>>>> the result shows a tf of 4 (square root of 16) and not 3 as I was
>>>>>>>>> expecting. Is there a way to know whether the similarity was loaded
>>>>>>>>> (maybe in a log file)?
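
The expected numbers can be checked without Lucene or Elasticsearch. The
sketch below only mirrors the arithmetic of the tf() override quoted above
and the plain square root it replaces; CappedTfDemo, defaultTf and cappedTf
are hypothetical names used for illustration.

```java
// Standalone arithmetic check; hypothetical names, no Lucene dependency.
final class CappedTfDemo {

    // Same cap as in the quoted tfCappedSimilarity.tf() override
    static final float CAP = 9f;

    /** Uncapped term frequency factor: square root of the raw frequency. */
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    /** Capped variant: frequencies above CAP count as CAP. */
    static float cappedTf(float freq) {
        return (float) Math.sqrt(Math.min(CAP, freq));
    }

    public static void main(String[] args) {
        System.out.println(defaultTf(16f)); // prints 4.0
        System.out.println(cappedTf(16f));  // prints 3.0
    }
}
```

A tf of 4 for 16 occurrences therefore indicates that the uncapped formula
was used for that field, not the custom class.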
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Patrick
>>>>>>>>>
>>>>>>>>> On Wednesday, March 26, 2014 at 17:16:36 UTC-4, Ivan Brusic wrote:
>>>>>>>>>>
>>>>>>>>>> I updated my gist to illustrate the SimilarityProvider that goes
>>>>>>>>>> along with it. Similarities are easier to add to Elasticsearch than
>>>>>>>>>> most plugins. You just need to compile the two files into a jar and
>>>>>>>>>> then add that jar to Elasticsearch's classpath ($ES_HOME/lib most
>>>>>>>>>> likely). The code will scan for every SimilarityProvider defined and
>>>>>>>>>> load it.
>>>>>>>>>>
>>>>>>>>>> You then map the similarity to a field:
>>>>>>>>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#_configuring_similarity_per_field
>>>>>>>>>>
>>>>>>>>>> Note that you cannot change the similarity of a field dynamically.
>>>>>>>>>>
>>>>>>>>>> Ivan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 26, 2014 at 12:49 PM, geantbrun <agin.p...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Britta is looping over words that are passed as parameters. It's
>>>>>>>>>>> easy to implement her script for a simple query, but what about
>>>>>>>>>>> boolean queries? In my understanding (but I could be wrong, of
>>>>>>>>>>> course), I would have to parse the query and call the script with
>>>>>>>>>>> each sub-clause. Am I wrong?
>>>>>>>>>>>
>>>>>>>>>>> I prefer your custom similarity alternative. Again, sorry for
>>>>>>>>>>> the silly question (newbie!), but where do you put your java file?
>>>>>>>>>>> Is it the only thing that is needed (except for the modification in
>>>>>>>>>>> the mapping)?
>>>>>>>>>>> cheers,
>>>>>>>>>>> Patrick
>>>>>>>>>>>
>>>>>>>>>>> On Wednesday, March 26, 2014 at 11:58:52 UTC-4, Ivan Brusic wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I am still on a version of Elasticsearch that does not have
>>>>>>>>>>>> access to the new scoring capabilities, so I cannot test out any
>>>>>>>>>>>> scripts. The non-normalized term frequency should be the line:
>>>>>>>>>>>> tf = _index[field][word].tf()
>>>>>>>>>>>>
>>>>>>>>>>>> If that is the case, you could substitute that line with
>>>>>>>>>>>> something like:
>>>>>>>>>>>> tf = Math.min(10, _index[field][word].tf())
>>>>>>>>>>>>
>>>>>>>>>>>> As I stated before, I am used to using Similarities, so I find
>>>>>>>>>>>> that approach easier. Here is a custom similarity that I used in
>>>>>>>>>>>> Elasticsearch (it removes any norms that are indexed):
>>>>>>>>>>>> https://gist.github.com/brusic/9786587
>>>>>>>>>>>>
>>>>>>>>>>>> The second part would be the tf() method, which you would need to
>>>>>>>>>>>> implement instead of the decodeNormValue I used.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>
>>>>>>>>>>>> Ivan
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>  --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "elasticsearch" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to elasticsearc...@googlegroups.com.
>>>>>>>> To view this discussion on the web visit
>>>>>>>> https://groups.google.com/d/msgid/elasticsearch/6370b4dc-8243-4aea-918a-e4e4e9588aaf%40googlegroups.com.
>>>>>>>>
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>
>>>>>>>
>>>>>
>>>
