I added a simple Maven pom to the gist: https://gist.github.com/brusic/9786587#file-pom-xml
The easiest thing to do is to download Maven (if you do not have it) and let it
handle the dependencies. It will build a jar if you simply execute:

mvn package

Since Elasticsearch already comes bundled with the correct jars, you can also
add those to your classpath instead. I think you only need Lucene core, which
is in

$ES_HOME/lib/lucene-core-4.?.?.jar

Substitute the question marks with the correct version. I am not on
Elasticsearch, so I do not know offhand which version of Lucene is packaged.

--
Ivan

On Thu, Apr 3, 2014 at 7:44 AM, geantbrun <agin.patr...@gmail.com> wrote:

> Ivan,
> Sorry, but I realize (I'm totally unaware of Java) that I skipped the Java
> compile step (I simply put the java files in a jar file with jar cf). The
> problem now is that executing:
>
> javac NormRemovalSimilarity.java -classpath ./elasticsearch-1.1.0.jar
>
> generates errors, the first one being:
>
> package org.apache.lucene.search.similarities does not exist
>
> Googled it but found nothing. Any idea?
> Patrick
>
> P.S. I installed elasticsearch following the easy way
> <https://gist.github.com/wingdspur/2026107> (dpkg the deb file)
>
> On Thursday, April 3, 2014 at 09:16:02 UTC-4, geantbrun wrote:
>
>> Thanks again for your great help Ivan. It does not work for me. When I
>> replace NormRemovalSimilarityProvider with BM25SimilarityProvider (or
>> simply with BM25), it works. Is it possible that I put my jar file in the
>> wrong directory (usr/share/elasticsearch/lib)? Is it necessary to
>> *register* somewhere the new classes I define before restarting the service?
>> Cheers,
>> Patrick
>>
>> On Wednesday, April 2, 2014 at 17:47:46 UTC-4, Ivan Brusic wrote:
>>>
>>> Are you using a full class name? I have no problems with
>>>
>>> curl -XPOST 'http://localhost:9200/sim/' -d '
>>> {
>>>   "settings" : {
>>>     "similarity" : {
>>>       "my_similarity" : {
>>>         "type" : "org.elasticsearch.index.similarity.NormRemovalSimilarityProvider"
>>>       }
>>>     }
>>>   },
>>>   "mappings" : {
>>>     "post" : {
>>>       "properties" : {
>>>         "id" : { "type" : "long", "store" : "yes", "precision_step" : "0" },
>>>         "name" : { "type" : "string", "store" : "yes", "index" : "analyzed"},
>>>         "contents" : { "type" : "string", "store" : "no", "index" : "analyzed", "similarity" : "my_similarity"}
>>>       }
>>>     }
>>>   }
>>> }
>>> '
>>>
>>>
>>>
>>> On Wed, Apr 2, 2014 at 12:03 PM, geantbrun <agin.p...@gmail.com> wrote:
>>>
>>>> In order to better understand the error, I copied your
>>>> NormRemovalSimilarity and NormRemovalSimilarityProvider code snippets in
>>>> usr/share/elasticsearch/lib. I put these 2 files in a jar named
>>>> NormRemovalSimilarity.jar. After restarting the elasticsearch service, I
>>>> tried to create the index with the same mapping as before (except that I
>>>> put "type" : "NormRemoval" in the settings of my_similarity).
>>>>
>>>> The result is the same:
>>>> {"error":"IndexCreationException[[exbd] failed to create index];
>>>> nested: NoClassSettingsException[Failed to load class setting [type]
>>>> with value [NormRemoval]]; nested:
>>>> ClassNotFoundException[org.elasticsearch.index.similarity.normremoval.NormRemovalSimilarityProvider];
>>>> ","status":500}]
>>>>
>>>> I deleted the jar file just to see if the error is the same: yes it is.
>>>> It's like the new similarity is never found or loaded. Is it still working
>>>> without modifications on your side?
>>>> Cheers,
>>>> Patrick
>>>>
>>>>
>>>> On Wednesday, April 2, 2014 at 00:31:44 UTC-4, Ivan Brusic wrote:
>>>>>
>>>>> It has been a while since I used a custom similarity, but what you
>>>>> have looks right. Can you try a full class name instead?
>>>>> Use org.elasticsearch.index.similarity.tfCappedSimilarityProvider.
>>>>> According to the error, it is looking for
>>>>> org.elasticsearch.index.similarity.tfcappedsimilarity.tfCappedSimilaritySimilarityProvider.
>>>>>
>>>>> --
>>>>> Ivan
>>>>>
>>>>>
>>>>> On Tue, Apr 1, 2014 at 7:00 AM, geantbrun <agin.p...@gmail.com> wrote:
>>>>>
>>>>>> Sure.
>>>>>>
>>>>>> {
>>>>>>   "settings" : {
>>>>>>     "index" : {
>>>>>>       "similarity" : {
>>>>>>         "my_similarity" : {
>>>>>>           "type" : "tfCappedSimilarity"
>>>>>>         }
>>>>>>       }
>>>>>>     }
>>>>>>   },
>>>>>>   "mappings" : {
>>>>>>     "post" : {
>>>>>>       "properties" : {
>>>>>>         "id" : { "type" : "long", "store" : "yes", "precision_step" : "0" },
>>>>>>         "name" : { "type" : "string", "store" : "yes", "index" : "analyzed"},
>>>>>>         "contents" : { "type" : "string", "store" : "no", "index" : "analyzed", "similarity" : "my_similarity"}
>>>>>>       }
>>>>>>     }
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> If I replace tfCappedSimilarity with tfCapped in the mapping, the
>>>>>> error is the same except that the provider is referred to as
>>>>>> tfCappedSimilarityProvider and not as
>>>>>> tfCappedSimilaritySimilarityProvider.
>>>>>> Cheers,
>>>>>> Patrick
>>>>>>
>>>>>>
>>>>>> On Monday, March 31, 2014 at 17:13:24 UTC-4, Ivan Brusic wrote:
>>>>>>>
>>>>>>> Can you also post your mapping where you defined the similarity?
>>>>>>>
>>>>>>> --
>>>>>>> Ivan
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 31, 2014 at 10:36 AM, geantbrun <agin.p...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I realize that I probably have to define the similarity property of
>>>>>>>> my field as "my_similarity" (and not as "tfCappedSimilarity") and
>>>>>>>> define in the settings my_similarity as being of type
>>>>>>>> tfCappedSimilarity. When I do that, I get the following error at the
>>>>>>>> index/mapping creation:
>>>>>>>>
>>>>>>>> {"error":"IndexCreationException[[exbd] failed to create index];
>>>>>>>> nested: NoClassSettingsException[Failed to load class setting
>>>>>>>> [type] with value [tfCappedSimilarity]]; nested:
>>>>>>>> ClassNotFoundException[org.elasticsearch.index.similarity.tfcappedsimilarity.tfCappedSimilaritySimilarityProvider];
>>>>>>>> ","status":500}]
>>>>>>>>
>>>>>>>> Note that the provider is referred to in the error as
>>>>>>>> tfCappedSimilaritySimilarityProvider (similarity repeated 2
>>>>>>>> times). Is that normal?
>>>>>>>> Patrick
>>>>>>>>
>>>>>>>> On Monday, March 31, 2014 at 13:06:00 UTC-4, geantbrun wrote:
>>>>>>>>
>>>>>>>>> Hi Ivan,
>>>>>>>>> I followed your instructions but it does not seem to work, so I must
>>>>>>>>> be wrong somewhere. I created the jar file from the following two java
>>>>>>>>> files. Could you tell me if they are ok?
>>>>>>>>>
>>>>>>>>> tfCappedSimilarity.java
>>>>>>>>> ***************************
>>>>>>>>> package org.elasticsearch.index.similarity;
>>>>>>>>>
>>>>>>>>> import org.apache.lucene.search.similarities.DefaultSimilarity;
>>>>>>>>> import org.elasticsearch.common.logging.ESLogger;
>>>>>>>>> import org.elasticsearch.common.logging.Loggers;
>>>>>>>>>
>>>>>>>>> public class tfCappedSimilarity extends DefaultSimilarity {
>>>>>>>>>
>>>>>>>>>     private ESLogger logger;
>>>>>>>>>
>>>>>>>>>     public tfCappedSimilarity() {
>>>>>>>>>         logger = Loggers.getLogger(getClass());
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     /**
>>>>>>>>>      * Capped tf value
>>>>>>>>>      */
>>>>>>>>>     @Override
>>>>>>>>>     public float tf(float freq) {
>>>>>>>>>         return (float)Math.sqrt(Math.min(9, freq));
>>>>>>>>>     }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> tfCappedSimilarityProvider.java
>>>>>>>>> *************************************
>>>>>>>>> package org.elasticsearch.index.similarity;
>>>>>>>>>
>>>>>>>>> import org.elasticsearch.common.inject.Inject;
>>>>>>>>> import org.elasticsearch.common.inject.assistedinject.Assisted;
>>>>>>>>> import org.elasticsearch.common.settings.Settings;
>>>>>>>>>
>>>>>>>>> public class tfCappedSimilarityProvider extends AbstractSimilarityProvider {
>>>>>>>>>
>>>>>>>>>     private tfCappedSimilarity similarity;
>>>>>>>>>
>>>>>>>>>     @Inject
>>>>>>>>>     public tfCappedSimilarityProvider(@Assisted String name,
>>>>>>>>>             @Assisted Settings settings) {
>>>>>>>>>         super(name);
>>>>>>>>>         this.similarity = new tfCappedSimilarity();
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     /**
>>>>>>>>>      * {@inheritDoc}
>>>>>>>>>      */
>>>>>>>>>     @Override
>>>>>>>>>     public tfCappedSimilarity get() {
>>>>>>>>>         return similarity;
>>>>>>>>>     }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In my mapping, I define the similarity property of my field as
>>>>>>>>> tfCappedSimilarity, is it ok?
>>>>>>>>>
>>>>>>>>> What makes me say that it does not work: I insert a doc with a
>>>>>>>>> word repeated 16 times in my field. When I do a search with that
>>>>>>>>> word, the result shows a tf of 4 (square root of 16) and not 3 as I
>>>>>>>>> was expecting. Is there a way to know if the similarity was loaded
>>>>>>>>> or not (maybe in a log file)?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Patrick
>>>>>>>>>
>>>>>>>>> On Wednesday, March 26, 2014 at 17:16:36 UTC-4, Ivan Brusic wrote:
>>>>>>>>>>
>>>>>>>>>> I updated my gist to illustrate the SimilarityProvider that goes
>>>>>>>>>> along with it. Similarities are easier to add to Elasticsearch than
>>>>>>>>>> most plugins. You just need to compile the two files into a jar and
>>>>>>>>>> then add that jar into Elasticsearch's classpath ($ES_HOME/lib most
>>>>>>>>>> likely). The code will scan for every SimilarityProvider defined and
>>>>>>>>>> load it.
>>>>>>>>>>
>>>>>>>>>> You then map the similarity to a field:
>>>>>>>>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#_configuring_similarity_per_field
>>>>>>>>>>
>>>>>>>>>> Note that you cannot change the similarity of a field dynamically.
>>>>>>>>>>
>>>>>>>>>> Ivan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#_configuring_similarity_per_field
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 26, 2014 at 12:49 PM, geantbrun <agin.p...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Britta is looping over words that are passed as parameters. It's
>>>>>>>>>>> easy to implement her script for a simple query but what about
>>>>>>>>>>> boolean queries? In my understanding (but I could be wrong of
>>>>>>>>>>> course), I would have to parse the query to call the script with
>>>>>>>>>>> each sub-clause, am I wrong?
>>>>>>>>>>>
>>>>>>>>>>> I prefer your custom similarity alternative. Again, sorry for
>>>>>>>>>>> the silly question (newbie!) but where do you put your java file?
>>>>>>>>>>> Is it the only thing that is needed (except for the modification
>>>>>>>>>>> in the mapping)?
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Patrick
>>>>>>>>>>>
>>>>>>>>>>> On Wednesday, March 26, 2014 at 11:58:52 UTC-4, Ivan Brusic wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I am still on a version of Elasticsearch that does not have
>>>>>>>>>>>> access to the new scoring capabilities, so I cannot test out any
>>>>>>>>>>>> scripts. The non-normalized term frequency should be the line:
>>>>>>>>>>>> tf = _index[field][word].tf()
>>>>>>>>>>>>
>>>>>>>>>>>> If that is the case, you could substitute that line with
>>>>>>>>>>>> something like:
>>>>>>>>>>>> tf = Math.min(10, _index[field][word].tf())
>>>>>>>>>>>>
>>>>>>>>>>>> As I stated before, I am used to using Similarities, so I find
>>>>>>>>>>>> the example easier. Here is a custom similarity that I used in
>>>>>>>>>>>> Elasticsearch (removes any norms that are indexed):
>>>>>>>>>>>> https://gist.github.com/brusic/9786587
>>>>>>>>>>>>
>>>>>>>>>>>> The second part would be the tf() method you would need to
>>>>>>>>>>>> implement instead of the decodeNormValue I used.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>
>>>>>>>>>>>> Ivan
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "elasticsearch" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to elasticsearc...@googlegroups.com.
>>>>>>>> To view this discussion on the web visit
>>>>>>>> https://groups.google.com/d/msgid/elasticsearch/6370b4dc-8243-4aea-918a-e4e4e9588aaf%40googlegroups.com.
>>>>>>>>
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
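
A footnote on the tf() question in this thread: the capped formula from tfCappedSimilarity, sqrt(min(9, freq)), can be checked on its own, without Lucene or Elasticsearch on the classpath. The sketch below only reproduces that arithmetic. The class name CappedTfCheck and both method names are made up for illustration and are not part of the gist or of Lucene; plainTf mirrors the sqrt(freq) that Lucene's DefaultSimilarity uses for tf.

```java
public class CappedTfCheck {

    // Same arithmetic as the tf() override in tfCappedSimilarity:
    // the raw frequency is capped at 9 before the square root is taken.
    public static float cappedTf(float freq) {
        return (float) Math.sqrt(Math.min(9, freq));
    }

    // Uncapped square-root tf, for comparison (hypothetical helper).
    public static float plainTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        float[] freqs = {1f, 4f, 9f, 16f};
        for (float freq : freqs) {
            System.out.println("freq=" + freq
                    + " plain=" + plainTf(freq)
                    + " capped=" + cappedTf(freq));
        }
    }
}
```

For a term repeated 16 times, the uncapped value is 4 and the capped value is 3, so an explain output that reports a tf of 4 suggests that the custom similarity was not the one applied to the field.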