Re: Modifying scoring algorithm during search operations

Hiro Gangwani Mon, 27 Jan 2014 22:32:57 -0800

Hi Ivan,
Thanks for the reply. We tried the norms.enabled property and it works 
fine, but we have observed that it only takes effect on string types. In 
our application we index Word (.doc, .docx) and PDF documents and perform 
text-based search on the document content. When we set norms.enabled on 
attachment types, length normalization is not disabled and the size of the 
document is still considered when calculating the score.


Please suggest how to resolve this issue for attachment types.

Code to create the index mapping for the attachment type:
---
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

XContentBuilder map = XContentFactory.jsonBuilder().startObject()
    .startObject(idxType)
        .startObject("properties")
            .startObject("file")
                // Type name as registered by the attachment mapper plugin
                .field("type", "attachment")
                .field("norms.enabled", false)
                .startObject("fields")
                    .startObject("refid")
                        .field("store", "yes")
                    .endObject()
                    .startObject("name")
                        .field("store", "yes")
                    .endObject()
                    .startObject("itexp")
                        .field("store", "yes")
                    .endObject()
                    .startObject("totalexp")
                        .field("store", "yes")
                    .endObject()
                .endObject() // fields
            .endObject() // file
        .endObject() // properties
    .endObject() // idxType
.endObject(); // root object
---
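
To make the behaviour we are after concrete, here is a minimal sketch of 
the three scoring factors discussed in this thread. It is plain Java and 
does not call the Lucene or Elasticsearch APIs. The formulas are a 
simplified form of Lucene's classic TF-IDF (square-root term frequency, 
1 + ln inverse document frequency, 1/sqrt length norm) and leave out 
query normalization, coord and boosts. The class name ScoreSketch and 
the sample numbers are made up for illustration.

```java
// Simplified, self-contained sketch of classic TF-IDF factors.
// Assumption: ScoreSketch is a hypothetical name, not part of any library.
final class ScoreSketch {

    /** Term-frequency factor: grows with the number of occurrences. */
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** Inverse document frequency: terms rarer in the corpus score higher. */
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /** Length norm: fields with fewer terms score higher. */
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /** Simplified score for one term in one document, all factors applied. */
    static double classicScore(int freq, long docFreq, long numDocs, int numTerms) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * lengthNorm(numTerms);
    }

    /** Score when idf and length norm are both neutralized to 1.0. */
    static double frequencyOnlyScore(int freq) {
        return tf(freq);
    }

    public static void main(String[] args) {
        // A long resume with 9 matches versus a short resume with 4 matches.
        double longDoc = classicScore(9, 10, 1000, 900);
        double shortDoc = classicScore(4, 10, 1000, 100);
        System.out.println("classic:        long=" + longDoc + " short=" + shortDoc);
        System.out.println("frequency only: long=" + frequencyOnlyScore(9)
                + " short=" + frequencyOnlyScore(4));
    }
}
```

With all factors applied, the short resume outranks the long one even 
though it has fewer matches. With idf and the length norm neutralized, the 
resume with more occurrences comes first, which is the ordering our 
business team is asking for.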



Hiro


On Monday, 27 January 2014 23:50:41 UTC+5:30, Ivan Brusic wrote:
>
> For the third rule, you can omit index norms for a field, which will 
> prevent length normalization. See [1]. The option is called either 
> omit_norms or norms.enabled, depending on your version.
>
> The second rule is slightly more complicated. You can define your 
> own custom similarity [2] that dictates how the TF, IDF and norms are used. 
> You simply extend Lucene's DefaultSimilarity (or TFIDFSimilarity) and add 
> it to Elasticsearch's classpath.
>
> [1] 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#string
> [2] 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-similarity.html
>
> -- 
> Ivan
>
>
> On Sun, Jan 26, 2014 at 11:12 PM, Hiro Gangwani <[email protected]> wrote:
>
>> Dear Team,
>>
>> I have been looking at the search algorithm used in Elasticsearch and 
>> found the following set of rules, which are applied while calculating the 
>> score (Boolean Model):
>>
>>
>>    - more occurrences in the document are preferred
>>    - terms rarer in the corpus are preferred
>>    - shorter documents are more heavily weighted
>>    - other functions used to adjust score, boosts, etc.
>>
>> In my application we are doing text-based search across a set of Word 
>> documents. We would like to assign a higher score to documents having 
>> more occurrences and show them at the top, irrespective of the size of the 
>> document. Our application is primarily a recruitment system where search 
>> is based on skill sets, so our business team wants to show the resumes 
>> having more occurrences of the search keywords at the top, irrespective of 
>> size and rare terms.
>> Is there any mechanism to ignore the second and third rules listed above 
>> and calculate the score based on the "more occurrences" condition only? We 
>> are executing search operations using the Java API. Please let me know 
>> whether it is possible to achieve this and, if so, how.
>>
>> Thanks in advance for suggesting a solution.
>>
>> Hiro

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f80933eb-1b68-4c6f-b073-39b78e3f45e9%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.