Re: Relevancy sorting of result returned

cheehoo84 Sun, 06 Apr 2014 00:38:24 -0700

Hi Ivan,

Because I want similar results sorted this way:


1. Be happy
2. Be happy
3. Happy ways

Currently they are sorted:
1. Be happy
2. Happy ways
3. Be happy

This is because they all get the same score. Any suggestions?

Thanks

> On 6 Apr, 2014, at 4:24 am, Ivan Brusic <[email protected]> wrote:
> 
> Lucene will indeed, by default, give a higher score to shorter text, but the 
> "shortness" is the number of tokens, not the number of characters. In your 
> last example, each field has two tokens, so the length is the same. The term 
> frequency is also the same for each document ("happy" appears once) and the 
> inverse document frequency is the same (always the case with single term 
> queries), so the score will be exactly the same for every document. Why 
> should the scoring be any different?
> 
> Cheers,
> 
> Ivan
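Ivan's explanation can be sketched in a few lines of Python. This is a minimal illustration, assuming Lucene's classic TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), length norm = 1 / sqrt(number of tokens)) and using a lowercase whitespace split as a crude stand-in for the standard analyzer. `classic_term_score` is a hypothetical helper, and the docFreq/maxDocs values are the ones reported in the explain output quoted below. For a single-term query the query-side weight normalizes to 1, which is consistent with that explain output listing only fieldWeight.

```python
import math


def classic_term_score(freq: int, doc_freq: int, max_docs: int, num_tokens: int) -> float:
    """Hypothetical helper: single-term score under classic Lucene TF-IDF.

    Uses the raw length norm 1/sqrt(num_tokens); Lucene additionally rounds
    the stored norm to one byte, which this sketch ignores.
    """
    tf = math.sqrt(freq)                             # tf(freq) = sqrt(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # identical for every doc matching one term
    length_norm = 1.0 / math.sqrt(num_tokens)        # counts tokens, not characters
    return tf * idf * length_norm


for title in ["Be Happy", "Happy Ways", "Be Happy"]:
    tokens = title.lower().split()  # crude stand-in for the standard analyzer
    score = classic_term_score(tokens.count("happy"), 93, 1364306, len(tokens))
    print(f"{title!r}: {len(title)} chars, {len(tokens)} tokens, score={score:.4f}")
```

"Be Happy" (8 characters) and "Happy Ways" (10 characters) both have two tokens, so every factor, and therefore the score, is identical; only a title with a different token count would change the length factor.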
> 
> 
> 
>> On Fri, Apr 4, 2014 at 10:31 PM, chee hoo lum <[email protected]> wrote:
>> Hi Ivan,
>> 
>> Since I'm not sure how an analyzer with stopwords can be set in the query
>> itself, I tried setting stopwords to "_none_" via the index settings and
>> its mapping:
>> 
>> Index settings: 
>> 
>> {
>>     "jdbc_dev": {
>>         "settings": {
>>             "index.analysis.analyzer.string_lowercase.filter": "lowercase",
>>             "index.number_of_replicas": "1",
>>             "index.analysis.analyzer.string_lowercase.tokenizer": "keyword",
>>             "index.number_of_shards": "5",
>>             "index.version.created": "900199",
>>             "index.analysis.analyzer.standard.type": "standard",
>>             "index.analysis.analyzer.standard.stopwords": "_none_"
>>         }
>>     }
>> }
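The effect of the stopwords setting on token counts can be sketched as follows. `analyze` is a hypothetical helper that only roughly approximates the standard analyzer (lowercase, split on word characters, optionally drop stopwords), and the stop list is an abbreviated illustration; Lucene's default English stop set is longer, but it does contain "be".

```python
import re

# Abbreviated, illustrative stop list (Lucene's default English set is longer but includes "be").
ENGLISH_STOPWORDS = frozenset({"a", "an", "and", "be", "is", "of", "the", "to"})


def analyze(text: str, stopwords: frozenset = frozenset()) -> list:
    """Approximate the standard analyzer: split on word characters, lowercase, drop stopwords."""
    return [token for token in re.findall(r"\w+", text.lower()) if token not in stopwords]


for title in ["Be Happy", "Happy Ways"]:
    print(title, analyze(title), analyze(title, ENGLISH_STOPWORDS))
```

With `"stopwords": "_none_"`, both titles produce two tokens. Were "be" treated as a stopword, "Be Happy" would be indexed as the single token "happy" and would get a larger length norm than "Happy Ways".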
>> 
>> 
>> Type mapping:
>> 
>> {
>>     "media": {
>>         "properties": {
>>             "AUDIO": {
>>                 "type": "string"
>>             },
>>             ....
>>             "DISPLAY_NAME": {
>>                 "type": "string",
>>                 "analyzer": "standard"
>>             },
>>             ....
>>         }
>>     }
>> }
>> 
>> 
>> Query:
>> 
>> /media/_search?pretty=&search_type=dfs_query_then_fetch&preference=_primary
>> 
>> {
>>     "from": 0,
>>     "size": 100,
>>     "explain": true,
>>     "query": {
>>         "filtered": {
>>             "query": {
>>                 "multi_match": {
>>                     "query": "happy",
>>                     "fields": ["DISPLAY_NAME"]
>>                 }
>>             },
>>             "filter": {
>>                 "query": {
>>                     "bool": {
>>                         "must": {
>>                             "term": {
>>                                 "CHANNEL_ID": "1"
>>                             }
>>                         }
>>                     }
>>                 }
>>             }
>>         }
>>     }
>> }
>> 
>> 
>> Results:
>> 
>> 1)
>>     "_shard": 4,
>>     "_node": "xsGVhtTnThaG57_mJdMtxg",
>>     "_index": "jdbc_dev",
>>     "_type": "media",
>>     "_id": "127413",
>>     "_score": 6.614289,
>>     "_source": {
>>         "DISPLAY_NAME": "Be Happy"
>>     },
>>     "_explanation": {
>>         "value": 6.614289,
>>         "description": "weight(DISPLAY_NAME:happy in 6485) [PerFieldSimilarity], result of:",
>>         "details": [
>>             {
>>                 "value": 6.614289,
>>                 "description": "fieldWeight in 6485, product of:",
>>                 "details": [
>>                     {
>>                         "value": 1,
>>                         "description": "tf(freq=1.0), with freq of:",
>>                         "details": [
>>                             {
>>                                 "value": 1,
>>                                 "description": "termFreq=1.0"
>>                             }
>>                         ]
>>                     },
>>                     {
>>                         "value": 10.582862,
>>                         "description": "idf(docFreq=93, maxDocs=1364306)"
>>                     },
>>                     {
>>                         "value": 0.625,
>>                         "description": "fieldNorm(doc=6485)"
>>                     }
>>                 ]
>>             }
>>         ]
>>     }
>> 
>> 
>> 2)
>>     "_shard": 4,
>>     "_node": "UOjX2lxhR6mzfjHHmTm3cQ",
>>     "_index": "jdbc_dev",
>>     "_type": "media",
>>     "_id": "72253",
>>     "_score": 6.614289,
>>     "_source": {
>>         "DISPLAY_NAME": "Happy Ways"
>>     },
>>     "_explanation": {
>>         "value": 6.614289,
>>         "description": "weight(DISPLAY_NAME:happy in 1102) [PerFieldSimilarity], result of:",
>>         "details": [
>>             {
>>                 "value": 6.614289,
>>                 "description": "fieldWeight in 1102, product of:",
>>                 "details": [
>>                     {
>>                         "value": 1,
>>                         "description": "tf(freq=1.0), with freq of:",
>>                         "details": [
>>                             {
>>                                 "value": 1,
>>                                 "description": "termFreq=1.0"
>>                             }
>>                         ]
>>                     },
>>                     {
>>                         "value": 10.582862,
>>                         "description": "idf(docFreq=93, maxDocs=1364306)"
>>                     },
>>                     {
>>                         "value": 0.625,
>>                         "description": "fieldNorm(doc=1102)"
>>                     }
>>                 ]
>>             }
>>         ]
>>     }
>> 
>> 
>> 3)
>>     "_shard": 4,
>>     "_node": "UOjX2lxhR6mzfjHHmTm3cQ",
>>     "_index": "jdbc_dev",
>>     "_type": "media",
>>     "_id": "127413",
>>     "_score": 6.614289,
>>     "_source": {
>>         "DISPLAY_NAME": "Be Happy"
>>     },
>>     "_explanation": {
>>         "value": 6.614289,
>>         "description": "weight(DISPLAY_NAME:happy in 7277) [PerFieldSimilarity], result of:",
>>         "details": [
>>             {
>>                 "value": 6.614289,
>>                 "description": "fieldWeight in 7277, product of:",
>>                 "details": [
>>                     {
>>                         "value": 1,
>>                         "description": "tf(freq=1.0), with freq of:",
>>                         "details": [
>>                             {
>>                                 "value": 1,
>>                                 "description": "termFreq=1.0"
>>                             }
>>                         ]
>>                     },
>>                     {
>>                         "value": 10.582862,
>>                         "description": "idf(docFreq=93, maxDocs=1364306)"
>>                     },
>>                     {
>>                         "value": 0.625,
>>                         "description": "fieldNorm(doc=7277)"
>>                     }
>>                 ]
>>             }
>>         ]
>>     }
>>
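Reading the explain output: each score is the product tf × idf × fieldNorm = 1 × 10.582862 × 0.625 ≈ 6.614289, and all three hits carry exactly the same three factors. The small Python sketch below walks such a tree and confirms that each "product of:" node equals the product of its children. `check_products` is a hypothetical helper, and the tree is result 1 transcribed by hand.

```python
import math

# Explanation tree of result 1, transcribed by hand from the explain output.
EXPLANATION = {
    "value": 6.614289,
    "description": "weight(DISPLAY_NAME:happy in 6485) [PerFieldSimilarity], result of:",
    "details": [
        {
            "value": 6.614289,
            "description": "fieldWeight in 6485, product of:",
            "details": [
                {
                    "value": 1,
                    "description": "tf(freq=1.0), with freq of:",
                    "details": [{"value": 1, "description": "termFreq=1.0"}],
                },
                {"value": 10.582862, "description": "idf(docFreq=93, maxDocs=1364306)"},
                {"value": 0.625, "description": "fieldNorm(doc=6485)"},
            ],
        }
    ],
}


def check_products(node: dict, rel_tol: float = 1e-6) -> bool:
    """Hypothetical helper: True if every 'product of:' node equals the product of its children."""
    children = node.get("details", [])
    ok = all(check_products(child, rel_tol) for child in children)
    if node["description"].endswith("product of:"):
        product = math.prod(child["value"] for child in children)
        ok = ok and math.isclose(product, node["value"], rel_tol=rel_tol)
    return ok


print(check_products(EXPLANATION))  # 1 * 10.582862 * 0.625 = 6.61428875
```

Because results 2 and 3 differ only in the internal doc number, the same arithmetic applies to them, which is why all three tie.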
>> 
>> Notice that items 1, 2 and 3 all have the same score, 6.614289, even though
>> their DISPLAY_NAME values differ:
>> 1) Be Happy
>> 2) Happy Ways
>> 3) Be Happy
>> 
>> It looks like it doesn't take the number of characters (the length) into
>> consideration when it computes the score. I remember reading somewhere in
>> the documentation that by default the algorithm gives a higher score to
>> documents with shorter text in the searched field, but that doesn't seem
>> to be the case. Also, I didn't manually disable norms.
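The 0.625 fieldNorm in the explain output can be reproduced. Lucene's classic similarity stores the length norm 1/sqrt(numTokens) in a single byte (via `SmallFloat.floatToByte315`), so nearby lengths collapse onto the same value. Below is a Python port of that encoding, assuming the Lucene 4.x bit layout and a field boost of 1; treat it as a sketch, not the library code. The helper names are hypothetical. It shows that fields that are shorter in tokens do get a larger norm, but only coarsely: 1 token gives 1.0, 2 tokens 0.625, 3 tokens 0.5.

```python
import math
import struct


def float_to_bits(value: float) -> int:
    """Round to float32 and reinterpret as a signed 32-bit int (like Java's Float.floatToRawIntBits)."""
    return struct.unpack(">i", struct.pack(">f", value))[0]


def bits_to_float(bits: int) -> float:
    """Reinterpret a signed 32-bit int as a float32 value."""
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def float_to_byte315(value: float) -> int:
    """Encode a float into one byte, following Lucene's SmallFloat 'byte315' scheme."""
    bits = float_to_bits(value)
    small = bits >> (24 - 3)
    if small <= (63 - 15) << 3:
        return 0 if bits <= 0 else 1      # zero/negative -> 0, underflow -> smallest non-zero
    if small >= ((63 - 15) << 3) + 0x100:
        return 0xFF                       # overflow (Java returns (byte) -1)
    return small - ((63 - 15) << 3)


def byte315_to_float(b: int) -> float:
    """Decode a byte produced by float_to_byte315."""
    if b == 0:
        return 0.0
    bits = (b & 0xFF) << (24 - 3)
    bits += (63 - 15) << 24
    return bits_to_float(bits)


def field_norm(num_tokens: int) -> float:
    """Length norm 1/sqrt(num_tokens) after the one-byte round trip (boost assumed to be 1)."""
    return byte315_to_float(float_to_byte315(1.0 / math.sqrt(num_tokens)))


for n in range(1, 6):
    print(n, round(1.0 / math.sqrt(n), 4), field_norm(n))
```

The non-zero fieldNorm in the explain output shows that norms are active; they simply cannot distinguish "Be Happy" from "Happy Ways", because both fields have two tokens.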
>> 
>> Any suggestions on how I could work around this issue?
>> 
>> 
>> 
> 
> -- 
> You received this message because you are subscribed to a topic in the Google 
> Groups "elasticsearch" group.
> To unsubscribe from this topic, visit 
> https://groups.google.com/d/topic/elasticsearch/RXuuSlkDSyA/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to 
> [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCUB82B31DijLb9PNdrHmEzXP5JUWUepUp%3DDwSES9t%3DcQ%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.
