Re: Relevancy sorting of result returned

Ivan Brusic Sat, 05 Apr 2014 13:25:21 -0700

By default, Lucene does give a higher score to shorter fields, but
"shortness" is measured in tokens, not characters. In your last example,
each field has two tokens, so the length norm is the same (the fieldNorm
of 0.625 in every explanation). The term frequency is also the same for
each document ("happy" appears once), and the inverse document frequency
is the same (always the case with single-term queries), so the score is
exactly the same for every document. Why should the scoring be any
different?
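
To make the arithmetic concrete, here is a rough Python sketch of how the
numbers in your explain output fit together. It assumes the classic
Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)),
length norm = 1 / sqrt(number of tokens)). The function names are my own,
the whitespace split is only a stand-in for the standard analyzer with
stopwords disabled, and quantize_norm is a simplified approximation of the
one-byte norm storage, not Lucene's actual code.

```python
import math
import struct


def length_norm(num_tokens: int) -> float:
    """Raw length norm: depends on the token count, not the character count."""
    return 1.0 / math.sqrt(num_tokens)


def quantize_norm(norm: float) -> float:
    """Approximate the lossy one-byte norm storage (assumed simplification).

    Keeps the sign, exponent and top two mantissa bits of the float32 value
    and zeroes the remaining 21 bits.
    """
    bits = struct.unpack(">I", struct.pack(">f", norm))[0]
    bits &= ~((1 << 21) - 1)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, identical for every hit of one term."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: int) -> float:
    """Term frequency factor: square root of the raw count."""
    return math.sqrt(freq)


def score(text: str, term: str, doc_freq: int, max_docs: int) -> float:
    # Hypothetical stand-in for the analyzer: lowercase and split on whitespace.
    tokens = text.lower().split()
    freq = tokens.count(term)
    field_norm = quantize_norm(length_norm(len(tokens)))
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# docFreq and maxDocs are taken from the explain output in the quoted message.
scores = {
    name: score(name, "happy", doc_freq=93, max_docs=1364306)
    for name in ("Be Happy", "Happy Ways")
}
for name, value in scores.items():
    print(f"{name!r}: {value:.4f}")
```

Both names have two tokens and one occurrence of "happy", so all three
factors match and the products are identical, whatever the character
counts are.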


Cheers,

Ivan



On Fri, Apr 4, 2014 at 10:31 PM, chee hoo lum <[email protected]> wrote:

> Hi Ivan,
>
> Since I am not sure how an analyzer with stopwords can be set in the
> query itself, I tried to set stopwords="_none_" via the index settings
> and the mapping:
>
> *Index settings: *
>
> {
>     "jdbc_dev": {
>         "settings": {
>             "index.analysis.analyzer.string_lowercase.filter": "lowercase",
>             "index.number_of_replicas": "1",
>             "index.analysis.analyzer.string_lowercase.tokenizer": "keyword",
>             "index.number_of_shards": "5",
>             "index.version.created": "900199",
>             "index.analysis.analyzer.standard.type": "standard",
>             "index.analysis.analyzer.standard.stopwords": "_none_"
>         }
>     }
> }
>
>
> *Type Mapping :*
>
> {
>     "media": {
>         "properties": {
>             "AUDIO": {
>                 "type": "string"
>             },
>          ....
>          "DISPLAY_NAME": {
>                 "type": "string",
>                 "analyzer": "standard"
>             },
>          ....
>    }
> }
>
>
> *Query : *
>
> /media/_search?pretty=&search_type=dfs_query_then_fetch&preference=_primary
>
> {
>   "from" : 0,
>   "size" : 100,
>   "explain" : true,
>   "query" : {
>     "filtered" : {
>       "query" : {
>         "multi_match" : {
>           "query" : "happy",
>           "fields" : [ "DISPLAY_NAME" ]
>         }
>       },
>       "filter" : {
>         "query" : {
>           "bool" : {
>             "must" : {
>               "term" : {
>                 "CHANNEL_ID" : "1"
>               }
>             }
>           }
>         }
>       }
>     }
>   }
> }
>
>
> *Result : *
>
> 1)
>                 "_shard": 4,
>                 "_node": "xsGVhtTnThaG57_mJdMtxg",
>                 "_index": "jdbc_dev",
>                 "_type": "media",
>                 "_id": "127413",
>                 "_score": 6.614289,
>                 "_source": {
>                     "DISPLAY_NAME": "Be Happy",
>                 ,
>                 "_explanation": {
>                     "value": 6.614289,
>                     "description": "weight(DISPLAY_NAME:happy in 6485) [PerFieldSimilarity], result of:",
>                     "details": [
>                         {
>                             "value": 6.614289,
>                             "description": "fieldWeight in 6485, product of:",
>                             "details": [
>                                 {
>                                     "value": 1,
>                                     "description": "tf(freq=1.0), with freq of:",
>                                     "details": [
>                                         {
>                                             "value": 1,
>                                             "description": "termFreq=1.0"
>                                         }
>                                     ]
>                                 },
>                                 {
>                                     "value": 10.582862,
>                                     "description": "idf(docFreq=93, maxDocs=1364306)"
>                                 },
>                                 {
>                                     "value": 0.625,
>                                     "description": "fieldNorm(doc=6485)"
>                                 }
>                             ]
>                         }
>                     ]
>                 }
>
>
> 2)
>                 "_shard": 4,
>                 "_node": "UOjX2lxhR6mzfjHHmTm3cQ",
>                 "_index": "jdbc_dev",
>                 "_type": "media",
>                 "_id": "72253",
>                 "_score": 6.614289,
>                 "_source": {
>                     "DISPLAY_NAME": "Happy Ways",
>                 "_explanation": {
>                     "value": 6.614289,
>                     "description": "weight(DISPLAY_NAME:happy in 1102) [PerFieldSimilarity], result of:",
>                     "details": [
>                         {
>                             "value": 6.614289,
>                             "description": "fieldWeight in 1102, product of:",
>                             "details": [
>                                 {
>                                     "value": 1,
>                                     "description": "tf(freq=1.0), with freq of:",
>                                     "details": [
>                                         {
>                                             "value": 1,
>                                             "description": "termFreq=1.0"
>                                         }
>                                     ]
>                                 },
>                                 {
>                                     "value": 10.582862,
>                                     "description": "idf(docFreq=93, maxDocs=1364306)"
>                                 },
>                                 {
>                                     "value": 0.625,
>                                     "description": "fieldNorm(doc=1102)"
>                                 }
>                             ]
>                         }
>                     ]
>                 }
>
>
> 3)
>                 "_shard": 4,
>                 "_node": "UOjX2lxhR6mzfjHHmTm3cQ",
>                 "_index": "jdbc_dev",
>                 "_type": "media",
>                 "_id": "127413",
>                 "_score": 6.614289,
>                 "_source": {
>                     "DISPLAY_NAME": "Be Happy",
>                 "_explanation": {
>                     "value": 6.614289,
>                     "description": "weight(DISPLAY_NAME:happy in 7277) [PerFieldSimilarity], result of:",
>                     "details": [
>                         {
>                             "value": 6.614289,
>                             "description": "fieldWeight in 7277, product of:",
>                             "details": [
>                                 {
>                                     "value": 1,
>                                     "description": "tf(freq=1.0), with freq of:",
>                                     "details": [
>                                         {
>                                             "value": 1,
>                                             "description": "termFreq=1.0"
>                                         }
>                                     ]
>                                 },
>                                 {
>                                     "value": 10.582862,
>                                     "description": "idf(docFreq=93, maxDocs=1364306)"
>                                 },
>                                 {
>                                     "value": 0.625,
>                                     "description": "fieldNorm(doc=7277)"
>                                 }
>                             ]
>                         }
>                     ]
>                 }
>
>
> Notice that items 1, 2, and 3 all have the same score, *6.614289*, even
> though their DISPLAY_NAME values differ:
> 1) Be Happy
> 2) Happy Ways
> 3) Be Happy
>
> It looks like the number of characters (the length) is not taken into
> consideration when the score is computed. I remember the documentation
> saying somewhere that, by default, the algorithm should give a higher
> score to documents with shorter text in the searched field, but that
> doesn't seem to be the case here. Also, I didn't manually disable norms.
>
> Any suggestions for how I could work around this issue?
>
>
>
>
