Re: Relevancy sorting of result returned

Ivan Brusic Sun, 06 Apr 2014 16:14:33 -0700

You can index the number of characters in your string into a new field and
then do a secondary sort on this field.
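
As a rough sketch of that idea (not a tested recipe): the Python below
computes the character count for each document before indexing and builds a
search body that sorts by _score first and by the stored length second. The
field name DISPLAY_NAME_LENGTH and the helper names are made up for
illustration; only DISPLAY_NAME comes from the mapping quoted below. The body
is a plain dict, so no particular client library is assumed.

```python
import json

# Hypothetical field that stores the character count of DISPLAY_NAME.
LENGTH_FIELD = "DISPLAY_NAME_LENGTH"


def with_length(doc):
    """Return a copy of doc with the character count of DISPLAY_NAME added."""
    enriched = dict(doc)
    enriched[LENGTH_FIELD] = len(doc["DISPLAY_NAME"])
    return enriched


def build_search_body(text):
    """Build a search body: relevance first, shorter names break ties."""
    return {
        "query": {"match": {"DISPLAY_NAME": text}},
        "sort": [
            {"_score": {"order": "desc"}},
            {LENGTH_FIELD: {"order": "asc"}},
        ],
    }


# The three names from this thread, enriched as they would be at index time.
docs = [with_length({"DISPLAY_NAME": name})
        for name in ("Be Happy", "Happy Ways", "Be Happy")]
body = build_search_body("happy")
print(json.dumps(body, indent=2))
```

With the three names from this thread the stored lengths are 8, 10 and 8, so
both "Be Happy" documents would sort ahead of "Happy Ways" when the scores
tie.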


Are you testing against real data or only against some test set? The Lucene
scoring model will improve with the addition of more documents. As more
documents are added, the term frequencies and inverse document frequencies
start to diverge and contribute more to the scoring. You will not have many
documents with the same score.
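
To see why the three documents tie, the numbers in the explain output quoted
below can be reproduced by hand. This is a small sketch assuming Lucene's
classic TF/IDF similarity (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq +
1))). The fieldNorm value 0.625 is copied from the explain output rather than
computed, because norms are stored in a lossy one-byte encoding. The function
names are only illustrative.

```python
import math


def tf(freq):
    # Classic similarity: square root of the term frequency.
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    # Classic similarity: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight is the product of tf, idf and fieldNorm.
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Values taken from the explain output in this thread.
score = field_weight(freq=1.0, doc_freq=93, max_docs=1364306, field_norm=0.625)
print(round(score, 4))
```

All three hits share the same tf, idf and fieldNorm, so the product comes out
at roughly 6.614 for each of them, which matches the reported _score.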

-- 
Ivan


On Sun, Apr 6, 2014 at 12:38 AM, <[email protected]> wrote:

>
> Hi Ivan,
>
> Because I wanted the similar results sorted in this way:
>
> 1. Be happy
> 2. Be happy
> 3. Happy ways
>
> Currently it is sorted :
> 1. Be happy
> 2. Happy ways
> 3. Be happy
>
> This is because they all return the same score. Any suggestions?
>
> Thanks
>
> On 6 Apr, 2014, at 4:24 am, Ivan Brusic <[email protected]> wrote:
>
> Lucene will indeed, by default, give a higher score to shorter text, but
> the "shortness" is the number of tokens, not the number of characters. In
> your last example, each field has two tokens, so the length is the same.
> The term frequency is also the same for each document ("happy" appears
> once) and the inverse document frequency is the same (always the case with
> single term queries), so the score will be exactly the same for every
> document. Why should the scoring be any different?
>
> Cheers,
>
> Ivan
>
>
>
> On Fri, Apr 4, 2014 at 10:31 PM, chee hoo lum <[email protected]> wrote:
>
>> Hi Ivan,
>>
>> Since I am not sure how an analyzer with stopwords can be set in the query
>> itself, I tried to set stopwords="_none_" via the index settings and the
>> mapping:
>>
>> *Index settings: *
>>
>> {
>>     "jdbc_dev": {
>>         "settings": {
>>             "index.analysis.analyzer.string_lowercase.filter": "lowercase",
>>             "index.number_of_replicas": "1",
>>             "index.analysis.analyzer.string_lowercase.tokenizer": "keyword",
>>             "index.number_of_shards": "5",
>>             "index.version.created": "900199",
>>             "index.analysis.analyzer.standard.type": "standard",
>>             "index.analysis.analyzer.standard.stopwords": "_none_"
>>         }
>>     }
>> }
>>
>>
>> *Type Mapping :*
>>
>> {
>>     "media": {
>>         "properties": {
>>             "AUDIO": {
>>                 "type": "string"
>>             },
>>             ....
>>             "DISPLAY_NAME": {
>>                 "type": "string",
>>                 "analyzer": "standard"
>>             },
>>             ....
>>         }
>>     }
>> }
>>
>>
>> *Query : *
>>
>> /media/_search?pretty=&search_type=dfs_query_then_fetch&preference=_primary
>>
>> {
>>   "from" : 0,
>>   "size" : 100,
>>   "explain" : true,
>>   "query" : {
>>     "filtered" : {
>>       "query" : {
>>         "multi_match" : {
>>           "query" : "happy",
>>           "fields" : [ "DISPLAY_NAME" ]
>>         }
>>       },
>>       "filter" : {
>>         "query" : {
>>           "bool" : {
>>             "must" : {
>>               "term" : {
>>                 "CHANNEL_ID" : "1"
>>               }
>>             }
>>           }
>>         }
>>       }
>>     }
>>   }
>> }
>>
>>
>> *Result : *
>>
>> 1)
>>  "_shard": 4,
>>  "_node": "xsGVhtTnThaG57_mJdMtxg",
>>  "_index": "jdbc_dev",
>>  "_type": "media",
>>  "_id": "127413",
>>  "_score": 6.614289,
>>  "_source": {
>>      "DISPLAY_NAME": "Be Happy",
>>  ,
>>  "_explanation": {
>>      "value": 6.614289,
>>      "description": "weight(DISPLAY_NAME:happy in 6485) [PerFieldSimilarity], result of:",
>>      "details": [
>>          {
>>              "value": 6.614289,
>>              "description": "fieldWeight in 6485, product of:",
>>              "details": [
>>                  {
>>                      "value": 1,
>>                      "description": "tf(freq=1.0), with freq of:",
>>                      "details": [
>>                          {
>>                              "value": 1,
>>                              "description": "termFreq=1.0"
>>                          }
>>                      ]
>>                  },
>>                  {
>>                      "value": 10.582862,
>>                      "description": "idf(docFreq=93, maxDocs=1364306)"
>>                  },
>>                  {
>>                      "value": 0.625,
>>                      "description": "fieldNorm(doc=6485)"
>>                  }
>>              ]
>>          }
>>      ]
>>  }
>>
>>
>> 2)
>>  "_shard": 4,
>>  "_node": "UOjX2lxhR6mzfjHHmTm3cQ",
>>  "_index": "jdbc_dev",
>>  "_type": "media",
>>  "_id": "72253",
>>  "_score": 6.614289,
>>  "_source": {
>>      "DISPLAY_NAME": "Happy Ways",
>>  "_explanation": {
>>      "value": 6.614289,
>>      "description": "weight(DISPLAY_NAME:happy in 1102) [PerFieldSimilarity], result of:",
>>      "details": [
>>          {
>>              "value": 6.614289,
>>              "description": "fieldWeight in 1102, product of:",
>>              "details": [
>>                  {
>>                      "value": 1,
>>                      "description": "tf(freq=1.0), with freq of:",
>>                      "details": [
>>                          {
>>                              "value": 1,
>>                              "description": "termFreq=1.0"
>>                          }
>>                      ]
>>                  },
>>                  {
>>                      "value": 10.582862,
>>                      "description": "idf(docFreq=93, maxDocs=1364306)"
>>                  },
>>                  {
>>                      "value": 0.625,
>>                      "description": "fieldNorm(doc=1102)"
>>                  }
>>              ]
>>          }
>>      ]
>>  }
>>
>>
>> 3)
>>  "_shard": 4,
>>  "_node": "UOjX2lxhR6mzfjHHmTm3cQ",
>>  "_index": "jdbc_dev",
>>  "_type": "media",
>>  "_id": "127413",
>>  "_score": 6.614289,
>>  "_source": {
>>      "DISPLAY_NAME": "Be Happy",
>>  "_explanation": {
>>      "value": 6.614289,
>>      "description": "weight(DISPLAY_NAME:happy in 7277) [PerFieldSimilarity], result of:",
>>      "details": [
>>          {
>>              "value": 6.614289,
>>              "description": "fieldWeight in 7277, product of:",
>>              "details": [
>>                  {
>>                      "value": 1,
>>                      "description": "tf(freq=1.0), with freq of:",
>>                      "details": [
>>                          {
>>                              "value": 1,
>>                              "description": "termFreq=1.0"
>>                          }
>>                      ]
>>                  },
>>                  {
>>                      "value": 10.582862,
>>                      "description": "idf(docFreq=93, maxDocs=1364306)"
>>                  },
>>                  {
>>                      "value": 0.625,
>>                      "description": "fieldNorm(doc=7277)"
>>                  }
>>              ]
>>          }
>>      ]
>>  }
>>
>>
>> Notice that items 1, 2 and 3 all have the same score, 6.614289, even
>> though the DISPLAY_NAME differs:
>> 1) Be Happy
>> 2) Happy Ways
>> 3) Be Happy
>>
>> It looks like the number of characters (the length) is not taken into
>> account when the score is computed. I remember reading somewhere in the
>> documentation that by default the algorithm should give a higher score to
>> documents with shorter text in the searched field, but that does not seem
>> to be the case here. I also did not manually disable norms.
>>
>> Any suggestions for how I could work around this issue?
>>
>>
>>
>>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/RXuuSlkDSyA/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCUB82B31DijLb9PNdrHmEzXP5JUWUepUp%3DDwSES9t%3DcQ%40mail.gmail.com<https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCUB82B31DijLb9PNdrHmEzXP5JUWUepUp%3DDwSES9t%3DcQ%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
