Hi Ivan,

Because I want the similar results sorted this way:

1. Be happy
2. Be happy
3. Happy ways

Currently they are sorted as:

1. Be happy
2. Happy ways
3. Be happy

This is because they all return the same score. Any suggestions?

Thanks

> On 6 Apr, 2014, at 4:24 am, Ivan Brusic <[email protected]> wrote:
>
> Lucene will indeed, by default, give a higher score to shorter text, but the
> "shortness" is the number of tokens, not the number of characters. In your
> last example, each field has two tokens, so the length is the same. The term
> frequency is also the same for each document ("happy" appears once) and the
> inverse document frequency is the same (always the case with single term
> queries), so the score will be exactly the same for every document. Why
> should the scoring be any different?
>
> Cheers,
>
> Ivan
>
>> On Fri, Apr 4, 2014 at 10:31 PM, chee hoo lum <[email protected]> wrote:
>> Hi Ivan,
>>
>> Since I am not sure how an analyzer with stopwords can be set in the query
>> itself, I tried to set stopwords="_none_" via the index settings and its
>> mapping:
>>
>> Index settings:
>>
>> {
>>   "jdbc_dev": {
>>     "settings": {
>>       "index.analysis.analyzer.string_lowercase.filter": "lowercase",
>>       "index.number_of_replicas": "1",
>>       "index.analysis.analyzer.string_lowercase.tokenizer": "keyword",
>>       "index.number_of_shards": "5",
>>       "index.version.created": "900199",
>>       "index.analysis.analyzer.standard.type": "standard",
>>       "index.analysis.analyzer.standard.stopwords": "_none_"
>>     }
>>   }
>> }
>>
>> Type Mapping:
>>
>> {
>>   "media": {
>>     "properties": {
>>       "AUDIO": {
>>         "type": "string"
>>       },
>>       ....
>>       "DISPLAY_NAME": {
>>         "type": "string",
>>         "analyzer": "standard"
>>       },
>>       ....
>>   }
>> }
>>
>> Query :
>>
>> /media/_search?pretty=&search_type=dfs_query_then_fetch&preference=_primary
>>
>> {
>>   "from" : 0,
>>   "size" : 100,
>>   "explain" : true,
>>   "query" : {
>>     "filtered" : {
>>       "query" : {
>>         "multi_match": {
>>           "query": "happy",
>>           "fields": [ "DISPLAY_NAME" ]
>>         }
>>       },
>>       "filter" : {
>>         "query" : {
>>           "bool" : {
>>             "must" : {
>>               "term" : {
>>                 "CHANNEL_ID" : "1"
>>               }
>>             }
>>           }
>>         }
>>       }
>>     }
>>   }
>> }
>>
>> Result :
>>
>> 1)
>> "_shard": 4,
>> "_node": "xsGVhtTnThaG57_mJdMtxg",
>> "_index": "jdbc_dev",
>> "_type": "media",
>> "_id": "127413",
>> "_score": 6.614289,
>> "_source": {
>>   "DISPLAY_NAME": "Be Happy",
>>   ,
>>   "_explanation": {
>>     "value": 6.614289,
>>     "description": "weight(DISPLAY_NAME:happy in 6485) [PerFieldSimilarity], result of:",
>>     "details": [
>>       {
>>         "value": 6.614289,
>>         "description": "fieldWeight in 6485, product of:",
>>         "details": [
>>           {
>>             "value": 1,
>>             "description": "tf(freq=1.0), with freq of:",
>>             "details": [
>>               {
>>                 "value": 1,
>>                 "description": "termFreq=1.0"
>>               }
>>             ]
>>           },
>>           {
>>             "value": 10.582862,
>>             "description": "idf(docFreq=93, maxDocs=1364306)"
>>           },
>>           {
>>             "value": 0.625,
>>             "description": "fieldNorm(doc=6485)"
>>           }
>>         ]
>>       }
>>     ]
>>   }
>>
>> 2)
>> "_shard": 4,
>> "_node": "UOjX2lxhR6mzfjHHmTm3cQ",
>> "_index": "jdbc_dev",
>> "_type": "media",
>> "_id": "72253",
>> "_score": 6.614289,
>> "_source": {
>>   "DISPLAY_NAME": "Happy Ways",
>>   "_explanation": {
>>     "value": 6.614289,
>>     "description": "weight(DISPLAY_NAME:happy in 1102) [PerFieldSimilarity], result of:",
>>     "details": [
>>       {
>>         "value": 6.614289,
>>         "description": "fieldWeight in 1102, product of:",
>>         "details": [
>>           {
>>             "value": 1,
>>             "description": "tf(freq=1.0), with freq of:",
>>             "details": [
>>               {
>>                 "value": 1,
>>                 "description": "termFreq=1.0"
>>               }
>>             ]
>>           },
>>           {
>>             "value": 10.582862,
>>             "description": "idf(docFreq=93, maxDocs=1364306)"
>>           },
>>           {
>>             "value": 0.625,
>>             "description": "fieldNorm(doc=1102)"
>>           }
>>         ]
>>       }
>>     ]
>>   }
>>
>> 3)
>> "_shard": 4,
>> "_node": "UOjX2lxhR6mzfjHHmTm3cQ",
>> "_index": "jdbc_dev",
>> "_type": "media",
>> "_id": "127413",
>> "_score": 6.614289,
>> "_source": {
>>   "DISPLAY_NAME": "Be Happy",
>>   "_explanation": {
>>     "value": 6.614289,
>>     "description": "weight(DISPLAY_NAME:happy in 7277) [PerFieldSimilarity], result of:",
>>     "details": [
>>       {
>>         "value": 6.614289,
>>         "description": "fieldWeight in 7277, product of:",
>>         "details": [
>>           {
>>             "value": 1,
>>             "description": "tf(freq=1.0), with freq of:",
>>             "details": [
>>               {
>>                 "value": 1,
>>                 "description": "termFreq=1.0"
>>               }
>>             ]
>>           },
>>           {
>>             "value": 10.582862,
>>             "description": "idf(docFreq=93, maxDocs=1364306)"
>>           },
>>           {
>>             "value": 0.625,
>>             "description": "fieldNorm(doc=7277)"
>>           }
>>         ]
>>       }
>>     ]
>>   }
>>
>> Notice that from 1,2,3 items the scores are the same 6.614289 even though
>> the DISPLAY_NAME is different
>> 1) Be Happy
>> 2) Happy Ways
>> 3) Be Happy
>>
>> It looks like it doesn't take into consideration the number of
>> character/length when it compute the score. I remember somewhere in the
>> document indicate that by default the algorithm should give higher score to
>> the document that have shorter text on the searched field however this
>> doesn't seem like the case. Also i didn't manually disable the norm.
>>
>> Any suggestion that i could circumvent this issue ?
>>
>
> --
> You received this message because you are subscribed to a topic in the Google
> Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/RXuuSlkDSyA/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCUB82B31DijLb9PNdrHmEzXP5JUWUepUp%3DDwSES9t%3DcQ%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.
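As a rough check of Ivan's explanation quoted above, the following Python sketch recomputes the score from the numbers in the explain output. It assumes Lucene's classic TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))) and takes the fieldNorm value 0.625 directly from the output instead of deriving it. The function names are illustrative only and are not part of Lucene or Elasticsearch.

```python
import math


def classic_tf(freq):
    # Assumed classic TF-IDF term frequency: square root of the raw count.
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    # Assumed classic TF-IDF inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # The explain output describes fieldWeight as the product tf * idf * fieldNorm.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the explain output: "happy" occurs once in each title,
# docFreq=93, maxDocs=1364306, and every two-token title has fieldNorm 0.625.
titles = {"Be Happy": 0.625, "Happy Ways": 0.625}
scores = {
    title: field_weight(1.0, 93, 1364306, norm) for title, norm in titles.items()
}

for title, score in scores.items():
    print(f"{title}: {score:.4f}")
```

Both titles come out with the same value, close to the reported 6.614289 (Lucene works in single precision, so the last digits may differ). Since every input to the product is identical for the two titles, the score alone cannot decide their relative order.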
