Hi Doug,

Thank you for your quick response and comprehensive explanation. It does 
make sense.

We are using cross_fields (with the "and" operator) because we want to make 
sure that the documents returned contain *all* the search terms somewhere. 

For example, a search for "100 john smith" would return only one 
document ("john smith" matches the name and "100" matches the address).

We expect no results for "200 john smith" as 200 appears nowhere.
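To make the intended matching rule concrete, here is a toy sketch in Python using the two example documents from the original question below. It assumes a simplified analyzer (lowercase, split on whitespace) and only mimics the "every term must appear in at least one field" rule. It is an illustration, not how Elasticsearch actually evaluates a cross_fields query, and the function names are hypothetical.

```python
DOCS = [
    {"name": "James Smith", "address": "325 John Street"},
    {"name": "John Smith Junior", "address": "100 Baryl Street"},
]


def tokens(text):
    # Simplified analyzer (assumption): lowercase and split on whitespace.
    return set(text.lower().split())


def matches_all_terms(doc, query, fields=("name", "address")):
    # "and" across fields: each query term must appear in at least one field.
    terms = tokens(query)
    doc_terms = set().union(*(tokens(doc[f]) for f in fields))
    return terms <= doc_terms


def search(query):
    return [doc["name"] for doc in DOCS if matches_all_terms(doc, query)]


print(search("100 john smith"))  # ['John Smith Junior']
print(search("200 john smith"))  # []
print(search("john smith"))      # ['James Smith', 'John Smith Junior']
```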

But if we search for "john smith", we should get both documents back, and the 
document with "john smith" in its name should be first in the list (since the 
terms "john" and "smith" match in the same field).
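A similar toy sketch shows why a blended cross_fields score can tie here while per-field scoring would not. Assumptions: the same simplified analyzer, every matched term contributes 1.0 (no TF-IDF), and a coord-like factor (matched terms / query terms) is applied to each scored unit, loosely following Doug's description of Lucene's default similarity below. The function names are hypothetical and this is not Lucene's actual scoring formula; it only models scoring, not which documents match.

```python
DOCS = [
    {"name": "James Smith", "address": "325 John Street"},
    {"name": "John Smith Junior", "address": "100 Baryl Street"},
]
FIELDS = ("name", "address")


def tokens(text):
    # Simplified analyzer (assumption): lowercase and split on whitespace.
    return set(text.lower().split())


def coord_score(query_terms, field_terms):
    # Each matched term adds 1.0; the sum is then scaled by the
    # fraction of query terms matched (a coord-like penalty).
    if not query_terms:
        return 0.0
    hits = len(query_terms & field_terms)
    return hits * hits / len(query_terms)


def cross_fields_score(doc, query):
    # All fields blended into one "big field" before scoring.
    terms = tokens(query)
    blended = set().union(*(tokens(doc[f]) for f in FIELDS))
    return coord_score(terms, blended)


def best_fields_score(doc, query):
    # Score each field separately and keep the best one.
    terms = tokens(query)
    return max(coord_score(terms, tokens(doc[f])) for f in FIELDS)


def most_fields_score(doc, query):
    # Score each field separately and add the scores up.
    terms = tokens(query)
    return sum(coord_score(terms, tokens(doc[f])) for f in FIELDS)


for scorer in (cross_fields_score, best_fields_score, most_fields_score):
    print(scorer.__name__, [scorer(doc, "john smith") for doc in DOCS])
# cross_fields_score [2.0, 2.0]
# best_fields_score [0.5, 2.0]
# most_fields_score [1.0, 2.0]
```

In this toy model the blended score is 2.0 for both documents, while scoring each field separately gives "John Smith Junior" the higher score whether the per-field scores are combined by max or by sum.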

Is it possible to accomplish this with best_fields or most_fields?

Thanks again,

Andre

On Tuesday, April 14, 2015 at 12:21:33 PM UTC+10, Doug Turnbull wrote:
>
> Sorry for the confusing typo: in "towards matches with fewer *fields*", 
> "fields" should be search *terms*.
>
> On Mon, Apr 13, 2015 at 9:30 PM, Doug Turnbull <[email protected]> wrote:
>
>> First, note that in Lucene's default similarity there already are two 
>> biases towards matches with fewer fields. Try to take advantage of those 
>> before going on a boosting expedition.
>>
>> 1. Each term tends to get converted into a boolean SHOULD clause. Every 
>> SHOULD clause match gets added to the score. So the fewer matches, the 
>> lower the score.
>>
>> 2. For an even stronger bias, Lucene adds **coord**, the coordination 
>> factor. If only 1 out of 3 search terms matches the field being searched, 
>> the score is multiplied by 1/3, penalizing it. So documents where more 
>> terms match should have a much better chance of winning.
>>
>> If you want to know more, read Lucene's javadocs on similarity: 
>> https://lucene.apache.org/core/5_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
>>
>> "Huh," you're thinking, "why doesn't my scenario just work, then?" What 
>> you're doing is *cross_fields* search. Cross-field search is something new 
>> to Elasticsearch whereby both fields are blended together and treated like 
>> a single field. So the biasing above applies to the two fields together, 
>> not to each field separately. If you want to know more about cross-field 
>> search, here's an article I recently wrote: 
>> http://opensourceconnections.com/blog/2015/03/19/elasticsearch-cross-field-search-is-a-lie/
>>
>> If you actually want a bias towards the field with more matches, I'd 
>> recommend best_fields or most_fields search. They first run all the search 
>> terms against each field, performing a separate search in each field. 
>> The per-field scores are then combined (by taking the max score for 
>> best_fields, or by adding them up for most_fields).
>>
>> Until I finish the related chapter in the search relevance book I'm 
>> writing (shameless plug :-p http://manning.com/turnbull), the best places 
>> to read about these topics are the docs or the online guide. In particular, 
>> this appears relevant:
>>
>> http://www.elastic.co/guide/en/elasticsearch/guide/master/multi-field-search.html
>>
>> Hope that helps
>>
>>
>> On Mon, Apr 13, 2015 at 7:30 PM, Andre Dantas Rocha <[email protected]> wrote:
>>
>>> Hi there,
>>>
>>> I have the following query:
>>>
>>> {
>>>   "query": {
>>>     "multi_match": {
>>>       "operator": "and",
>>>       "type": "cross_fields",
>>>       "query": "john smith",
>>>       "fields": ["name", "address"]
>>>     }
>>>   }
>>> }
>>>
>>> That will match these documents:
>>>
>>> Name: James *Smith*
>>> Address: 325 *John* Street
>>>
>>> Name: *John Smith* Junior
>>> Address: 100 Baryl Street
>>>
>>> Is there a way to give the last document a higher score, since the terms 
>>> "john" and "smith" both match in the same field?
>>>
>>> Note that this behavior is a little different from using match_phrase 
>>> with slop: the query should still match terms in any of the fields, but 
>>> score higher when more terms match in the same field.
>>>
>>> Thanks,
>>>
>>> Andre
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/cc76f51b-3721-4978-a3ed-e59ff4c8f138%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>> -- 
>> *Doug Turnbull **| *Search Relevance Consultant | OpenSource 
>> Connections, LLC | 240.476.9983 | http://www.opensourceconnections.com 
>> Author: Taming Search <http://manning.com/turnbull> from Manning 
>> Publications 
>> This e-mail and all contents, including attachments, is considered to be 
>> Company Confidential unless explicitly stated otherwise, regardless 
>> of whether attachments are marked as such.
>>
>>  
>

