Hi Ivan,

I followed your advice and started using the explain API for query_string, and
I think I found a bug in the process (I don't know whether it really is a bug
or the intended behaviour of query_string). This is going to be a long post,
so please be patient with me.

I'm using the doc: {name:"new delhi to goa", st:"goa"}
Running the analyze API with my index analyzer gave me these tokens:

{
  "tokens" : [ {
    "token" : "new",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new",
    "start_offset" : 0,
    "end_offset" : 9,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new ",
    "start_offset" : 0,
    "end_offset" : 9,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new d",
    "start_offset" : 0,
    "end_offset" : 9,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new de",
    "start_offset" : 0,
    "end_offset" : 9,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new del",
    "start_offset" : 0,
    "end_offset" : 9,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new delh",
    "start_offset" : 0,
    "end_offset" : 9,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new delhi",
    "start_offset" : 0,
    "end_offset" : 9,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new ",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new d",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new de",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new del",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new delh",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new delhi",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new delhi ",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new delhi t",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new delhi to",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new",
    "start_offset" : 0,
    "end_offset" : 16,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new ",
    "start_offset" : 0,
    "end_offset" : 16,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new d",
    "start_offset" : 0,
    "end_offset" : 16,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new de",
    "start_offset" : 0,
    "end_offset" : 16,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new del",
    "start_offset" : 0,
    "end_offset" : 16,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new delh",
    "start_offset" : 0,
    "end_offset" : 16,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new delhi",
    "start_offset" : 0,
    "end_offset" : 16,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new delhi ",
    "start_offset" : 0,
    "end_offset" : 16,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new delhi t",
    "start_offset" : 0,
    "end_offset" : 16,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new delhi to",
    "start_offset" : 0,
    "end_offset" : 16,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new delhi to ",
    "start_offset" : 0,
    "end_offset" : 16,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new delhi to g",
    "start_offset" : 0,
    "end_offset" : 16,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new delhi to go",
    "start_offset" : 0,
    "end_offset" : 16,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "new delhi to goa",
    "start_offset" : 0,
    "end_offset" : 16,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "del",
    "start_offset" : 4,
    "end_offset" : 9,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "delh",
    "start_offset" : 4,
    "end_offset" : 9,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "delhi",
    "start_offset" : 4,
    "end_offset" : 9,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "del",
    "start_offset" : 4,
    "end_offset" : 12,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "delh",
    "start_offset" : 4,
    "end_offset" : 12,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "delhi",
    "start_offset" : 4,
    "end_offset" : 12,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "delhi ",
    "start_offset" : 4,
    "end_offset" : 12,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "delhi t",
    "start_offset" : 4,
    "end_offset" : 12,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "delhi to",
    "start_offset" : 4,
    "end_offset" : 12,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "del",
    "start_offset" : 4,
    "end_offset" : 16,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "delh",
    "start_offset" : 4,
    "end_offset" : 16,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "delhi",
    "start_offset" : 4,
    "end_offset" : 16,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "delhi ",
    "start_offset" : 4,
    "end_offset" : 16,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "delhi t",
    "start_offset" : 4,
    "end_offset" : 16,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "delhi to",
    "start_offset" : 4,
    "end_offset" : 16,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "delhi to ",
    "start_offset" : 4,
    "end_offset" : 16,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "delhi to g",
    "start_offset" : 4,
    "end_offset" : 16,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "delhi to go",
    "start_offset" : 4,
    "end_offset" : 16,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "delhi to goa",
    "start_offset" : 4,
    "end_offset" : 16,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "to ",
    "start_offset" : 10,
    "end_offset" : 16,
    "type" : "word",
    "position" : 3
  }, {
    "token" : "to g",
    "start_offset" : 10,
    "end_offset" : 16,
    "type" : "word",
    "position" : 3
  }, {
    "token" : "to go",
    "start_offset" : 10,
    "end_offset" : 16,
    "type" : "word",
    "position" : 3
  }, {
    "token" : "to goa",
    "start_offset" : 10,
    "end_offset" : 16,
    "type" : "word",
    "position" : 3
  }, {
    "token" : "goa",
    "start_offset" : 13,
    "end_offset" : 16,
    "type" : "word",
    "position" : 4
  } ]
}
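For reference, the token list above came from an analyze call along these lines (the index and analyzer names here are placeholders standing in for my actual setup):

```sh
curl 'localhost:9200/myindex/_analyze?analyzer=index_analyzer&pretty' -d 'new delhi to goa'
```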

Now, if I query for "delhi to goa", the search_analyzer gives me this:

{
  "tokens" : [ {
    "token" : "del",
    "start_offset" : 0,
    "end_offset" : 5,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "delh",
    "start_offset" : 0,
    "end_offset" : 5,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "delhi",
    "start_offset" : 0,
    "end_offset" : 5,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "del",
    "start_offset" : 0,
    "end_offset" : 8,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "delh",
    "start_offset" : 0,
    "end_offset" : 8,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "delhi",
    "start_offset" : 0,
    "end_offset" : 8,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "delhi ",
    "start_offset" : 0,
    "end_offset" : 8,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "delhi t",
    "start_offset" : 0,
    "end_offset" : 8,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "delhi to",
    "start_offset" : 0,
    "end_offset" : 8,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "del",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "delh",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "delhi",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "delhi ",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "delhi t",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "delhi to",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "delhi to ",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "delhi to g",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "delhi to go",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "delhi to goa",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "to ",
    "start_offset" : 6,
    "end_offset" : 12,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "to g",
    "start_offset" : 6,
    "end_offset" : 12,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "to go",
    "start_offset" : 6,
    "end_offset" : 12,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "to goa",
    "start_offset" : 6,
    "end_offset" : 12,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "goa",
    "start_offset" : 9,
    "end_offset" : 12,
    "type" : "word",
    "position" : 3
  } ]
}

The explain API gives me the following:

{text=new delhi to goa,boostFactor=9.820192307,po=9.82}
510.39673 = custom score, product of:
  510.39673 = script score function: composed of:
    510.39673 = sum of:
      371.12375 = max of:
        371.12375 = sum of:
          104.61707 = weight(text:del in 1003990)
[PerFieldSimilarity], result of:
            104.61707 = score(doc=1003990,freq=5.0 = termFreq=5.0
), product of:
              0.43576795 = queryWeight, product of:
                5.368244 = idf(docFreq=53067, maxDocs=4187328)
                0.08117513 = queryNorm
              240.0752 = fieldWeight in 1003990, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.368244 = idf(docFreq=53067, maxDocs=4187328)
                20.0 = fieldNorm(doc=1003990)
          133.24011 = weight(text:delh in 1003990)
[PerFieldSimilarity], result of:
            133.24011 = score(doc=1003990,freq=5.0 = termFreq=5.0
), product of:
              0.49178073 = queryWeight, product of:
                6.058268 = idf(docFreq=26616, maxDocs=4187328)
                0.08117513 = queryNorm
              270.934 = fieldWeight in 1003990, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.058268 = idf(docFreq=26616, maxDocs=4187328)
                20.0 = fieldNorm(doc=1003990)
          133.26657 = weight(text:delhi in 1003990)
[PerFieldSimilarity], result of:
            133.26657 = score(doc=1003990,freq=5.0 = termFreq=5.0
), product of:
              0.49182954 = queryWeight, product of:
                6.0588694 = idf(docFreq=26600, maxDocs=4187328)
                0.08117513 = queryNorm
              270.96088 = fieldWeight in 1003990, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.0588694 = idf(docFreq=26600, maxDocs=4187328)
                20.0 = fieldNorm(doc=1003990)
      139.27298 = max of:
        139.27298 = weight(text:goa^20.0 in 1003990)
[PerFieldSimilarity], result of:
          139.27298 = score(doc=1003990,freq=3.0 = termFreq=3.0
), product of:
            0.5712808 = queryWeight, product of:
              20.0 = boost
              7.037633 = idf(docFreq=9995, maxDocs=4187328)
              0.004058757 = queryNorm
            243.79076 = fieldWeight in 1003990, product of:
              1.7320508 = tf(freq=3.0), with freq of:
                3.0 = termFreq=3.0
              7.037633 = idf(docFreq=9995, maxDocs=4187328)
              20.0 = fieldNorm(doc=1003990)
  1.0 = queryBoost

The explain output above shows scores for:
del
delh
delhi
goa

but there are no scores for the other tokens generated by my search
analyzer. Why is that?

I have read that query_string uses a query parser based on Lucene's by
default. So my guess is that query_string applies a whitespace tokenizer on
top of the tokens generated by my search analyzer; am I correct? How can I
make query_string calculate a score for all of the tokens generated by the
search_analyzer? Please correct me if I am wrong.
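To check my guess, I wrote a tiny Python sketch (purely illustrative, not Lucene's actual code; min_gram=3 is an assumption chosen to match my index settings): if the query string is split on whitespace first and each chunk is then analyzed on its own, a token that spans a space, such as "delhi to", can never be produced at query time, and a two-character chunk like "to" yields nothing at all. That would explain exactly the four terms in the explain output.

```python
def edge_ngrams(term, min_gram=3, max_gram=15):
    """Edge n-grams of a single term (assumed filter settings)."""
    return [term[:n] for n in range(min_gram, min(len(term), max_gram) + 1)]

def parser_terms(query):
    # My guess: the query parser splits on whitespace BEFORE the
    # analyzer runs, so the analyzer only ever sees single words.
    terms = []
    for chunk in query.split():
        terms.extend(edge_ngrams(chunk))
    return terms

print(parser_terms("delhi to goa"))
# ['del', 'delh', 'delhi', 'goa']  (and never 'delhi to', 'to goa', etc.)
```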

There is one more thing I noticed.
I'm using a query-time boost on one of my doc fields, but it is not working
the way I thought it would. In the explain output above you can see that
there is a boost associated with goa but not with delhi, even though both
goa and delhi are present in the original doc. My guess is that
query_string applies the boost only to a term of the user-typed string that
is kept as-is rather than analyzed, because in the above example goa is
kept as it is while delhi gets analyzed. Am I correct?
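In case it matters, I am currently applying the boost inline in the query text. If my guess is right, a field-level boost would instead apply to every term analyzed for that field; something like this (the field name "text" is from my mapping, and whether this actually behaves differently is exactly what I am unsure about):

```json
{
  "query": {
    "query_string": {
      "query": "delhi to goa",
      "fields": ["text^20"]
    }
  }
}
```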

Awaiting your reply!

Thanks



On Tue, Feb 4, 2014 at 1:03 AM, Ivan Brusic <[email protected]> wrote:

> Try using the plugin I suggested and/or the explain API.
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-explain.html
>
>
>
> On Mon, Feb 3, 2014 at 11:26 AM, Mukul Gupta <[email protected]>wrote:
>
>> Hi Ivan,
>>
>> Thanks for the reply. Can you please explain how query_string works
>> internally? The documentation says the query is parsed internally; I want
>> to know what that means. Let's say I'm searching for "delhi t": how will
>> query_string run that query? Also assume I'm using search_analyzer =
>> standard with the filters lowercase, asciifolding, suggestion_shingle
>> (min_gram=2,max_gram=2), and edgengram (min_gram=2,max_gram=15). I know
>> that after the search_analyzer I'll have the following:
>>
>> {
>>   "tokens" : [ {
>>     "token" : "de",
>>     "start_offset" : 0,
>>     "end_offset" : 5,
>>     "type" : "word",
>>     "position" : 1
>>   }, {
>>     "token" : "del",
>>     "start_offset" : 0,
>>     "end_offset" : 5,
>>     "type" : "word",
>>     "position" : 1
>>   }, {
>>     "token" : "delh",
>>     "start_offset" : 0,
>>     "end_offset" : 5,
>>     "type" : "word",
>>     "position" : 1
>>   }, {
>>     "token" : "delhi",
>>     "start_offset" : 0,
>>     "end_offset" : 5,
>>     "type" : "word",
>>     "position" : 1
>>   }, {
>>     "token" : "de",
>>     "start_offset" : 0,
>>     "end_offset" : 7,
>>     "type" : "word",
>>     "position" : 1
>>   }, {
>>     "token" : "del",
>>     "start_offset" : 0,
>>     "end_offset" : 7,
>>     "type" : "word",
>>     "position" : 1
>>   }, {
>>     "token" : "delh",
>>     "start_offset" : 0,
>>     "end_offset" : 7,
>>     "type" : "word",
>>     "position" : 1
>>   }, {
>>     "token" : "delhi",
>>     "start_offset" : 0,
>>     "end_offset" : 7,
>>     "type" : "word",
>>     "position" : 1
>>   }, {
>>     "token" : "delhi ",
>>     "start_offset" : 0,
>>     "end_offset" : 7,
>>     "type" : "word",
>>     "position" : 1
>>   }, {
>>     "token" : "delhi t",
>>     "start_offset" : 0,
>>     "end_offset" : 7,
>>     "type" : "word",
>>     "position" : 1
>>   } ]
>> }
>>
>> So, how will query_string search the documents with these tokens?
>>
>> Thanks
>>
>>
>>
>>
>>
>> On Tue, Feb 4, 2014 at 12:39 AM, Ivan Brusic <[email protected]> wrote:
>>
>>> The key difference is that a query_string query (with multiple fields
>>> and the AND operator) will match when each term exists in at least one
>>> field, while a multi-match query (also using AND) matches only when all
>>> the terms exist in at least one field. In other words, the terms of a
>>> query_string query do not need to exist in the same field, whereas in a
>>> multi-match they do.
>>>
>>> Play around with the Sense Chrome plugin and some sample data:
>>> https://chrome.google.com/webstore/detail/sense/doinijnbnggojdlcjifpdckfokbbfpbo
>>>
>>> Cheers,
>>>
>>> Ivan
>>>
>>>
>>> On Mon, Feb 3, 2014 at 10:55 AM, coder <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Can anyone please give me a detailed difference between these two
>>>> queries? I studied the documentation but was not able to figure out the
>>>> difference. Can anyone please explain it with some examples in more
>>>> detail? I expect query_string to give me the docs that match the maximum
>>>> number of the terms generated by the search_analyzer against the indexed
>>>> docs, but that is not happening.
>>>>
>>>> Please help !!!
>>>>
>>>> Thanks
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/elasticsearch/fbc68c13-bcb9-40cb-9726-becbef14f278%40googlegroups.com
>>>> .
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>
>>>
>>>
>>
>>
>
>

