Adrien,
Regarding the boosting issue:
I have a field "text" and I'm using a query-time boost like
field=["text^30"]
Assume I have a doc like {text:"new delhi to goa"}. Now if I query for
"delhi to goa", only the term goa gets boosted (goa^30, as you can see in
the explain output I posted earlier), but I expected delhi to be boosted as
well (delhi^30), which is not happening here. Is it that goa is not analyzed
and is therefore treated as a single term, while delhi is broken up by the
analyzer and so is not treated as a term?
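To make this concrete, the request I'm sending looks roughly like the
following sketch (simplified to just the relevant part; in the query_string
DSL the parameter is spelled "fields", and everything else is omitted):

{
  "query": {
    "query_string": {
      "query": "delhi to goa",
      "fields": ["text^30"]
    }
  }
}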
Thanks
On Wed, Feb 5, 2014 at 9:26 PM, Adrien Grand <[email protected]> wrote:
> Hi,
>
> Indeed, query_string splits on whitespace before applying the analyzer.
> You could try the match query[1], which doesn't have this flaw, or the new
> simple_query_string query[2], which can disable the whitespace operator
> (just provide a list of flags that doesn't contain WHITESPACE).
>
> However, I didn't understand your boosting issue: what query did you send
> to Elasticsearch?
>
> [1]
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
> [2]
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html#_simple_query_string_syntax
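If I understand this suggestion correctly, the simple_query_string version
would look roughly like the sketch below (the flags value is my guess at a
set that leaves out WHITESPACE, so please correct me if I got the names
wrong):

{
  "query": {
    "simple_query_string": {
      "query": "delhi to goa",
      "fields": ["text^30"],
      "flags": "AND|OR|NOT|PREFIX|PHRASE|PRECEDENCE|ESCAPE"
    }
  }
}

and the match query alternative would be something like:

{
  "query": {
    "match": {
      "text": {
        "query": "delhi to goa",
        "boost": 30
      }
    }
  }
}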
>
>
> On Wed, Feb 5, 2014 at 4:47 AM, coder <[email protected]> wrote:
>
>> I started using the explain API with query_string, and in the process I
>> think I found a bug (I don't know if it is really a bug or the intended
>> behaviour of query_string). This is going to be a long post, so please be
>> patient with me.
>>
>> I'm using a doc: {name:"new delhi to goa", st:"goa"}
>> Running the index analyzer on it through the analyze API gives these tokens:
>>
>> {
>> "tokens" : [ {
>> "token" : "new",
>> "start_offset" : 0,
>> "end_offset" : 3,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new",
>> "start_offset" : 0,
>> "end_offset" : 9,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new ",
>> "start_offset" : 0,
>> "end_offset" : 9,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new d",
>> "start_offset" : 0,
>> "end_offset" : 9,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new de",
>> "start_offset" : 0,
>> "end_offset" : 9,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new del",
>> "start_offset" : 0,
>> "end_offset" : 9,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new delh",
>> "start_offset" : 0,
>> "end_offset" : 9,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new delhi",
>> "start_offset" : 0,
>> "end_offset" : 9,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new",
>> "start_offset" : 0,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new ",
>> "start_offset" : 0,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new d",
>> "start_offset" : 0,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new de",
>> "start_offset" : 0,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new del",
>> "start_offset" : 0,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new delh",
>> "start_offset" : 0,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new delhi",
>> "start_offset" : 0,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new delhi ",
>> "start_offset" : 0,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new delhi t",
>> "start_offset" : 0,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new delhi to",
>> "start_offset" : 0,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new",
>> "start_offset" : 0,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new ",
>> "start_offset" : 0,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new d",
>> "start_offset" : 0,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new de",
>> "start_offset" : 0,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new del",
>> "start_offset" : 0,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new delh",
>> "start_offset" : 0,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new delhi",
>> "start_offset" : 0,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new delhi ",
>> "start_offset" : 0,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new delhi t",
>> "start_offset" : 0,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new delhi to",
>> "start_offset" : 0,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new delhi to ",
>> "start_offset" : 0,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new delhi to g",
>> "start_offset" : 0,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new delhi to go",
>> "start_offset" : 0,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "new delhi to goa",
>> "start_offset" : 0,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "del",
>> "start_offset" : 4,
>> "end_offset" : 9,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "delh",
>> "start_offset" : 4,
>> "end_offset" : 9,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "delhi",
>> "start_offset" : 4,
>> "end_offset" : 9,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "del",
>> "start_offset" : 4,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "delh",
>> "start_offset" : 4,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "delhi",
>> "start_offset" : 4,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "delhi ",
>> "start_offset" : 4,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "delhi t",
>> "start_offset" : 4,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "delhi to",
>> "start_offset" : 4,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "del",
>> "start_offset" : 4,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "delh",
>> "start_offset" : 4,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "delhi",
>> "start_offset" : 4,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "delhi ",
>> "start_offset" : 4,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "delhi t",
>> "start_offset" : 4,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "delhi to",
>> "start_offset" : 4,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "delhi to ",
>> "start_offset" : 4,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "delhi to g",
>> "start_offset" : 4,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "delhi to go",
>> "start_offset" : 4,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "delhi to goa",
>> "start_offset" : 4,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "to ",
>> "start_offset" : 10,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 3
>> }, {
>> "token" : "to g",
>> "start_offset" : 10,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 3
>> }, {
>> "token" : "to go",
>> "start_offset" : 10,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 3
>> }, {
>> "token" : "to goa",
>> "start_offset" : 10,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 3
>> }, {
>> "token" : "goa",
>> "start_offset" : 13,
>> "end_offset" : 16,
>> "type" : "word",
>> "position" : 4
>> } ]
>> }
>>
>> Now, if I query for "delhi to goa", the search_analyzer produces these tokens:
>>
>> {
>> "tokens" : [ {
>> "token" : "del",
>> "start_offset" : 0,
>> "end_offset" : 5,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "delh",
>> "start_offset" : 0,
>> "end_offset" : 5,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "delhi",
>> "start_offset" : 0,
>> "end_offset" : 5,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "del",
>> "start_offset" : 0,
>> "end_offset" : 8,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "delh",
>> "start_offset" : 0,
>> "end_offset" : 8,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "delhi",
>> "start_offset" : 0,
>> "end_offset" : 8,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "delhi ",
>> "start_offset" : 0,
>> "end_offset" : 8,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "delhi t",
>> "start_offset" : 0,
>> "end_offset" : 8,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "delhi to",
>> "start_offset" : 0,
>> "end_offset" : 8,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "del",
>> "start_offset" : 0,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "delh",
>> "start_offset" : 0,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "delhi",
>> "start_offset" : 0,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "delhi ",
>> "start_offset" : 0,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "delhi t",
>> "start_offset" : 0,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "delhi to",
>> "start_offset" : 0,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "delhi to ",
>> "start_offset" : 0,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "delhi to g",
>> "start_offset" : 0,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "delhi to go",
>> "start_offset" : 0,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "delhi to goa",
>> "start_offset" : 0,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 1
>> }, {
>> "token" : "to ",
>> "start_offset" : 6,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "to g",
>> "start_offset" : 6,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "to go",
>> "start_offset" : 6,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "to goa",
>> "start_offset" : 6,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 2
>> }, {
>> "token" : "goa",
>> "start_offset" : 9,
>> "end_offset" : 12,
>> "type" : "word",
>> "position" : 3
>> } ]
>> }
>>
>> Using the explain API, I get the following:
>>
>> {text=new delhi to goa,boostFactor=9.820192307,po=9.82}
>> 510.39673 = custom score, product of:
>>   510.39673 = script score function: composed of:
>>     510.39673 = sum of:
>>       371.12375 = max of:
>>         371.12375 = sum of:
>>           104.61707 = weight(text:del in 1003990) [PerFieldSimilarity], result of:
>>             104.61707 = score(doc=1003990,freq=5.0 = termFreq=5.0), product of:
>>               0.43576795 = queryWeight, product of:
>>                 5.368244 = idf(docFreq=53067, maxDocs=4187328)
>>                 0.08117513 = queryNorm
>>               240.0752 = fieldWeight in 1003990, product of:
>>                 2.236068 = tf(freq=5.0), with freq of:
>>                   5.0 = termFreq=5.0
>>                 5.368244 = idf(docFreq=53067, maxDocs=4187328)
>>                 20.0 = fieldNorm(doc=1003990)
>>           133.24011 = weight(text:delh in 1003990) [PerFieldSimilarity], result of:
>>             133.24011 = score(doc=1003990,freq=5.0 = termFreq=5.0), product of:
>>               0.49178073 = queryWeight, product of:
>>                 6.058268 = idf(docFreq=26616, maxDocs=4187328)
>>                 0.08117513 = queryNorm
>>               270.934 = fieldWeight in 1003990, product of:
>>                 2.236068 = tf(freq=5.0), with freq of:
>>                   5.0 = termFreq=5.0
>>                 6.058268 = idf(docFreq=26616, maxDocs=4187328)
>>                 20.0 = fieldNorm(doc=1003990)
>>           133.26657 = weight(text:delhi in 1003990) [PerFieldSimilarity], result of:
>>             133.26657 = score(doc=1003990,freq=5.0 = termFreq=5.0), product of:
>>               0.49182954 = queryWeight, product of:
>>                 6.0588694 = idf(docFreq=26600, maxDocs=4187328)
>>                 0.08117513 = queryNorm
>>               270.96088 = fieldWeight in 1003990, product of:
>>                 2.236068 = tf(freq=5.0), with freq of:
>>                   5.0 = termFreq=5.0
>>                 6.0588694 = idf(docFreq=26600, maxDocs=4187328)
>>                 20.0 = fieldNorm(doc=1003990)
>>       139.27298 = max of:
>>         139.27298 = weight(text:goa^20.0 in 1003990) [PerFieldSimilarity], result of:
>>           139.27298 = score(doc=1003990,freq=3.0 = termFreq=3.0), product of:
>>             0.5712808 = queryWeight, product of:
>>               20.0 = boost
>>               7.037633 = idf(docFreq=9995, maxDocs=4187328)
>>               0.004058757 = queryNorm
>>             243.79076 = fieldWeight in 1003990, product of:
>>               1.7320508 = tf(freq=3.0), with freq of:
>>                 3.0 = termFreq=3.0
>>               7.037633 = idf(docFreq=9995, maxDocs=4187328)
>>               20.0 = fieldNorm(doc=1003990)
>>   1.0 = queryBoost
>>
>> The explain output above shows scores for the tokens:
>> del
>> delh
>> delhi
>> goa
>>
>> but there are no scores for the other tokens generated by my search
>> analyzer. Why is that?
>>
>> I have read that query_string uses the Lucene query parser by default. My
>> guess is that query_string applies a whitespace tokenizer on top of the
>> tokens produced by my search analyzer, am I correct? How can I make
>> query_string calculate a score for all of the tokens generated by the
>> search_analyzer? Please correct me if I am wrong.
>>
>> There is one more thing I noticed.
>> I'm using a query-time boost on one of my document fields, but it is not
>> working the way I thought it would. In the explain output above you can see
>> that a boost is applied to goa but not to delhi, even though both goa and
>> delhi are present in the original doc. My guess is that query_string applies
>> the boost only to those terms of the user-typed string that are not changed
>> by any analyzer, because in the example above goa is kept as it is while
>> delhi is analyzed further. Am I correct?
>>
>> Waiting for a reply!
>>
>> Thanks
>>
>
>
>
> --
> Adrien Grand
>
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAAVTvp4_PfRZPKqMTRZ34o4fKtdv7ROs%3DLsVsT3%2B3rLHHucfPg%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.