You could also define a custom analyzer so that your phrase is not 
split on whitespace when it is indexed. 

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html

Choosing the right combination of char filters, tokenizer and token 
filters lets you keep a phrase as a single token.
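
As a rough sketch of what such a custom analyzer could look like, the snippet below builds an index settings body that pairs the keyword tokenizer with the lowercase token filter. The analyzer name "phrase_analyzer" is made up for illustration, and whether this combination fits your data is an assumption; it only prints the JSON you might send when creating an index.

```python
import json

# Hypothetical analyzer name; "keyword" and "lowercase" are built-in
# Elasticsearch tokenizer / token filter names.
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "phrase_analyzer": {
                    "type": "custom",
                    "tokenizer": "keyword",   # emit the whole input as one token
                    "filter": ["lowercase"],  # normalise case after tokenizing
                }
            }
        }
    }
}

# Serialise to the JSON body that would accompany an index-creation request.
body = json.dumps(index_settings, indent=2)
print(body)
```

Note that the keyword tokenizer turns the whole field value into one token, so this only helps when the field holds just the phrase.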

For example, using the whitespace analyzer will separate on whitespace:

curl '192.168.w.xyz:9200/test/_analyze?pretty=1&analyzer=whitespace' -d 'foo bar baz'
{
  "tokens" : [ {
    "token" : "foo",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "bar",
    "start_offset" : 4,
    "end_offset" : 7,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "baz",
    "start_offset" : 8,
    "end_offset" : 11,
    "type" : "word",
    "position" : 3
  } ]
}


However, using the keyword analyzer will retain the entire phrase:

curl '192.168.w.xyz:9200/test/_analyze?pretty=1&analyzer=keyword' -d 'foo bAr baZ'
{
  "tokens" : [ {
    "token" : "foo bAr baZ",
    "start_offset" : 0,
    "end_offset" : 11,
    "type" : "word",
    "position" : 1
  } ]
}
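
To make the difference concrete, here is a small pure-Python imitation of the two analyzers. It is not Elasticsearch code; the function names are invented for this sketch, and the real analyzers may differ in edge cases. It shows why a term-frequency lookup for a two-word term finds nothing after whitespace tokenization but finds the term when the input is kept whole.

```python
import re
from collections import Counter


def whitespace_tokens(text):
    """Imitate the whitespace analyzer: one token per run of non-space characters."""
    return [
        {
            "token": match.group(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,  # 1-based, as in the output above
        }
        for position, match in enumerate(re.finditer(r"\S+", text), start=1)
    ]


def keyword_tokens(text):
    """Imitate the keyword analyzer: the whole input becomes a single token."""
    if not text:
        return []
    return [{"token": text, "start_offset": 0, "end_offset": len(text), "position": 1}]


def term_frequencies(tokens):
    """Count how often each token string occurs."""
    return Counter(token["token"] for token in tokens)


text = "green energy"
whitespace_tf = term_frequencies(whitespace_tokens(text))
keyword_tf = term_frequencies(keyword_tokens(text))

print(whitespace_tf["green energy"])  # 0: only "green" and "energy" exist as terms
print(keyword_tf["green energy"])     # 1: the phrase was kept as one term
```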

On Tuesday, October 28, 2014 10:00:01 AM UTC, vineeth mohan wrote:
>
> Hello Valergi,
>
> This won't work as is, because the string would be tokenized into green 
> and energy.
> If you use the shingle token filter with a shingle size of 2, it might work.
> Or, in this case, you can read the position values of both tokens in 
> the script, and if they are next to each other, count that as an 
> occurrence. 
>
> Thanks
>           Vineeth
>
> On Tue, Oct 28, 2014 at 3:06 PM, <[email protected]> wrote:
>
>> I want to access the frequency of a phrase made up of multiple words, 
>> e.g. "green energy"
>>
>> I can access tf of "green" and "energy", example:
>>
>> "function_score":
>> {
>>     "filter" : {
>>         "terms" : { "content" : ["energy","green"]}
>>     },
>>     "script_score": {
>>         "script": "_index['content']['energy'].tf() + _index['content']['green'].tf()",
>>         "lang":"groovy"
>>     }
>> }
>>
>> This works fine. However, how can I find the frequency of the phrase 
>> "green energy"? The following does not work:
>>
>> _index['content']['green energy'].tf()
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] <javascript:>.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/2e4388a4-72d6-4933-9686-304dea0727f1%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/elasticsearch/2e4388a4-72d6-4933-9686-304dea0727f1%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

