Re: counting items in a list [array] returns (what we think) are incorrect counts via groovy

Jeff Steinmetz Thu, 08 Jan 2015 23:20:03 -0800

Now that I am into the real-world scenario, it gets a bit trickier: I have
nested objects (keys).
I have to test for the existence of the key in the Groovy script to avoid
parsing errors on insert.


How do you access a nested object in Groovy, and how do you test for the
existence of a nested object key?
For example:

curl -XPOST 'http://'$NODE':9200/'$INDEX_NAME'/post' -d '{
  "titles": ["title 1", "title 2", "title 3", "title 4"],
  "raw" : {
    "links" : ["http://bit.ly/abc", "http://bit.ly/abc",
               "http://bit.ly/def", "http://bit.ly/ghi"]
  }
}'

This doesn't seem to work (from what I can tell, it never finds the key
raw.links even when it does exist):

      "script" : "if (ctx._source.containsKey('raw.links')) { ctx._source.links_url_count = ctx._source['raw.links'].size() } else { ctx._source.links_url_count = 0 }"

Simple keys do work, though, such as ctx._source.containsKey('title').
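The likely cause is that _source is a nested map. 'raw.links' is treated as one literal key, and that key does not exist at the top level. The lookup has to go one level at a time: first raw, then links inside it. Below is a minimal Python model of that lookup. It is not Groovy and not Elasticsearch's own code, and get_path and links_url_count are hypothetical helper names. In the Groovy script, the same idea would presumably be a check like ctx._source.containsKey('raw') && ctx._source.raw.containsKey('links') before calling ctx._source.raw.links.size().

```python
# A rough model of ctx._source as a nested map, using the example document above.
source = {
    "titles": ["title 1", "title 2", "title 3", "title 4"],
    "raw": {
        "links": ["http://bit.ly/abc", "http://bit.ly/abc",
                  "http://bit.ly/def", "http://bit.ly/ghi"],
    },
}


def get_path(doc, dotted_path):
    """Walk a dotted path one key at a time; return None if any level is missing."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def links_url_count(doc):
    """Mirror the intended script: size of raw.links, or 0 when it is absent."""
    links = get_path(doc, "raw.links")
    return len(links) if isinstance(links, list) else 0


# The top-level map has no literal "raw.links" key, which is why a
# containsKey('raw.links') style check never succeeds.
print("raw.links" in source)
print(links_url_count(source))
print(links_url_count({"titles": []}))
```

Walking the path explicitly also covers documents where raw exists but links does not.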

On Thursday, January 8, 2015 at 9:59:56 PM UTC-8, Nikolas Everett wrote:
>
> Transform never saves to source. You have to transform on the application 
> side for that. It was designed for times when you wanted to index something 
> like this that would just take up extra space in the source document. I 
> imagine you could use a script field on the query if you need the result to 
> contain the count. Or just count it on the result side. 
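Counting on the application side, as suggested here, means the count is part of the document before it is sent, so it ends up in _source as well as in the index. Below is a minimal Python sketch, assuming the document is built as a plain dict before indexing. title_count and links_url_count are the field names used in this thread; add_counts is a hypothetical helper.

```python
import json


def add_counts(doc):
    """Return a copy of doc with count fields added before indexing."""
    enriched = dict(doc)  # shallow copy; the caller's dict is left untouched
    # len() counts list elements, so duplicate values are included in the count.
    enriched["title_count"] = len(doc.get("titles", []))
    raw = doc.get("raw", {})
    links = raw.get("links", []) if isinstance(raw, dict) else []
    enriched["links_url_count"] = len(links)
    return enriched


doc = {
    "titles": ["title 1", "title 2", "title 3", "title 4"],
    "raw": {"links": ["http://bit.ly/abc", "http://bit.ly/abc",
                      "http://bit.ly/def", "http://bit.ly/ghi"]},
}

# This JSON string is what would be sent as the body of the indexing request.
body = json.dumps(add_counts(doc))
print(body)
```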
>
> Nik
> On Jan 9, 2015 12:43 AM, "Jeff Steinmetz" <[email protected]> wrote:
>
>> Transform worked well.  Nice.
>>
>> Curious how to get it to save to source? I tried the following, no go. (I
>> can, however, do range queries against title_count, so the transform was
>> indexed and works well.)
>>
>>     "transform" : {
>>       "script" : "ctx._source['\'title_count\''] = 
>> ctx._source['\'titles\''].size()",
>>       "lang": "groovy"
>>     },
>>      "properties": {
>>      "titles": { "type": "string", "index": "not_analyzed" },
>>      "title_count" : { "type": "integer", "store": "yes" }
>>    }
>> }'
>>
>>
>> On Thursday, January 8, 2015 at 9:15:28 PM UTC-8, Nikolas Everett wrote:
>>>
>>> Source is going to be pretty slow, yeah. If it's a one-off then it's
>>> probably fine, but if you do it a lot it's probably best to index the count.
>>> On Jan 9, 2015 12:04 AM, "Jeff Steinmetz" <[email protected]> wrote:
>>>
>>>> Thank you, that worked.
>>>>
>>>> I was curious about the speed: is running a script using _source slower
>>>> than doc[]?
>>>>
>>>> I totally understand that a dynamic script is slower regardless of
>>>> _source vs. doc[].
>>>>
>>>> It makes sense that having a count transformed up front during indexing,
>>>> creating a materialized value, would be much faster.
>>>>
>>>>
>>>> On Thursday, January 8, 2015 at 7:04:40 PM UTC-8, Nikolas Everett wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz <[email protected]> 
>>>>> wrote:
>>>>>
>>>>>> Is there a better way to do this?
>>>>>>
>>>>>> Please see this gist (or, even better, run the script locally to see
>>>>>> the issue).
>>>>>>
>>>>>> https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae
>>>>>>
>>>>>> You must have scripting enabled in your elasticsearch config for this 
>>>>>> to work.
>>>>>>
>>>>>> This was originally based on some comments I found here:
>>>>>> http://stackoverflow.com/questions/17314123/search-by-size-of-object-type-field-elastic-search
>>>>>>
>>>>>> We would like to use a filtered query to only include documents that
>>>>>> have a small count of items in the list (aka array), filtering where
>>>>>> values.size() < 10:
>>>>>>
>>>>>> "script": "doc['titles'].values.size() < 10"
>>>>>>
>>>>>> It turns out that values.size() does not count the elements in the
>>>>>> list. If the field is analyzed, it counts the tokenized words instead.
>>>>>> If analysis is turned off in the mapping, the count improves, but
>>>>>> duplicates are still missed.
>>>>>>
>>>>>> For example, this comes back as size == 2 (should be 3):
>>>>>> "titles": ["one", "duplicate", "duplicate"]
>>>>>> This comes back as size == 3 (should be 4):
>>>>>> "titles": ["http://bit.ly/abc", "http://bit.ly/abc",
>>>>>> "http://bit.ly/def", "http://bit.ly/ghi"]
>>>>>>
>>>>>> Is this a bug, is there a better way, or is this just something that 
>>>>>> we don't understand about groovy and values.size()?
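The counts above are consistent with doc[] reading the field's indexed terms rather than the original array stored in _source. Indexed terms are unique and sorted, and for analyzed fields they are tokenized. Below is a rough Python model of that difference, offered as an illustration and not as Elasticsearch's implementation. doc_values is a hypothetical name, and the analyzed branch uses a crude stand-in tokenizer (lowercase, split on non-alphanumerics), not the real standard analyzer.

```python
import re


def doc_values(values, analyzed):
    """Crude model of doc['field'].values: the field's unique terms, sorted.

    The analyzed branch is a simplified stand-in tokenizer (lowercase, split on
    non-alphanumerics), not Elasticsearch's actual standard analyzer.
    """
    terms = set()
    for value in values:
        if analyzed:
            terms.update(t for t in re.split(r"[^0-9a-z]+", value.lower()) if t)
        else:
            terms.add(value)
    return sorted(terms)


titles = ["one", "duplicate", "duplicate"]
links = ["http://bit.ly/abc", "http://bit.ly/abc",
         "http://bit.ly/def", "http://bit.ly/ghi"]

# Term count (what values.size() reflects) vs. element count (what _source holds).
print(len(doc_values(titles, analyzed=False)), len(titles))  # 2 3
print(len(doc_values(links, analyzed=False)), len(links))    # 3 4
print(doc_values(["title 1", "title 2"], analyzed=True))     # ['1', '2', 'title']
```

Because duplicates collapse into a single term, no script over doc[] can recover the original element count. That count has to come from _source or from a separately indexed field.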
>>>>>>
>>>>>>
>>>>>>
>>>>> I think that's just the way doc[] works. Try (but don't actually
>>>>> deploy) _source['titles'].size() < 10. That should do what you expect.
>>>>> Don't deploy that, because it's too slow. Try indexing the size and
>>>>> filtering on it instead. You can use a transform to add the size of the
>>>>> array as an integer field and just filter on it using a range filter.
>>>>> That'd probably be the fastest option.
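Once the array size is indexed as an integer field (title_count, as in the transform mapping above), the script can be replaced by a plain range filter. Below is a sketch of the search body built in Python. It assumes the Elasticsearch 1.x query DSL of the time, which used the "filtered" query; later versions replaced it with the filter clause of a bool query. small_list_query is a hypothetical helper.

```python
import json


def small_list_query(count_field, max_exclusive):
    """Build a 1.x-style filtered query keeping docs whose count field is below a limit."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                # A range filter on the indexed integer, instead of a script.
                "filter": {"range": {count_field: {"lt": max_exclusive}}},
            }
        }
    }


print(json.dumps(small_list_query("title_count", 10), indent=2))
```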
>>>>>
>>>>> Nik
>>>>>
>>>>  -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To view this discussion on the web visit https://groups.google.com/d/
>>>> msgid/elasticsearch/75736948-beac-43fc-84d4-25a94456d4ca%
>>>> 40googlegroups.com 
>>>> <https://groups.google.com/d/msgid/elasticsearch/75736948-beac-43fc-84d4-25a94456d4ca%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>
