Re: counting items in a list [array] returns (what we think) are incorrect counts via groovy

Nikolas Everett Thu, 08 Jan 2015 21:16:11 -0800

_source is going to be pretty slow, yeah. If it's a one-off then it's probably
fine, but if you do it a lot it's probably best to index the count.
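
A rough sketch of what "index the count" could look like on the client side,
assuming you compute the count yourself before sending the document rather
than using a mapping transform. The field name `titles_count` and both helper
functions are made up for illustration. The query body uses the filtered
query with a range filter, as suggested earlier in the thread.

```python
def with_titles_count(doc):
    """Return a copy of doc with a precomputed element count for 'titles'.

    'titles_count' is a hypothetical field name chosen for this sketch.
    """
    enriched = dict(doc)
    # len() counts every element, duplicates included, unlike the
    # doc['titles'].values.size() behaviour described in this thread.
    enriched["titles_count"] = len(doc.get("titles", []))
    return enriched


def small_list_query(max_count):
    """Build a filtered query body matching docs with fewer than max_count titles."""
    return {
        "query": {
            "filtered": {
                "filter": {"range": {"titles_count": {"lt": max_count}}}
            }
        }
    }


doc = {"titles": ["one", "duplicate", "duplicate"]}
indexed = with_titles_count(doc)   # send this to the index instead of doc
query = small_list_query(10)       # request body for the search
```

With the example document from the thread, `indexed["titles_count"]` is 3,
while the number of distinct values is 2, which matches the size reported
for the non-analyzed field.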
On Jan 9, 2015 12:04 AM, "Jeff Steinmetz" <[email protected]>
wrote:


> Thank you, that worked.
>
> I was curious about the speed: is running a script using _source slower
> than doc[]?
>
> Totally understand a dynamic script is slower regardless of _source vs
> doc[].
>
> Makes sense that having a count transformed up front during index to
> create a materialized value would certainly be much faster.
>
>
> On Thursday, January 8, 2015 at 7:04:40 PM UTC-8, Nikolas Everett wrote:
>>
>>
>>
>> On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz <[email protected]>
>> wrote:
>>
>>> Is there a better way to do this?
>>>
>>> Please see this gist (or better yet, run the script locally to see the
>>> issue).
>>>
>>> https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae
>>>
>>> You must have scripting enabled in your elasticsearch config for this to
>>> work.
>>>
>>> This was originally based on some comments I found here:
>>> http://stackoverflow.com/questions/17314123/search-by-size-of-object-type-field-elastic-search
>>>
>>> We would like to use a filtered query to only include documents that have a
>>> small count of items in the list [aka array], filtering where
>>>  values.size() < 10
>>>
>>> "script": "doc['titles'].values.size() < 10"
>>>
>>> Turns out the values.size() actually either counts tokenized (analyzed)
>>> words, or if the mapping turns off analysis, it still counts incorrectly if
>>> there are duplicates.
>>> If analyze is not turned off, it counts tokenized words, not the number
>>> of elements in the list.
>>> If analyze is turned off for a given field, it improves, but duplicates
>>> are missed.
>>>
>>> For example, This comes back as size == 2
>>> "titles": ["one", "duplicate", "duplicate"]
>>> This comes back as size == 3, should be 4
>>> "titles": ["http://bit.ly/abc", "http://bit.ly/abc", "http://bit.ly/def",
>>> "http://bit.ly/ghi"]
>>>
>>> Is this a bug, is there a better way, or is this just something that we
>>> don't understand about groovy and values.size()?
>>>
>>>
>>>
>> I think that's just the way doc[] works.  Try (but don't actually deploy)
>> _source['titles'].size() < 10.  That should do what you expect.  Don't
>> deploy that because it's too slow.  Try indexing the size and filtering on
>> it.  You can use a transform to add the size of the array as an integer
>> field and just filter on it using a range filter.  That'd probably be the
>> fastest option.
>>
>> Nik
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/75736948-beac-43fc-84d4-25a94456d4ca%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/75736948-beac-43fc-84d4-25a94456d4ca%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

