_source is going to be pretty slow, yeah. If it's a one-off then it's probably fine, but if you do it a lot it's probably best to index the count.

On Jan 9, 2015 12:04 AM, "Jeff Steinmetz" <[email protected]> wrote:
> Thank you, that worked.
>
> I was curious about the speed: is running a script using _source slower
> than doc[]?
>
> Totally understand a dynamic script is slower regardless of _source vs
> doc[].
>
> Makes sense that having a count transformed up front during index to
> create a materialized value would certainly be much faster.
>
>
> On Thursday, January 8, 2015 at 7:04:40 PM UTC-8, Nikolas Everett wrote:
>>
>> On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz <[email protected]>
>> wrote:
>>
>>> Is there a better way to do this?
>>>
>>> Please see this gist (or even better yet, run the script locally to see
>>> the issue).
>>>
>>> https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae
>>>
>>> You must have scripting enabled in your elasticsearch config for this to
>>> work.
>>>
>>> This was originally based on some comments I found here:
>>> http://stackoverflow.com/questions/17314123/search-by-size-of-object-type-field-elastic-search
>>>
>>> We would like to use a filtered query to only include documents that
>>> have a small count of items in the list [aka array], filtering where
>>> values.size() < 10
>>>
>>> "script": "doc['titles'].values.size() < 10"
>>>
>>> Turns out values.size() either counts tokenized (analyzed) words or, if
>>> the mapping turns off analysis, still counts incorrectly when there are
>>> duplicates.
>>> If analysis is not turned off, it counts tokenized words, not the number
>>> of elements in the list.
>>> If analysis is turned off for a given field, it improves, but duplicates
>>> are missed.
>>>
>>> For example, this comes back as size == 2:
>>> "titles": ["one", "duplicate", "duplicate"]
>>> This comes back as size == 3, should be 4:
>>> "titles": ["http://bit.ly/abc", "http://bit.ly/abc", "http://bit.ly/def",
>>> "http://bit.ly/ghi"]
>>>
>>> Is this a bug, is there a better way, or is this just something that we
>>> don't understand about groovy and values.size()?
>>>
>> I think that's just the way doc[] works. Try (but don't actually deploy)
>> _source['titles'].size() < 10. That should do what you expect. Don't
>> deploy that because it's too slow. Try indexing the size and filtering on
>> it. You can use a transform to add the size of the array as an integer
>> field and just filter on it using a range filter. That'd probably be the
>> fastest option.
>>
>> Nik
>>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/75736948-beac-43fc-84d4-25a94456d4ca%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
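
For anyone finding this thread later, here is a rough sketch of the two points made above: why the doc[] count came out short for the not-analyzed field, and what "index the count, then range-filter on it" could look like. It is plain Python with no Elasticsearch client. The field name titles_count and all function names are made up for illustration, and the count is added client-side before indexing rather than through the mapping transform Nik mentions. Treating the doc[] values as a set of distinct terms is an assumption that matches the numbers reported in the thread, not a statement about the internals.

```python
# Plain-Python sketch; nothing here talks to Elasticsearch.
# ASSUMPTION: "titles_count" is a hypothetical field name, not from the thread.


def doc_values_size(values):
    """Approximate what doc['titles'].values.size() reported in the thread for
    a not-analyzed field: duplicates collapse, so only distinct values count."""
    return len(set(values))


def source_size(values):
    """What _source['titles'].size() sees: the list exactly as it was sent."""
    return len(values)


def with_titles_count(doc):
    """Return a copy of the document with the list length stored next to it,
    computed before indexing so no script has to run at query time."""
    enriched = dict(doc)
    enriched["titles_count"] = len(doc.get("titles", []))
    return enriched


def small_list_query(max_count):
    """Build a filtered query body that applies a range filter to the count."""
    return {
        "query": {
            "filtered": {
                "filter": {"range": {"titles_count": {"lt": max_count}}}
            }
        }
    }


titles = ["http://bit.ly/abc", "http://bit.ly/abc",
          "http://bit.ly/def", "http://bit.ly/ghi"]

print(doc_values_size(titles))                  # distinct values only
print(source_size(titles))                      # every element of the list
print(with_titles_count({"titles": titles}))    # document plus stored count
print(small_list_query(10))                     # body to send with the search
```

The range filter only compares a stored integer, so nothing is scripted per document at search time, which is the reason the indexed count is expected to be the fastest option.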
