Transform worked well.  Nice.

Curious how to get it to save to source?  I tried the mapping below, no go.  (I can, 
however, run range queries against title_count, so the transform output was indexed 
and works well.)

    "transform" : {
      "script" : "ctx._source['\'title_count\''] = ctx._source['\'titles\''].size()",
      "lang": "groovy"
    },
    "properties": {
      "titles": { "type": "string", "index": "not_analyzed" },
      "title_count" : { "type": "integer", "store": "yes" }
    }
}'
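
For reference, the kind of range query that works against title_count looks roughly like the body built below.  This is only a sketch: Python is used just to assemble and print the JSON, the helper name is made up for illustration, and it assumes the 1.x-style filtered query plus the threshold of 10 mentioned earlier in the thread.

```python
import json


def title_count_range_query(max_exclusive):
    """Build a filtered query matching docs whose title_count is below max_exclusive.

    Hypothetical helper; assumes the integer field is named title_count,
    as in the mapping above.
    """
    return {
        "query": {
            "filtered": {
                "filter": {
                    "range": {"title_count": {"lt": max_exclusive}}
                }
            }
        }
    }


# Request body for "fewer than 10 titles".
body = title_count_range_query(10)
print(json.dumps(body, indent=2))
```

The printed JSON is what would go in the search request body; nothing here touches _source, it only filters on the indexed integer.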


On Thursday, January 8, 2015 at 9:15:28 PM UTC-8, Nikolas Everett wrote:
>
> Source is going to be pretty slow, yeah. If it's a one-off then it's 
> probably fine, but if you do it a lot it's probably best to index the count. 
> On Jan 9, 2015 12:04 AM, "Jeff Steinmetz" <[email protected]> wrote:
>
>> Thank you, that worked.
>>
>> I was curious about the speed: is running a script using _source slower 
>> than doc[]?
>>
>> Totally understand a dynamic script is slower regardless of _source vs 
>> doc[].
>>
>> Makes sense that having a count transformed up front during index to 
>> create a materialized value would certainly be much faster.
>>
>>
>> On Thursday, January 8, 2015 at 7:04:40 PM UTC-8, Nikolas Everett wrote:
>>>
>>>
>>>
>>> On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz <[email protected]> 
>>> wrote:
>>>
>>>> Is there a better way to do this?
>>>>
>>>> Please see this gist (or better yet, run the script locally to see 
>>>> the issue).
>>>>
>>>> https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae
>>>>
>>>> You must have scripting enabled in your elasticsearch config for this 
>>>> to work.
>>>>
>>>> This was originally based on some comments I found here:
>>>> http://stackoverflow.com/questions/17314123/search-by-size-of-object-type-field-elastic-search
>>>>
>>>> We would like to use a filtered query to include only documents that have a 
>>>> small number of items in the list [aka array], filtering where 
>>>> values.size() < 10
>>>>
>>>> "script": "doc['titles'].values.size() < 10"
>>>>
>>>> It turns out that values.size() does not count the elements in the list.
>>>> If the field is analyzed, it counts tokenized words.
>>>> If analysis is turned off for the field, the count improves, but 
>>>> duplicates are still missed.
>>>>
>>>> For example, this comes back as size == 2, should be 3:
>>>> "titles": ["one", "duplicate", "duplicate"]
>>>> This comes back as size == 3, should be 4:
>>>> "titles": ["http://bit.ly/abc", "http://bit.ly/abc", "http://bit.ly/def", 
>>>> "http://bit.ly/ghi"]
>>>>
>>>> Is this a bug, is there a better way, or is this just something that we 
>>>> don't understand about groovy and values.size()?
>>>>
>>>>
>>>>
>>> I think that's just the way doc[] works.  Try (but don't actually 
>>> deploy) _source['titles'].size() < 10.  That should do what you expect.  
>>> Don't deploy it because it's too slow.  Instead, try indexing the size and 
>>> filtering on it.  You can use a transform to add the size of the array as 
>>> an integer field and then filter on it with a range filter.  That'd 
>>> probably be the fastest option.
>>>
>>> Nik
>>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/75736948-beac-43fc-84d4-25a94456d4ca%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/elasticsearch/75736948-beac-43fc-84d4-25a94456d4ca%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
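
For anyone finding this thread later, the two counts discussed in the quoted messages can be modelled roughly as below.  This is a sketch under assumptions: a Python set is used only to mimic the duplicate-dropping that was observed with doc['titles'].values.size() on a not_analyzed field, it is not a description of how Elasticsearch implements doc[], and both function names are made up.

```python
def source_count(titles):
    """Number of elements in the original array, i.e. what _source['titles'].size() reports."""
    return len(titles)


def observed_doc_count(titles):
    """Mimic the count reported in the thread for a not_analyzed field: duplicates collapse."""
    return len(set(titles))


# The two example documents from the thread.
examples = [
    ["one", "duplicate", "duplicate"],
    ["http://bit.ly/abc", "http://bit.ly/abc", "http://bit.ly/def", "http://bit.ly/ghi"],
]

for titles in examples:
    print(source_count(titles), observed_doc_count(titles))
```

The first number on each line is the value a transform would store in title_count; the second matches the sizes reported for the doc[] script.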

