Now that I'm into the real-world scenario, it gets a bit trickier: I have
nested objects (keys).
I have to test for the existence of the key in the Groovy script to avoid
parsing errors on insert.
How do you access a nested object in Groovy, and how do you test for the
existence of a nested object key? For example:
curl -XPOST 'http://'$NODE':9200/'$INDEX_NAME'/post' -d '{
"titles": ["title 1", "title 2", "title 3", "title 4"],
"raw" : {
"links" : ["http://bit.ly/abc", "http://bit.ly/abc",
"http://bit.ly/def", "http://bit.ly/ghi"]
}
}'
This doesn't seem to work (from what I can tell, it never finds the key
raw.links, even when it does exist):
"script" : "if (ctx._source.containsKey('raw.links')) {
ctx._source.links_url_count = ctx._source['raw.links'].size() } else {
ctx._source.links_url_count = 0 }"
Simple keys do work, though, e.g. ctx._source.containsKey('title').
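A likely explanation, sketched in Java. My understanding is that the script receives the parsed _source as plain java.util.Map and java.util.List objects. That is an assumption about how this Elasticsearch version exposes ctx._source, not something confirmed in this thread.

Under that assumption, containsKey('raw.links') looks for a single top-level key literally named "raw.links", and this document has no such key. The key "raw" maps to another Map, and "links" lives inside that inner Map, so each level has to be checked separately. In Groovy that should be something along the lines of ctx._source.containsKey('raw') && ctx._source.raw.containsKey('links'). The names NestedKeyCheck and linksUrlCount below are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NestedKeyCheck {

    // Hypothetical helper: returns the size of source["raw"]["links"],
    // or 0 when either level is missing (mirrors the intended if/else).
    static int linksUrlCount(Map<String, Object> source) {
        Object raw = source.get("raw");
        if (!(raw instanceof Map)) {
            return 0; // no "raw" object at all
        }
        Object links = ((Map<?, ?>) raw).get("links");
        if (!(links instanceof List)) {
            return 0; // "raw" exists but has no "links" array
        }
        return ((List<?>) links).size();
    }

    public static void main(String[] args) {
        // Same shape as the document indexed with curl above.
        Map<String, Object> raw = new HashMap<>();
        raw.put("links", Arrays.asList("http://bit.ly/abc", "http://bit.ly/abc",
                "http://bit.ly/def", "http://bit.ly/ghi"));
        Map<String, Object> source = new HashMap<>();
        source.put("titles", Arrays.asList("title 1", "title 2", "title 3", "title 4"));
        source.put("raw", raw);

        // The dotted name is not a key of the top-level map, so this prints false.
        System.out.println(source.containsKey("raw.links"));
        // Checking each level separately finds the array: prints 4.
        System.out.println(linksUrlCount(source));
        // A document without "raw" falls back to 0.
        System.out.println(linksUrlCount(new HashMap<>()));
    }
}
```

Note that all four links are counted, duplicates included, because the nested value is an ordinary List.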
On Thursday, January 8, 2015 at 9:59:56 PM UTC-8, Nikolas Everett wrote:
>
> Transform never saves to source. You have to transform on the application
> side for that. It was designed for times when you wanted to index something
> like this that would just take up extra space in the source document. I
> imagine you could use a script field on the query if you need the result to
> contain the count. Or just count it on the result side.
>
> Nik
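If the count has to end up in _source, the application-side approach Nik describes could look roughly like the sketch below. It computes the array length before the document is sent, then stores it as an ordinary field named title_count, the field name used in the mapping in this thread.

This is a minimal sketch that assumes the document is built as a java.util.Map before being serialised to JSON. TitleCountEnricher and withTitleCount are hypothetical names, and the JSON serialisation and indexing call are deliberately left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TitleCountEnricher {

    // Hypothetical application-side step: copy the array length into its own
    // field before indexing, so the count is stored in _source and can be
    // mapped as an integer for range filters.
    static Map<String, Object> withTitleCount(Map<String, Object> doc) {
        Map<String, Object> out = new HashMap<>(doc); // leave the caller's map untouched
        Object titles = doc.get("titles");
        int count = (titles instanceof List) ? ((List<?>) titles).size() : 0;
        out.put("title_count", count);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("titles", Arrays.asList("title 1", "title 2", "title 3", "title 4"));
        // The enriched map would then be serialised to JSON and indexed.
        System.out.println(withTitleCount(doc).get("title_count"));
    }
}
```

Because the count comes from the List itself, duplicate titles are counted too.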
> On Jan 9, 2015 12:43 AM, "Jeff Steinmetz" <[email protected]> wrote:
>
>> Transform worked well. Nice.
>>
>> Curious how to get it to save to source? I tried the mapping below, no go.
>> (I can, however, do range queries against title_count, so the transform
>> was indexed and works well.)
>>
>> "transform" : {
>> "script" : "ctx._source['\'title_count\''] =
>> ctx._source['\'titles\''].size()",
>> "lang": "groovy"
>> },
>> "properties": {
>> "titles": { "type": "string", "index": "not_analyzed" },
>> "title_count" : { "type": "integer", "store": "yes" }
>> }
>> }'
>>
>>
>> On Thursday, January 8, 2015 at 9:15:28 PM UTC-8, Nikolas Everett wrote:
>>>
>>> Source is going to be pretty slow, yeah. If it's a one-off then it's
>>> probably fine, but if you do it a lot it's probably best to index the count.
>>> On Jan 9, 2015 12:04 AM, "Jeff Steinmetz" <[email protected]> wrote:
>>>
>>>> Thank you, that worked.
>>>>
>>>> I was curious about the speed: is running a script using _source slower
>>>> than doc[]?
>>>>
>>>> I totally understand that a dynamic script is slower regardless of
>>>> _source vs. doc[].
>>>>
>>>> It makes sense that transforming the count up front at index time, to
>>>> create a materialized value, would be much faster.
>>>>
>>>>
>>>> On Thursday, January 8, 2015 at 7:04:40 PM UTC-8, Nikolas Everett wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Is there a better way to do this?
>>>>>>
>>>>>> Please see this gist (or, even better, run the script locally to see
>>>>>> the issue).
>>>>>>
>>>>>> https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae
>>>>>>
>>>>>> You must have scripting enabled in your elasticsearch config for this
>>>>>> to work.
>>>>>>
>>>>>> This was originally based on some comments I found here:
>>>>>> http://stackoverflow.com/questions/17314123/search-by-size-of-object-type-field-elastic-search
>>>>>>
>>>>>> We would like to use a filtered query to include only documents that
>>>>>> have a small number of items in the list (aka array), filtering where
>>>>>> values.size() < 10:
>>>>>>
>>>>>> "script": "doc['titles'].values.size() < 10"
>>>>>>
>>>>>> It turns out that values.size() doesn't count the elements of the list.
>>>>>> If analysis is left on, it counts tokenized (analyzed) words, not the
>>>>>> number of elements in the list.
>>>>>> If analysis is turned off for the field, it improves, but duplicates
>>>>>> are missed.
>>>>>>
>>>>>> For example, this comes back as size == 2 (should be 3):
>>>>>> "titles": ["one", "duplicate", "duplicate"]
>>>>>> This comes back as size == 3 (should be 4):
>>>>>> "titles": ["http://bit.ly/abc", "http://bit.ly/abc",
>>>>>> "http://bit.ly/def", "http://bit.ly/ghi"]
>>>>>>
>>>>>> Is this a bug, is there a better way, or is this just something we
>>>>>> don't understand about Groovy and values.size()?
>>>>>>
>>>>>>
>>>>>>
>>>>> I think that's just the way doc[] works. Try (but don't actually
>>>>> deploy) _source['titles'].size() < 10. That should do what you expect.
>>>>> Don't deploy it, because it's too slow. Instead, try indexing the size
>>>>> and filtering on it: you can use a transform to add the size of the
>>>>> array as an integer field and just filter on it using a range filter.
>>>>> That'd probably be the fastest option.
>>>>>
>>>>> Nik
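Nik's point that "that's just the way doc[] works" can be made more concrete. As far as I understand, for a not_analyzed string field, doc['titles'].values is read from field data. Field data keeps each distinct term once per document, in sorted order. _source, by contrast, keeps the original JSON array.

The Java sketch below models that behaviour with a TreeSet. It only illustrates the counting, and its results are consistent with the sizes reported above. It is not Elasticsearch code, and DocValuesVsSource and fieldDataTerms are made-up names. With analysis left on, the set would hold tokens instead of whole titles, which matches the token counts described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class DocValuesVsSource {

    // Models what doc['titles'].values sees for a not_analyzed field:
    // the distinct terms of one document, in sorted order.
    static TreeSet<String> fieldDataTerms(List<String> sourceArray) {
        return new TreeSet<>(sourceArray);
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("one", "duplicate", "duplicate");
        List<String> links = Arrays.asList("http://bit.ly/abc", "http://bit.ly/abc",
                "http://bit.ly/def", "http://bit.ly/ghi");

        // _source['titles'].size() counts array elements, duplicates included;
        // the field-data view collapses duplicates.
        System.out.println(titles.size() + " vs " + fieldDataTerms(titles).size()); // 3 vs 2
        System.out.println(links.size() + " vs " + fieldDataTerms(links).size());   // 4 vs 3
    }
}
```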
>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit https://groups.google.com/d/
>>>> msgid/elasticsearch/75736948-beac-43fc-84d4-25a94456d4ca%
>>>> 40googlegroups.com
>>>> <https://groups.google.com/d/msgid/elasticsearch/75736948-beac-43fc-84d4-25a94456d4ca%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>
>