There is not much you can really do here.
Parent/child queries tend to be very slow and use a lot of heap space.

I had a similar performance problem.
In my case I had a 3-level relationship (parent/child/grandchild), and query
time was on average 10x slower for every level.

So my suggestion is to switch to nested documents + the update API.
If your query time is more important than your update time, that is the way
to go (in my case the query performance improvement was 100x).
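
A rough sketch of what nested documents + the update API could look like, written as Python helpers that only build the request bodies (nothing is sent to a cluster). The field names `programs` and `title` are made up for illustration, since the real mappings in this thread are elided. The update script assumes dynamic scripting is enabled and that the `programs` array already exists on the series document.

```python
import json


def nested_series_mapping():
    """Mapping body for a 'series' type that embeds its programs as nested objects.

    'programs' and 'title' are hypothetical field names, not taken from the thread.
    """
    return {
        "series": {
            "properties": {
                "programs": {
                    "type": "nested",
                    "properties": {
                        # not_analyzed keeps the value as a single exact term
                        "title": {"type": "string", "index": "not_analyzed"}
                    },
                }
            }
        }
    }


def append_program_update(program):
    """Body for POST /<index>/series/<series_id>/_update.

    Appends one program to the nested array with a script instead of
    reindexing the whole series document from the client side.
    Assumes ctx._source.programs already exists.
    """
    return {
        "script": "ctx._source.programs += program",
        "params": {"program": program},
    }


if __name__ == "__main__":
    print(json.dumps(nested_series_mapping(), indent=2))
    print(json.dumps(append_program_update({"title": "Episode 1"}), indent=2))
```

With this layout the filter on subject._path.id runs directly against the series documents, and the matching programs come back embedded in each series' _source. The trade-off is that every added or changed program rewrites the whole series document.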


http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-update.html

Regards,
Karol Gwaj

On Sunday, March 30, 2014 8:28:33 AM UTC+1, Lauri wrote:
>
> Hi,
>
> I'm having performance problems with the has_parent filter.
>
> The mapping for the child document is:
> {
>   "program": {
>     "_parent": { "type": "series" },
>     ...
>   }
> }
>
> And for the parent document:
> {
>   "series": {
>     ...
>     "properties": {
>       ...
>       "subject":{
>         "type": "object",
>         "properties": {
>           ...
>           "_path": {
>             "type": "object",
>             "properties": {
>               "id": { "type": "string", "analyzer": "path_analyzer" }
>               ...
>             }
>           }
>         }
>       },
>       ...
>     }
>   }
> }
>
> If I search documents of type program (the child) like this:
> {
>   "from": 0,
>   "size": 25,
>   "query": {
>     "filtered": {
>       "query": { "match_all": {} },
>       "filter": {
>         "has_parent": {
>           "filter": {
>             "terms" : {
>               "subject._path.id" : [ "5-162" ]
>             }
>           },
>           "parent_type" : "series"
>         }
>       }
>     }
>   }
> }
>
> It consistently takes around 160 milliseconds to run and finds about 60k
> documents.
>
> If I search documents of type series (the parent) like this:
> {
>   "from" : 0,
>   "size" : 25,
>   "query" : {
>     "filtered": {
>       "query": { "match_all": {} },
>       "filter": {
>         "terms": {
>           "subject._path.id": [ "5-162" ]
>         }
>       }
>     }
>   }
> }
>
> It takes around 5 milliseconds and returns about 400 documents.
>
> The total count of program objects is about 1.7M and series objects 11k.
> The index is fully optimized and the cluster is not doing anything else.
> The index has 3 shards and 1 replica of each shard. There are three nodes
> in the cluster. The nodes have twice as much RAM as the index size. Half
> of the RAM is assigned to Elasticsearch. Elasticsearch version is 1.0. If I
> use the bigdesk plugin, it looks like there is more than enough RAM. I'm not
> seeing cache evictions or anything like that.
>
> So to me it looks like there is something weird going on, as the has_parent
> filter runs more than 30 times slower than the actual parent query.
> Is there anything I can do to make it faster?
>
> Thanks,
> Lauri
>
>
