There is not much you can really do here: parent/child queries tend to be very slow and eat a lot of heap space.

I had a similar performance problem. In my case I had a three-level relationship (parent/child/grandchild), and query time was on average 10x slower for every level.

So my suggestion would be to switch to nested documents plus the update API. If your query time is more important than your update time, that is the way to go (in my case queries became about 100x faster).

http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-update.html

Regards,
Karol Gwaj

On Sunday, March 30, 2014 8:28:33 AM UTC+1, Lauri wrote:
>
> Hi,
>
> I'm having performance problems with the has_parent filter.
>
> The mapping for the child document is:
> {
>   "program": {
>     "_parent": { "type": "series" },
>     ...
>   }
> }
>
> And for the parent document:
> {
>   "series": {
>     ...
>     "properties": {
>       ...
>       "subject": {
>         "type": "object",
>         "properties": {
>           ...
>           "_path": {
>             "type": "object",
>             "properties": {
>               "id": { "type": "string", "analyzer": "path_analyzer" }
>               ...
>             }
>           }
>         }
>       },
>       ...
>     }
>   }
> }
>
> If I search documents of type program (the child) like this:
> {
>   "from": 0,
>   "size": 25,
>   "query": {
>     "filtered": {
>       "query": { "match_all": {} },
>       "filter": {
>         "has_parent": {
>           "filter": {
>             "terms": {
>               "subject._path.id": [ "5-162" ]
>             }
>           },
>           "parent_type": "series"
>         }
>       }
>     }
>   }
> }
>
> It consistently takes around 160 milliseconds to run and finds
> about 60k documents.
>
> If I search documents of type series (the parent) like this:
> {
>   "from": 0,
>   "size": 25,
>   "query": {
>     "filtered": {
>       "query": { "match_all": {} },
>       "filter": {
>         "terms": {
>           "subject._path.id": [ "5-162" ]
>         }
>       }
>     }
>   }
> }
>
> It takes around 5 milliseconds and returns about 400 documents.
>
> The total count of program objects is about 1.7M and series objects 11k.
> The index is fully optimized and the cluster is not doing anything else.
> The index has 3 shards and 1 replica of each shard.
> There are three nodes in the cluster. The nodes have twice as much RAM
> as the index size. Half of the RAM is assigned to Elasticsearch. The
> Elasticsearch version is 1.0. If I use the bigdesk plugin, it looks like
> there is more than enough RAM. I'm not seeing cache evictions or anything
> like that.
>
> So to me it looks like there is something weird going on, as the
> has_parent filter runs more than 30 times slower than the actual parent
> query. Is there anything I can do to make it faster?
>
> Thanks,
> Lauri
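
To make the nested-documents suggestion at the top of this reply a bit more concrete, here is a rough sketch of the request bodies involved, written as Python dicts so they can be dumped to JSON. It is only an illustration under assumptions: the "programs" field name, the "title" property and the append_program_body helper are made up, the script uses the 1.x-style inline update syntax from the update API docs linked above, and it presumes "programs" already exists as a list on the series document.

```python
import json

# Assumption: programs are embedded in their series document under a
# hypothetical "programs" field; "title" is an illustrative property only.
series_mapping = {
    "series": {
        "properties": {
            "programs": {
                "type": "nested",
                "properties": {"title": {"type": "string"}},
            }
        }
    }
}


def append_program_body(program):
    """Build an update API body that appends one program to a series.

    Assumes the 1.x inline script form and that "programs" is already
    a list in the stored series document.
    """
    return {
        "script": "ctx._source.programs += program",
        "params": {"program": program},
    }


# The parent-side filter from the quoted question. With nested documents
# it runs directly against series documents, with no has_parent step.
series_query = {
    "from": 0,
    "size": 25,
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {"terms": {"subject._path.id": ["5-162"]}},
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(series_mapping, indent=2))
    print(json.dumps(append_program_body({"title": "example"}), indent=2))
    print(json.dumps(series_query, indent=2))
```

One trade-off to keep in mind: the hits are then series documents that carry their programs in _source, so paging counts series rather than individual programs, and every program change rewrites the whole series document.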
