I have an index of 50 million docs with 2 shards. The initial top-level 
query is a match-all query with 8 Terms Aggregation filters, one of which 
is on a high-cardinality field (10k distinct values). The rest of the 
aggregations all have a cardinality < 10. A drill-down query is then run 
with a post filter. 

The top-level query has a hit count of 50 million and the drill-down query 
has a hit count of 20 million.

Both shards are on a single node. I am using a Transport node and the Java 
APIs to run the searches.

When I run the Java client on the data node and the system is cold, the 
top-level query has a latency of 3 minutes, while the drill-down has a 
latency of 800 ms.

When the system is warm, the top-level query has a latency of 25 secs, 
while the drill-down remains the same at around 800 ms.

When the system is cold and I run the same client from a remote system, the 
latency of the top-level query is 21 secs, while the subsequent queries 
drop down to 8 secs.

I am running the queries with the aggregation filter size set to 0 since we 
need the exact count.
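
A minimal sketch of the two request bodies described above, to make the 
setup concrete. The field names are placeholders (the real mapping is not 
shown here), the post filter is assumed to be a single term filter, and any 
aggregations on the drill-down request are left out because they are not 
specified. The Java client would send the equivalent JSON.

```python
import json

# Placeholder field names; the real mapping is not shown in this post.
HIGH_CARDINALITY_FIELD = "field_high"  # the field with 10k distinct values
LOW_CARDINALITY_FIELDS = ["field_%d" % i for i in range(1, 8)]  # 7 fields, < 10 values each


def terms_aggs(fields):
    """One terms aggregation per field, with size 0 as described in the post."""
    return {name: {"terms": {"field": name, "size": 0}} for name in fields}


def top_level_body():
    """Match-all query carrying all 8 terms aggregations."""
    return {
        "query": {"match_all": {}},
        "aggs": terms_aggs([HIGH_CARDINALITY_FIELD] + LOW_CARDINALITY_FIELDS),
    }


def drill_down_body(field, value):
    """Match-all query narrowed by a post filter (assumed: one term filter)."""
    return {
        "query": {"match_all": {}},
        "post_filter": {"term": {field: value}},
    }


if __name__ == "__main__":
    print(json.dumps(top_level_body(), indent=2))
    print(json.dumps(drill_down_body("field_1", "some_value"), indent=2))
```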

I understand that the high cardinality filter is slowing the queries and 
that the CPU spike comes from calculating the counts. 

I would like to understand why the search latency is markedly less when 
running the search client from a remote system - the "cold" latency value 
is 21 secs using a remote client vs 3 mins on the client running on the 
data node. This is when nothing else is running on the data node.

Also, I would like to understand if we can tune the duration of the cache. 
I see the same cold latency again when I re-run the query after the system 
has been idle for some time, so the cache seems to be cleared after a set 
time period.

Finally, is there any other type of aggregation filter (other than the 
Terms Aggregation) that is recommended for high cardinality aggregation 
items so as to bring down the latency?

Thanks for any pointers.
Shantanu
