Hi,
I'd like to share my experience and, at the same time, hope to get some
tips.
The query was run against an index with about 700 million documents.
Two things happened:
1. The node that ran this query crashed. It is a node configured not to
process data.
2. The data nodes went into heavy GC. Eventually old generation GC could no
longer reduce heap usage and the nodes became unresponsive. In some cases
the old generation even grew during a collection:
[2014-12-20 07:21:03,370][WARN ][monitor.jvm ] [******]
[gc][young][2796041][224976] duration [1.1s], collections [1]/[1.3s], total
[1.1s]/[3.4h], memory [21.5gb]->[21.2gb]/[29.8gb], all_pools {[young]
[1.4gb]->[3.4mb]/[1.4gb]}{[survivor]
[191.3mb]->[191.3mb]/[191.3mb]}{[old] [19.9gb]->[21gb]/[28.1gb]}
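To make the numbers in that log line easier to read, here is a rough Python sketch that pulls the before/after/max size of each memory pool out of it. The helper names (`to_bytes`, `parse_pools`) are made up for this illustration and the regex is only fitted to the one line quoted above, so treat it as a sketch rather than a general log parser.

```python
import re

# The GC line from above, joined back into a single string.
LOG = (
    "[gc][young][2796041][224976] duration [1.1s], collections [1]/[1.3s], total "
    "[1.1s]/[3.4h], memory [21.5gb]->[21.2gb]/[29.8gb], all_pools {[young] "
    "[1.4gb]->[3.4mb]/[1.4gb]}{[survivor] "
    "[191.3mb]->[191.3mb]/[191.3mb]}{[old] [19.9gb]->[21gb]/[28.1gb]}"
)

GIB = 1024 ** 3
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": GIB}

def to_bytes(text):
    """Convert a size such as '19.9gb' or '3.4mb' to a number of bytes."""
    match = re.fullmatch(r"([\d.]+)(b|kb|mb|gb)", text)
    if match is None:
        raise ValueError("unrecognised size: " + text)
    return float(match.group(1)) * UNITS[match.group(2)]

# Matches one pool entry, e.g. {[old] [19.9gb]->[21gb]/[28.1gb]}
POOL = re.compile(r"\{\[(\w+)\] \[([^\]]+)\]->\[([^\]]+)\]/\[([^\]]+)\]\}")

def parse_pools(line):
    """Return {pool name: (before, after, max)} in bytes."""
    return {
        name: (to_bytes(before), to_bytes(after), to_bytes(maximum))
        for name, before, after, maximum in POOL.findall(line)
    }

pools = parse_pools(LOG)
old_before, old_after, old_max = pools["old"]
old_growth_gb = (old_after - old_before) / GIB   # how much the old pool grew
old_fill = old_after / old_max                   # fraction of the old pool in use

print("old pool grew by %.1f gb and is %.0f%% full" % (old_growth_gb, old_fill * 100))
```

Read this way, the young pool was almost emptied by the collection while the old pool grew by about 1.1gb, which is why total heap usage barely moved.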
The query itself is a bad one, but I expected the ES cluster to handle it
gracefully. It does throw this exception:
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException:
[FIELDDATA] Data too large, data for [_uid] would be larger than limit of
[19206989414/17.8gb]
I guess ES stopped the query at some point because the field data exceeded
the default limit, but by then it was too late: the query had already caused
the heap memory problem.
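As a rough sanity check on that limit, the byte count in the exception can be compared with the heap maximum reported in the GC log line. The short Python sketch below does the arithmetic. It assumes the 29.8gb in the GC log is the heap maximum of the same kind of node that threw the exception, and that the gb figures are binary (1024^3 bytes). I believe 60% of heap is the default for the fielddata breaker (`indices.breaker.fielddata.limit`) in ES 1.4, but treat that as an assumption too.

```python
GIB = 1024 ** 3

# From the CircuitBreakingException: limit of [19206989414/17.8gb]
LIMIT_BYTES = 19206989414

# From the GC log line: memory [21.5gb]->[21.2gb]/[29.8gb] (rounded by ES)
HEAP_MAX_GB = 29.8

limit_gb = LIMIT_BYTES / GIB             # breaker limit expressed in gb
limit_fraction = limit_gb / HEAP_MAX_GB  # breaker limit as a share of the heap

print("limit = %.2f gb, about %.0f%% of a %.1f gb heap"
      % (limit_gb, limit_fraction * 100, HEAP_MAX_GB))
```

So the limit in the exception works out to roughly 60% of the heap, which suggests the breaker was running with a percentage-of-heap limit rather than a value I set explicitly.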
I am wondering whether there is anything obviously wrong with my ES cluster
configuration.
I have 5 boxes, each with 125 GB of RAM and 32 cores. I run two data nodes on
each of them, with the heap fixed at 31G and a configuration tuned in favor of
bulk ingestion. I have actually seen ingestion throughput above 60K documents
per second. It was working fine until that query came along.
Thanks,
Jack