I'd just like to chime in with a "me too". Is the answer just more nodes? In my case this is happening every week or so.
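In case it's useful to anyone else watching for this, a rough way to catch the heap buildup before a node stalls is to poll the stock node stats and hot-threads endpoints (a sketch against a 1.x cluster; host and port are placeholders, adjust to your own setup):

  # per-node heap usage and old-gen GC counters
  curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty'

  # what a busy node is doing while its heap climbs
  curl -s 'http://localhost:9200/_nodes/hot_threads'

Watching jvm.mem.heap_used_percent and the jvm.gc.collectors.old counts/times in the first call is usually enough to spot a node creeping towards the long old-gen collections shown in the logs below.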
On Monday, April 21, 2014 9:04:33 PM UTC-5, Brian Flad wrote:
>
> My dataset currently is 100GB across a few "daily" indices (~5-6GB and 15 shards each). Data nodes are 12 CPU, 12GB RAM (6GB heap).
>
> On Mon, Apr 21, 2014 at 6:33 PM, Mark Walkom <[email protected]> wrote:
>
> How big are your data sets? How big are your nodes?
>
> Regards,
> Mark Walkom
> Infrastructure Engineer
> Campaign Monitor
> email: [email protected]
> web: www.campaignmonitor.com
>
> On 22 April 2014 00:32, Brian Flad <[email protected]> wrote:
>
> We're seeing the same behavior with 1.1.1, JDK 7u55, 3 master nodes (2 min master), and 5 data nodes. Interestingly, we see the repeated young GCs only on a node or two at a time. Cluster operations (such as recovering unassigned shards) grind to a halt. After restarting a GCing node, everything returns to normal operation in the cluster.
>
> Brian F
>
> On Wed, Apr 16, 2014 at 8:00 PM, Mark Walkom <[email protected]> wrote:
>
> In both your instances, if you can, have 3 master-eligible nodes, as it will reduce the likelihood of a split cluster because you will always have a majority quorum. Also look at discovery.zen.minimum_master_nodes to go with that. However, you may just be reaching the limit of your nodes, which means the best option is to add another node (which also neatly solves your split brain!).
>
> Ankush, it would help if you can update Java; most people recommend u25, but we run u51 with no problems.
>
> Regards,
> Mark Walkom
> Infrastructure Engineer
> Campaign Monitor
> email: [email protected]
> web: www.campaignmonitor.com
>
> On 17 April 2014 07:31, Dominiek ter Heide <[email protected]> wrote:
>
> We are seeing the same issue here.
>
> Our environment:
>
> - 2 nodes
> - 30GB heap allocated to ES
> - ~140GB of data
> - 639 indices, 10 shards per index
> - ~48M documents
>
> After starting ES everything is good, but after a couple of hours we see the heap build up towards 96% on one node and 80% on the other.
> We then see the GC take very long on the 96% node:
>
> TOuKgmlzaVaFVA][elasticsearch1.trend1.bottlenose.com][inet[/192.99.45.125:9300]]])
> [2014-04-16 12:04:27,845][INFO ][discovery              ] [elasticsearch2.trend1] trend1/I3EHG_XjSayz2OsHyZpeZA
> [2014-04-16 12:04:27,850][INFO ][http                   ] [elasticsearch2.trend1] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/192.99.45.126:9200]}
> [2014-04-16 12:04:27,851][INFO ][node                   ] [elasticsearch2.trend1] started
> [2014-04-16 12:04:32,669][INFO ][indices.store          ] [elasticsearch2.trend1] updating indices.store.throttle.max_bytes_per_sec from [20mb] to [1gb], note, type is [MERGE]
> [2014-04-16 12:04:32,669][INFO ][cluster.routing.allocation.decider] [elasticsearch2.trend1] updating [cluster.routing.allocation.node_initial_primaries_recoveries] from [4] to [50]
> [2014-04-16 12:04:32,670][INFO ][indices.recovery       ] [elasticsearch2.trend1] updating [indices.recovery.max_bytes_per_sec] from [200mb] to [2gb]
> [2014-04-16 12:04:32,670][INFO ][cluster.routing.allocation.decider] [elasticsearch2.trend1] updating [cluster.routing.allocation.node_initial_primaries_recoveries] from [4] to [50]
> [2014-04-16 12:04:32,670][INFO ][cluster.routing.allocation.decider] [elasticsearch2.trend1] updating [cluster.routing.allocation.node_initial_primaries_recoveries] from [4] to [50]
> [2014-04-16 15:25:21,409][WARN ][monitor.jvm            ] [elasticsearch2.trend1] [gc][old][11876][106] duration [1.1m], collections [1]/[1.1m], total [1.1m]/[1.4m], memory [28.7gb]->[22gb]/[29.9gb], all_pools {[young] [67.9mb]->[268.9mb]/[665.6mb]}{[survivor] [60.5mb]->[0b]/[83.1mb]}{[old] [28.6gb]->[21.8gb]/[29.1gb]}
> [2014-04-16 16:02:32,523][WARN ][monitor.jvm            ] [elasticsearch2.trend1] [gc][old][13996][144] duration [1.4m], collections [1]/[1.4m], total [1.4m]/[3m], memory [28.8gb]->[23.5gb]/[29.9gb], all_pools {[young] [21.8mb]->[238.2mb]/[665.6mb]}{[survivor] [82.4mb]->[0b]/[83.1mb]}{[old] [28.7gb]->[23.3gb]/[29.1gb]}
> [2014-04-16 16:14:12,386][WARN ][monitor.jvm            ] [elasticsearch2.trend1] [gc][old][14603][155] duration [1.3m], collections [2]/[1.3m], total [1.3m]/[4.4m], memory [29.2gb]->[23.9gb]/[29.9gb], all_pools {[young] [289mb]->[161.3mb]/[665.6mb]}{[survivor] [58.3mb]->[0b]/[83.1mb]}{[old] [28.8gb]->[23.8gb]/[29.1gb]}
> [2014-04-16 16:17:55,480][WARN ][monitor.jvm            ] [elasticsearch2.trend1] [gc][old][14745][158] duration [1.3m], collections [1]/[1.3m], total [1.3m]/[5.7m], memory [29.7gb]->[24.1gb]/[29.9gb], all_pools {[young] [633.8mb]->[149.7mb]/[665.6mb]}{[survivor] [68.6mb]->[0b]/[83.1mb]}{[old] [29gb]->[24gb]/[29.1gb]}
> [2014-04-16 16:21:17,950][WARN ][monitor.jvm            ] [elasticsearch2.trend1] [gc][old][14857][161] duration [1.4m], collections [1]/[1.4m], total [1.4m]/[7.2m], memory [28.6gb]->[24.5gb]/[29.9gb], all_pools {[young] [27.7mb]->[154.8mb]/[665.6mb]}{[survivor] [83.1mb]->[0b]/[83.1mb]}{[old] [28.5gb]->[24.3gb]/[29.1gb]}
> [2014-04-16 16:24:48,776][WARN ][monitor.jvm            ] [elasticsearch2.trend1] [gc][old][14978][164] duration [1.4m], collections [1]/[1.4m], total [1.4m]/[8.6m], memory [29.4gb]->[24.7gb]/[29.9gb], all_pools {[young] [475.5mb]->[125.1mb]/[665.6mb]}{[survivor] [68.9mb]->[0b]/[83.1mb]}{[old] [28.9gb]->[24.6gb]/[29.1gb]}
> [2014-04-16 16:26:54,801][WARN ][monitor.jvm            ] [elasticsearch2.trend1] [gc][old][15021][165] duration [1.3m], collections [1]/[1.3m], total [1.3m]/[9.9m], memory [29.3gb]->[24.8gb]/[29.9gb], all_pools {[young] [391.8mb]->[151.1mb]/[665.6mb]}{[survivor] [62.4mb]->[0b]/[83.1mb]}{[old] [28.9gb]->[24.6gb]/[29.1gb]}
> [2014-04-16 16:30:45,393][WARN ][monitor.jvm            ] [elasticsearch2.trend1] [gc][old][15170][168] duration [1.3m], collections [1]/[1.3m], total [1.3m]/[11.3m], memory [29.4gb]->[24.6gb]/[29.9gb], all_pools {[young] [320.3mb]->[186.7mb]/[665.6mb]}{[survivor] [75.7mb]->[0b]/[83.1mb]}{[old] [29gb]->[24.4gb]/[29.1gb]}
> [2014-04-16 16:32:57,505][WARN ][monitor.jvm            ] [elasticsearch2.trend1] [gc][old][15221
>
> ...
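For what it's worth, a minimal sketch of the quorum setup Mark describes above, assuming the 3 master-eligible / 5 data-node layout Brian mentions (the node roles and the dynamic settings call are the standard 1.x mechanisms; hostnames and counts are placeholders, adjust to your own topology):

  # elasticsearch.yml on the three master-eligible nodes
  node.master: true
  node.data: false
  discovery.zen.minimum_master_nodes: 2    # majority of 3 master-eligible nodes

  # or applied to a running cluster, since the setting is dynamic in 1.x
  curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
    "persistent": { "discovery.zen.minimum_master_nodes": 2 }
  }'

With only two master-eligible nodes there is no value of minimum_master_nodes that both tolerates a failure and prevents a split, which is why the suggestion above is three master-eligible nodes plus the quorum setting rather than just more data nodes.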
