Good point on heap, so I will bring that back down to 30GB.

Versions: ES 1.3.2-1, Java 1.7.0_67

I definitely want to start using all 12 disks, rather than just the 1 at the moment! If I add paths for the other 11 disks and restart, will ES do any 'rebalancing'? If it won't, is there any way to move the data around all 12 disks? I really don't want to re-index everything!!

Thanks

On Thursday, September 18, 2014 10:03:18 AM UTC+1, Mark Walkom wrote:
>
> Also given you're over 32GB heap your java pointers aren't going to be
> compressed, which means GC will suffer.
>
> You haven't mentioned what ES and java versions you are using, which would
> be useful.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: [email protected]
> web: www.campaignmonitor.com
>
> On 18 September 2014 18:57, Michael McCandless <[email protected]> wrote:
>
>> Try disabling merge IO throttling, especially if your index is on SSD/s.
>> (It's on by default at a paltry 20 MB/sec). Merge IO throttling causes
>> merges to run slowly which eventually causes them to back up enough to the
>> point where indexing must be throttled...
>>
>> Also see the recent post about tuning to favor indexing throughput:
>> http://www.elasticsearch.org/blog/performance-considerations-elasticsearch-indexing/
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Thu, Sep 18, 2014 at 4:54 AM, <[email protected]> wrote:
>>
>>> Setup:
>>> 4 nodes
>>> Replication = 0
>>> ES_HEAP_SIZE = 75GB
>>> Number of Indices = 59 (using logstash one index per month)
>>> Total shards = 234 (each index is 4 shards, one per node)
>>> Total docs = 7.4 billion
>>> Total size = 4.7TB
>>>
>>> When I add a new file, which I do using logstash on all four nodes, the
>>> indexing immediately throttles.
>>> For instance:
>>>
>>> [2014-09-18 09:41:42,326][INFO ][index.engine.internal ] [hdp13] [logstash-2014.09][2] stop throttling indexing: numMergesInFlight=4, maxNumMerges=5
>>> [2014-09-18 09:41:45,267][INFO ][index.engine.internal ] [hdp13] [logstash-2014.09][2] now throttling indexing: numMergesInFlight=6, maxNumMerges=5
>>> [2014-09-18 09:41:45,303][INFO ][index.engine.internal ] [hdp13] [logstash-2014.09][2] stop throttling indexing: numMergesInFlight=4, maxNumMerges=5
>>> [2014-09-18 09:41:51,273][INFO ][index.engine.internal ] [hdp13] [logstash-2014.09][2] now throttling indexing: numMergesInFlight=6, maxNumMerges=5
>>> [2014-09-18 09:41:51,379][INFO ][index.engine.internal ] [hdp13] [logstash-2014.09][2] stop throttling indexing: numMergesInFlight=4, maxNumMerges=5
>>> [2014-09-18 09:42:06,429][INFO ][index.engine.internal ] [hdp13] [logstash-2014.09][2] now t
>>>
>>> Where should I be looking to tune the indexing performance? The query
>>> load on the cluster is very low as it is a research cluster and so I would
>>> sacrifice query performance for indexing.
>>>
>>> The 4 nodes all run logstash, listening on various ports. I use netcat
>>> to 'feed' the data to the 4 nodes from a hadoop cluster.
>>>
>>> hadoop1 netcat -------->
>>> hadoop2 netcat --------> ES1
>>> hadoop3 netcat -------->
>>>
>>> And so on.
>>>
>>> Each ES node has 24 disks but I am only using one at the moment. This is
>>> an obvious IO bottleneck, but I am unclear how to use all disks? If I add
>>> more disks will ES share the data between them all? eg; /mnt/disk1
>>> /mnt/disk2 etc
>>>
>>> Thanks
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/3e85d65c-8001-4f90-bfa0-f7e63679feba%40googlegroups.com
>>> For more options, visit https://groups.google.com/d/optout.
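
As a rough illustration of the merge IO throttling suggestion quoted in the thread, the sketch below builds a cluster settings request that turns store throttling off. It assumes the `indices.store.throttle.type` setting covered in the linked indexing performance post and a node reachable at http://localhost:9200 (an assumption, not something stated in the thread). The helper names are hypothetical. Treat it as a starting point to check against the docs for your ES version, not as a tested recipe.

```python
import json
import urllib.request


def throttle_settings_body(throttle_type="none"):
    """Build a transient cluster-settings body for store throttling.

    "none" disables throttling; "merge" is the throttled default
    in ES 1.x.
    """
    return json.dumps(
        {"transient": {"indices.store.throttle.type": throttle_type}}
    )


def apply_settings(body, host="http://localhost:9200"):
    """PUT the body to the cluster settings endpoint.

    The host is an assumption; point it at one of your own nodes.
    """
    req = urllib.request.Request(
        host + "/_cluster/settings",
        data=body.encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Calling `apply_settings(throttle_settings_body())` would send the change as a transient setting, so it would not survive a full cluster restart. Passing "merge" instead should put the throttle back.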

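
For the multiple disk question, ES 1.x accepts several comma-separated locations for `path.data` in elasticsearch.yml. The sketch below only generates that config line. It assumes the disks are mounted following the /mnt/disk1, /mnt/disk2 pattern mentioned in the thread, and the helper name is hypothetical. It says nothing about whether existing data gets moved onto the newly added paths.

```python
def data_path_setting(num_disks, prefix="/mnt/disk"):
    """Return an elasticsearch.yml line listing one data path per disk.

    The /mnt/diskN naming is an assumption based on the example
    mount points in the thread.
    """
    if num_disks < 1:
        raise ValueError("need at least one disk")
    paths = [f"{prefix}{i}" for i in range(1, num_disks + 1)]
    return "path.data: " + ",".join(paths)


if __name__ == "__main__":
    # Print the line to paste into elasticsearch.yml for 12 disks.
    print(data_path_setting(12))
```

The printed line would go into elasticsearch.yml on each node, followed by a node restart for it to take effect.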