Re: Indexing is being throttled

Michael McCandless Thu, 18 Sep 2014 01:58:30 -0700

Try disabling merge IO throttling, especially if your index is on SSDs.
(It's on by default, at a paltry 20 MB/sec.)  Merge IO throttling causes
merges to run slowly, which eventually causes them to back up to the
point where indexing must be throttled...
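
A minimal sketch of applying that suggestion, assuming an ES 1.x cluster whose HTTP API is reachable at a hypothetical http://localhost:9200. It only builds (does not send) a PUT /_cluster/settings request that sets indices.store.throttle.type to none. The helper name is illustrative, not part of any client library.

```python
import json
import urllib.request

# Hypothetical address of any node's HTTP endpoint; adjust for your cluster.
ES_URL = "http://localhost:9200"


def disable_store_throttle_request(base_url: str = ES_URL) -> urllib.request.Request:
    """Build (but do not send) a cluster-settings update that turns off
    store/merge IO throttling via indices.store.throttle.type (ES 1.x)."""
    body = {
        # "transient" settings are lost on a full cluster restart;
        # use "persistent" instead to keep them.
        "transient": {"indices.store.throttle.type": "none"}
    }
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = disable_store_throttle_request()
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually apply the change: urllib.request.urlopen(req)
```

If you would rather keep some throttling, raising indices.store.throttle.max_bytes_per_sec above its 20mb default is the gentler alternative.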


Also see the recent post about tuning to favor indexing throughput:
http://www.elasticsearch.org/blog/performance-considerations-elasticsearch-indexing/
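
One setting commonly tuned for indexing throughput, and a fit for a cluster that can trade search freshness for indexing speed, is the per-index refresh_interval (default 1s). Treat this as an assumption here rather than a summary of the linked post. The sketch below only returns the method, path and JSON body for PUT /<index>/_settings. The index name comes from the logs quoted below, "30s" is an arbitrary example value, and "-1" disables periodic refresh entirely.

```python
import json


def refresh_interval_update(index: str, refresh_interval: str = "30s") -> tuple[str, str, str]:
    """Return (method, path, JSON body) for an index-settings update that
    changes how often newly indexed documents become visible to search.
    "30s" is an arbitrary example; "-1" disables periodic refresh."""
    body = {"index": {"refresh_interval": refresh_interval}}
    return "PUT", f"/{index}/_settings", json.dumps(body)


if __name__ == "__main__":
    method, path, body = refresh_interval_update("logstash-2014.09")
    print(method, path, body)
```

If searches later need fresh results, set the interval back (e.g. to "1s") once the bulk load is finished.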

Mike McCandless

http://blog.mikemccandless.com


On Thu, Sep 18, 2014 at 4:54 AM, <[email protected]> wrote:

> Setup:
> 4 nodes
> Replication       = 0
> ES_HEAP_SIZE      = 75GB
> Number of indices = 59 (using logstash, one index per month)
> Total shards      = 234 (each index has 4 shards, one per node)
> Total docs        = 7.4 billion
> Total size        = 4.7TB
>
> When I add a new file (which I do using logstash on all four nodes),
> indexing is immediately throttled. For instance:
>
> [2014-09-18 09:41:42,326][INFO ][index.engine.internal    ] [hdp13] [logstash-2014.09][2] stop throttling indexing: numMergesInFlight=4, maxNumMerges=5
> [2014-09-18 09:41:45,267][INFO ][index.engine.internal    ] [hdp13] [logstash-2014.09][2] now throttling indexing: numMergesInFlight=6, maxNumMerges=5
> [2014-09-18 09:41:45,303][INFO ][index.engine.internal    ] [hdp13] [logstash-2014.09][2] stop throttling indexing: numMergesInFlight=4, maxNumMerges=5
> [2014-09-18 09:41:51,273][INFO ][index.engine.internal    ] [hdp13] [logstash-2014.09][2] now throttling indexing: numMergesInFlight=6, maxNumMerges=5
> [2014-09-18 09:41:51,379][INFO ][index.engine.internal    ] [hdp13] [logstash-2014.09][2] stop throttling indexing: numMergesInFlight=4, maxNumMerges=5
> [2014-09-18 09:42:06,429][INFO ][index.engine.internal    ] [hdp13] [logstash-2014.09][2] now t
>
> Where should I be looking to tune indexing performance? The query
> load on the cluster is very low, as it is a research cluster, so I would
> sacrifice query performance for indexing.
>
> All 4 nodes run logstash, listening on various ports. I use netcat to
> 'feed' the data to the 4 nodes from a Hadoop cluster.
>
> hadoop1 netcat -------->
> hadoop2 netcat -------->   ES1
> hadoop3 netcat -------->
>
> And so on.
>
> Each ES node has 24 disks, but I am only using one at the moment. This is
> an obvious IO bottleneck, but I am unclear how to use all of the disks. If
> I add more disks, will ES share the data between them all? e.g. /mnt/disk1,
> /mnt/disk2, etc.
>
> Thanks
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/3e85d65c-8001-4f90-bfa0-f7e63679feba%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/3e85d65c-8001-4f90-bfa0-f7e63679feba%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>