Setup:
4 nodes
Replication       = 0
ES_HEAP_SIZE      = 75GB
Number of indices = 59 (using logstash, one index per month)
Total shards      = 234 (each index has 4 shards, one per node)
Total docs        = 7.4 billion
Total size        = 4.7TB

When I add a new file, which I do using logstash on all four nodes, the 
indexing immediately throttles. For instance:

[2014-09-18 09:41:42,326][INFO ][index.engine.internal    ] [hdp13] 
[logstash-2014.09][2] stop throttling indexing: numMergesInFlight=4, 
maxNumMerges=5
[2014-09-18 09:41:45,267][INFO ][index.engine.internal    ] [hdp13] 
[logstash-2014.09][2] now throttling indexing: numMergesInFlight=6, 
maxNumMerges=5
[2014-09-18 09:41:45,303][INFO ][index.engine.internal    ] [hdp13] 
[logstash-2014.09][2] stop throttling indexing: numMergesInFlight=4, 
maxNumMerges=5
[2014-09-18 09:41:51,273][INFO ][index.engine.internal    ] [hdp13] 
[logstash-2014.09][2] now throttling indexing: numMergesInFlight=6, 
maxNumMerges=5
[2014-09-18 09:41:51,379][INFO ][index.engine.internal    ] [hdp13] 
[logstash-2014.09][2] stop throttling indexing: numMergesInFlight=4, 
maxNumMerges=5
[2014-09-18 09:42:06,429][INFO ][index.engine.internal    ] [hdp13] 
[logstash-2014.09][2] now t
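
To get a feel for how often each shard flips in and out of throttling, the log lines above could be tallied with something like the sketch below. It assumes the messages keep exactly the format shown here (wrapped by the mail client or not); the function name count_throttle_events is only illustrative, not part of any tool.

```python
import re
from collections import Counter

# Matches "[index][shard] now|stop throttling indexing: numMergesInFlight=N, maxNumMerges=M"
# after each wrapped log entry has been joined back onto one line.
THROTTLE_RE = re.compile(
    r"\[\s*(?P<index>[^\]\s]+)\]\[(?P<shard>\d+)\]\s+"
    r"(?P<action>now|stop) throttling indexing:\s+"
    r"numMergesInFlight=(?P<in_flight>\d+),\s+"
    r"maxNumMerges=(?P<max_merges>\d+)"
)


def count_throttle_events(log_text):
    """Count 'now' and 'stop' throttle messages per (index, shard, action)."""
    # Undo line wrapping by collapsing every run of whitespace to one space.
    flat = re.sub(r"\s+", " ", log_text)
    counts = Counter()
    for match in THROTTLE_RE.finditer(flat):
        key = (match.group("index"), int(match.group("shard")), match.group("action"))
        counts[key] += 1
    return counts
```

Feeding it the contents of the node's log file returns a Counter keyed by (index, shard, "now" or "stop"), which makes it easy to see whether one shard is responsible for most of the throttling.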

Where should I be looking to tune the indexing performance? The query 
load on the cluster is very low, as it is a research cluster, so I would 
happily sacrifice query performance for indexing throughput.

The 4 nodes all run logstash, listening on various ports. I use netcat to 
'feed' the data to the 4 nodes from a hadoop cluster.

hadoop1 netcat -------->
hadoop2 netcat -------->   ES1     
hadoop3 netcat -------->

And so on.
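
In case the feed step is unclear, here is a rough Python equivalent of what each netcat process does. It is only a sketch: it assumes logstash is listening with a plain TCP input that reads newline-delimited events, and the host, port and function name (feed_lines) are placeholders rather than my real setup.

```python
import socket


def feed_lines(host, port, lines, timeout=10.0):
    """Send each line, newline-terminated, over one TCP connection.

    Returns the total number of bytes sent.
    """
    sent = 0
    with socket.create_connection((host, port), timeout=timeout) as conn:
        for line in lines:
            # Normalise to exactly one trailing newline per event.
            payload = line.rstrip("\n").encode("utf-8") + b"\n"
            conn.sendall(payload)
            sent += len(payload)
    return sent
```

Something like feed_lines("ES1", 5000, open("part-00000")) would then stand in for one netcat invocation, where the port number and file name are again made up for illustration.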

Each ES node has 24 disks but I am only using one at the moment. This is an 
obvious IO bottleneck, but I am unclear on how to use all the disks. If I add 
more disks, will ES share the data between them all? e.g. /mnt/disk1, 
/mnt/disk2, etc.
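
From what I can see in the Elasticsearch documentation, the path.data setting in elasticsearch.yml accepts a comma-separated list of directories. I have not verified how data is spread across them on my version, so treat the following as a sketch of the line I think I would need. The helper name path_data_setting and the "elasticsearch" subdirectory are my own assumptions.

```python
def path_data_setting(mount_points, subdir="elasticsearch"):
    """Build a 'path.data' line for elasticsearch.yml from a list of mount points.

    The subdirectory name is a hypothetical choice, not an Elasticsearch default.
    """
    paths = [f"{mount.rstrip('/')}/{subdir}" for mount in mount_points]
    return "path.data: " + ",".join(paths)


if __name__ == "__main__":
    # Mount points follow the /mnt/diskN naming used above.
    print(path_data_setting([f"/mnt/disk{i}" for i in range(1, 25)]))
```

Each node would need the same kind of line with its own 24 mount points, followed by a restart.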

Thanks
