I did some intensive tests last week on a 20-node cluster and had the 
following insights - I'd be interested if anyone has similar/dissimilar 
experience.
The 20 nodes had 8 cores and 32GB of memory each.  I configured 
Elasticsearch to use 15GB of that memory.
The sample events I was using were Apache logs (common format) without any 
additional fields (no geoip, useragent etc. plugins).
When running as a 20-node cluster, I got a maximum ingestion rate of 2500k 
events/minute (41k/second), *but* the bottleneck was the Logstash CPU 
load... so I reduced to a 10-node cluster...
With the 10 nodes I initially had 1600k/minute (27k/second) and achieved 
1800k/minute (30k/second) by increasing index.refresh_interval to 30s and 
indices.memory.index_buffer_size to 20%.
Further reducing to 5 nodes, I had 1100k/minute (18k/second).
This brings me to an interesting comparison: at 10 nodes, I have 3k 
events/second/node, and with 5 nodes I have 3.66k events/second/node, i.e. 
the overhead for doubling the number of nodes from 5 to 10 is about 20%.
Is this to be expected?  Just how scalable is Elasticsearch - at what point 
is the diminishing return on adding nodes not cost effective?
Is the further logical reduction to 375 events/core/second (3k 
events/second/node spread over 8 cores) still meaningful?
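
For anyone who wants to check the arithmetic, here is a minimal Python sketch that reproduces the per-node and per-core figures from the rates quoted above. The variable and function names are purely illustrative, and it only restates the numbers from my tests; it is not a benchmark. Note that the "about 20%" overhead comes out at roughly 18% when measured against the 5-node per-node rate and roughly 22% when measured against the 10-node rate.

```python
# Measured ingestion rates from the tests above, in events per minute.
# The 10-node figure is the tuned result (1800k/minute).
rates_per_minute = {10: 1_800_000, 5: 1_100_000}
CORES_PER_NODE = 8


def per_node_per_second(nodes, events_per_minute):
    """Convert a cluster-wide events/minute rate to events/second/node."""
    return events_per_minute / 60 / nodes


per_node = {n: per_node_per_second(n, r) for n, r in rates_per_minute.items()}

# Per-core rate on the 10-node cluster.
per_core_10 = per_node[10] / CORES_PER_NODE

# Per-node throughput lost going from 5 to 10 nodes (relative to 5 nodes),
# and the per-node advantage of 5 nodes (relative to 10 nodes).
drop_vs_5 = 1 - per_node[10] / per_node[5]
gain_vs_10 = per_node[5] / per_node[10] - 1

for n in sorted(per_node):
    print(f"{n:>2} nodes: {per_node[n]:.0f} events/second/node")
print(f"10 nodes: {per_core_10:.0f} events/second/core")
print(f"per-node drop, 5 -> 10 nodes: {drop_vs_5:.1%} (or {gain_vs_10:.1%} relative to 10 nodes)")
```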

Cheers,
-Robin-
