You should consider whether it is possible to install the latest ES (currently 1.0.0.RC1) and the latest JVM.
If you use 4 nodes, you should consider 4 shards per index by default, so resources are balanced across every index.

If you meant to tune the bulk indexing thread pool, you did not actually set anything special for it: those settings live in threadpool.bulk, not threadpool.index (I don't know whether your Logstash uses the bulk or the index API).

indices.memory.index_buffer_size is adjusted automatically; there is no need to cap it at 50%. The same goes for index.translog.flush_threshold_ops; I wonder why you adjusted that value.

By moving the search pool size away from the number of CPU cores, you reduce the automatic scaling of search in your cluster, which is bad. Using 20 instead of 18 (3 * 6 cores is the default) does not make much difference per se, but reducing the queue size from 1000 to 100 will make your search load bail out early and often.

Your heap size is very large (30g), so be prepared to take additional measures to tackle GC challenges. You should also think about dedicated master nodes if you want to drive large heaps with expected high GC on the data nodes.

The indexing load is distributed automatically, so there is nothing to configure for that in Logstash. But you should consider setting up Logstash so that it can index to more than one node, just for more resiliency. See the sketch below for the dedicated-master layout.

Jörg

On Tue, Jan 28, 2014 at 9:51 AM, Luca Belluccini <[email protected]> wrote:

> Hello,
> I am putting in place an ES cluster with 4 nodes (6 cores + 48GB RAM each).
> The aim is to use Kibana as a data analysis tool.
> I set up Logstash to feed ES and use the following:
>
> - https://gist.github.com/lucabelluccini/7563998 for index templates
> - Some tweaks to elasticsearch.yml:
>   - indices.memory.index_buffer_size: 50%
>   - index.translog.flush_threshold_ops: 50000
>   - index.number_of_shards: 3
>   - threadpool.search.type: fixed
>   - threadpool.search.size: 20
>   - threadpool.search.queue_size: 100
>   - threadpool.index.type: fixed
>   - threadpool.index.size: 60
>   - threadpool.index.queue_size: 200
>   - node.master: true
>   - node.data: true
>   - ES_HEAP_SIZE=30g
>
> Logstash is sending to one of the hosts, and I wanted to ask whether indexing is automatically distributed over all the nodes, or whether something has to be set up to exploit the processing power of all 4 nodes.
>
> Thanks in advance,
> Luca B.
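P.S. To make the dedicated-master idea concrete, here is a rough elasticsearch.yml sketch using the ES 1.x setting names mentioned above. Treat it as a starting point under the assumptions of this thread (4 data nodes, default thread pools), not a tested configuration; the 4-shard default and the small master heap are illustrative values, not recommendations from the list.

    # elasticsearch.yml on the dedicated master nodes
    # (small heap, e.g. ES_HEAP_SIZE=2g, so GC stays short; no data, no query load)
    node.master: true
    node.data: false

    # elasticsearch.yml on the 4 data nodes
    # (large heap, they carry the indexing and search work)
    node.master: false
    node.data: true

    # one primary shard per data node for new indices, so every node gets a balanced share
    index.number_of_shards: 4

    # leave threadpool.* and indices.memory.index_buffer_size unset so the
    # built-in defaults (sized from the CPU count) apply

For resiliency on the Logstash side, the same idea applies: point its elasticsearch output at more than one of the data nodes (the exact option name depends on the Logstash version you run), so losing the single node it currently targets does not stop indexing.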
