The feedback on this post has been excellent - I had forgotten about the
possibility of RAM disks as a solution.
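For anyone who wants to experiment with the RAM disk idea, a minimal sketch (we did not go this route; the mount point and size below are hypothetical):

```shell
# Create a tmpfs-backed mount for a "hot" ES data path (requires root).
# Contents vanish on reboot, so replicas must live on disk-backed nodes.
mkdir -p /elasticdata-ram
mount -t tmpfs -o size=8g tmpfs /elasticdata-ram
# Then point path.data at /elasticdata-ram in elasticsearch.yml.
```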
We don't have the budget to dedicate large nodes to Elasticsearch/Graylog,
so we decided to use a combination of older workstations as "slow" nodes
and SSD-backed VMs as the "fast" nodes. The workstation nodes worked out
fairly well, with enough room for three large SATA disks and 16 GB of RAM.
The original plan was to configure the workstation nodes, see if they had
enough gas to support our incoming logs, and then make SSD decisions once
we had better data. After spending a few days troubleshooting full output
buffers on the Graylog side, we decided to go ahead and implement two SSD
nodes to help with incoming messages.
Anyway, hardware in use:
Two VM Graylog servers (2.1.2)
-- Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 16 cores
-- 40 GB RAM, 20 GB heap
Ten Elasticsearch nodes (2.4.2)
- Six workstation nodes
-- Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz
-- 16 GB RAM (12 GB heap)
-- 18 TB raw space (three 6 TB drives)
- Two VM nodes with SSD-backed storage
-- Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz
-- 16 GB RAM, 12 GB heap
-- 500 GB of usable SSD
- One dedicated master node
-- Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz
-- 16 GB RAM
We understand the heap should be about half our RAM on the ES nodes, but
at 8 GB ES would crash fairly often. 12 GB was the sweet spot for us.
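For reference, on ES 2.x the heap is typically pinned with the ES_HEAP_SIZE environment variable. A sketch of the relevant defaults file (path assumes Debian-style packaging; the values mirror ours):

```shell
## /etc/default/elasticsearch (RPM systems use /etc/sysconfig/elasticsearch)
# 12 GB heap, as discussed above
ES_HEAP_SIZE=12g
# Needed for bootstrap.mlockall to actually lock the heap in RAM
MAX_LOCKED_MEMORY=unlimited
```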
We handle 5000-6000 msgs per second on average today, and expect this to
be about 15,000 per second when the project is completed.
Graylog settings (not the whole set, just what we routinely look at first
when troubleshooting throughput):
output_batch_size = 800
output_flush_interval = 1
processbuffer_processors = 12
outputbuffer_processors = 16
processor_wait_strategy = blocking
ring_size = 1048576
inputbuffer_ring_size = 65536
inputbuffer_processors = 4
inputbuffer_wait_strategy = blocking
Elasticsearch settings:
## SSD Nodes
cluster.name: mycluster
node.name: ssdnode1
node.master: false
node.data: true
node.box_type: ssd
path.data: /elasticdata1,/elasticdata2
path.conf: /etc/elasticsearch
bootstrap.mlockall: true
network.host: 192.168.2.2
transport.tcp.port: 9300
http.port: 9200
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: 192.168.2.10:9300
discovery.zen.ping.timeout: 60s
discovery.zen.ping.retries: 6
discovery.zen.ping.interval: 5s
threadpool.bulk.type: fixed
threadpool.bulk.size: 2
threadpool.bulk.queue_size: 500
indices.store.throttle.max_bytes_per_sec: "100mb"  ## Allow SSD nodes to write faster
## Workstation Nodes
cluster.name: mycluster
node.name: slownode1
node.master: false
node.data: true
node.box_type: slow
path.data: /elasticdata1,/elasticdata2,/elasticdata3
path.conf: /etc/elasticsearch
bootstrap.mlockall: true
network.host: 192.168.2.5
transport.tcp.port: 9300
http.port: 9200
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: 192.168.2.10:9300
discovery.zen.ping.timeout: 60s
discovery.zen.ping.retries: 6
discovery.zen.ping.interval: 5s
index.refresh_interval: 60s  ## This made our workstation nodes work well.
indices.fielddata.cache.size: 5%
threadpool.bulk.type: fixed
threadpool.bulk.size: 2
threadpool.bulk.queue_size: 500
To achieve hot-warm, we tagged the fast nodes as "ssd" and added an index
template to Elasticsearch so all new indices would be created on those
nodes. Then we installed Curator on our master node and added a crontab
entry that runs a bash script each night.
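For what it's worth, the crontab entry looks something like this (the script path and run time here are illustrative, not our exact values):

```shell
# m h dom mon dow  command
0 2 * * * /usr/local/bin/curator-allocate.sh >> /var/log/curator-cron.log 2>&1
```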
To create the template:
curl -XPUT elasticsearch_node_in_your_cluster:9200/_template/graylog_1 -d '{
  "template": "graylog2*",
  "settings": {
    "index.routing.allocation.require.box_type": "ssd"
  }
}'
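If you want to double-check that the template was stored, something like this should echo it back (same placeholder hostname as above):

```shell
curl -XGET elasticsearch_node_in_your_cluster:9200/_template/graylog_1?pretty
```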
Contents of the bash script that runs curator:
#!/bin/bash
curator_cli --logfile /var/log/curator.log --loglevel INFO \
  --logformat default --host 192.168.2.10 --port 9200 \
  allocation --key box_type --value slow \
  --filter_list '{"filtertype":"age","source":"creation_date","direction":"older","unit":"days","unit_count":2}'
Roughly: find all indices more than two days old and update their
allocation settings so their shards must now live on "slow" storage.
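To spot-check that the move actually happened, you can ask any node where the shards of an older index live (the index name graylog2_0 here is just an example):

```shell
# Show which nodes hold each shard of the index
curl -s 192.168.2.10:9200/_cat/shards/graylog2_0?v
# Confirm the allocation filter was applied to the index settings
curl -s '192.168.2.10:9200/graylog2_0/_settings?pretty' | grep box_type
```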
I would love to see hot-warm handled by Graylog itself - managing it by
hand is a little tedious.
Dustin Tennill
Eastern Kentucky University
On Saturday, December 3, 2016 at 10:13:51 AM UTC-5, Dustin Tennill wrote:
>
> All,
>
> We just finished implementing
> https://www.elastic.co/blog/hot-warm-architecture for our
> Graylog environment. After weeks of troubleshooting Elasticsearch
> performance issues with our budget ES nodes, the addition of two small
> SSD nodes REALLY made a difference. Our output buffers had been filling up
> from time to time, and this appears to have resolved that issue.
>
> If anyone is interested, we will post our config information.
>
> Dustin Tennill
> EKU
>
>