The feedback on this post has been excellent - I had forgotten about the 
possibility of RAM disks as a solution. 

We don't have the budget to dedicate large nodes to ES/Graylog, so we 
decided to use a combination of older workstations as "slow" nodes and 
SSD-backed VMs as the "fast" nodes. The workstation nodes worked out fairly 
well, with enough room for three large SATA disks and 16 GB of RAM. The 
original plan was to configure the workstation nodes, see if they had enough 
gas to support our incoming logs, and then make SSD decisions once we had 
better data. After spending a few days troubleshooting full output buffers 
on the Graylog side, we went ahead and implemented two SSD nodes to help 
with incoming messages. 

Anyway, hardware in use: 
Two VM Graylog servers (2.1.2)
-- Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 16 cores 
-- 40 GB RAM, 20 GB heap

Ten Elasticsearch nodes (2.4.2)
- Six workstation nodes
-- Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz
-- 16 GB RAM (12 GB heap)
-- 18 TB raw space (three 6 TB drives)
- Two VM nodes with SSD-backed storage
-- Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz
-- 16 GB RAM, 12 GB heap
-- 500 GB of usable SSD
- One dedicated master node
-- Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz
-- 16 GB RAM

We understand the heap should be about half the RAM on an ES node, but at 
8 GB ES would crash fairly often; 12 GB was the sweet spot for us. 
We handle 5,000-6,000 messages per second on average today, and expect 
about 15,000 per second when the project is completed. 
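For reference, on ES 2.x the heap is normally set through the ES_HEAP_SIZE 
environment variable in the service's defaults file; a sketch matching the 
12G we settled on (the exact file path depends on your distro and packaging):

```shell
# /etc/sysconfig/elasticsearch (RHEL) or /etc/default/elasticsearch (Debian)
# 12g matches the heap we settled on; MAX_LOCKED_MEMORY pairs with
# bootstrap.mlockall: true so the heap cannot be swapped out.
ES_HEAP_SIZE=12g
MAX_LOCKED_MEMORY=unlimited
```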

Graylog settings (not the whole set, just what we routinely look at first 
when troubleshooting throughput):
    output_batch_size = 800
    output_flush_interval = 1
    processbuffer_processors = 12 
    outputbuffer_processors = 16
    processor_wait_strategy = blocking
    ring_size = 1048576 
    inputbuffer_ring_size = 65536
    inputbuffer_processors = 4
    inputbuffer_wait_strategy = blocking
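When we suspect throughput trouble, buffer utilization is the first thing 
we look at, and Graylog's REST API exposes it. A sketch - the host, port, 
and credentials below are placeholders (2.1's REST API listens on :12900 
by default unless you've moved it behind the web listener):

```shell
#!/bin/sh
# Placeholder endpoint/credentials -- substitute your own Graylog node.
GRAYLOG_API="http://graylog.example.com:12900"
# /system/buffers reports utilization of that node's input/process/output
# buffers; a persistently full output buffer points at ES indexing speed.
curl -s -u admin:password "${GRAYLOG_API}/system/buffers" || echo "example host unreachable"
```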

Elasticsearch settings: 

## SSD Nodes
    cluster.name: mycluster
    node.name: ssdnode1
    node.master: false
    node.data: true
    node.box_type: ssd 
    path.data: /elasticdata1,/elasticdata2
    path.conf: /etc/elasticsearch
    bootstrap.mlockall: true
    network.host: 192.168.2.2
    transport.tcp.port: 9300
    http.port: 9200

    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: 192.168.2.10:9300
    discovery.zen.ping.timeout: 60s
    discovery.zen.ping.retries: 6
    discovery.zen.ping.interval: 5s

    threadpool.bulk.type: fixed
    threadpool.bulk.size: 2
    threadpool.bulk.queue_size: 500  
    indices.store.throttle.max_bytes_per_sec: "100mb" ## Allow SSD nodes to write faster

## Workstation Nodes
    cluster.name: mycluster
    node.name: slownode1
    node.master: false 
    node.data: true 
    node.box_type: slow 
    path.data: /elasticdata1,/elasticdata2,/elasticdata3
    path.conf: /etc/elasticsearch
    bootstrap.mlockall: true
    network.host: 192.168.2.5
    transport.tcp.port: 9300
    http.port: 9200
    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: 192.168.2.10:9300
    discovery.zen.ping.timeout: 60s
    discovery.zen.ping.retries: 6
    discovery.zen.ping.interval: 5s
    index.refresh_interval: 60s ## This made our workstation nodes work well.
    indices.fielddata.cache.size: 5%
    threadpool.bulk.type: fixed
    threadpool.bulk.size: 2
    threadpool.bulk.queue_size: 500
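To confirm the box_type tags actually registered with the cluster, the cat 
API in ES 2.x can list node attributes (we point at the master here, but 
any node in the cluster works):

```shell
#!/bin/sh
ES_HOST="http://192.168.2.10:9200"
# Every data node should show up with box_type=ssd or box_type=slow.
curl -s "${ES_HOST}/_cat/nodeattrs?v" || echo "example host unreachable"
```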

To achieve hot-warm, we tagged the fast nodes as "ssd" and added an index 
template to Elasticsearch so all new indices would be created on those 
nodes. Then we installed Curator on our master node and added a crontab 
entry that runs a bash script each night. 
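The crontab entry itself is nothing fancy - something along these lines, 
where the script path/name and the 01:30 run time are placeholders of our 
own choosing:

```shell
# Added via `crontab -e` on the master node; runs the curator script nightly.
# Script path/name is hypothetical -- use wherever you put yours.
echo '30 1 * * * /usr/local/bin/move-old-indices.sh'
```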

To create the template: 

curl -XPUT elasticsearch_node_in_your_cluster:9200/_template/graylog_1 -d '{
  "template": "graylog2*",
  "settings": {
    "index.routing.allocation.require.box_type": "ssd"
  }
}'
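It's worth a quick GET to confirm the template took (same placeholder 
hostname as the PUT above):

```shell
#!/bin/sh
TEMPLATE_URL="http://elasticsearch_node_in_your_cluster:9200/_template/graylog_1?pretty"
# The response should echo back the graylog2* pattern and the ssd allocation rule.
curl -s "${TEMPLATE_URL}" || echo "example host unreachable"
```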

Contents of the bash script that runs curator: 
#!/bin/bash
curator_cli --logfile /var/log/curator.log --loglevel INFO --logformat default \
  --host 192.168.2.10 --port 9200 \
  allocation --key box_type --value slow \
  --filter_list '{"filtertype":"age","source":"creation_date","direction":"older","unit":"days","unit_count":2}'

Roughly: find all indices more than two days old and change their 
allocation rule so their shards must now live on "slow" storage. 
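Before letting cron loose, it's worth testing the move by hand. curator_cli 
takes a --dry-run flag that logs what it would reallocate without touching 
anything, and _cat/shards shows where each shard actually ended up (hosts 
as in our configs above):

```shell
#!/bin/sh
ES_HOST="192.168.2.10"
# Dry run: the same allocation action as the nightly script, but only logged.
curator_cli --dry-run --host "${ES_HOST}" --port 9200 \
  allocation --key box_type --value slow \
  --filter_list '{"filtertype":"age","source":"creation_date","direction":"older","unit":"days","unit_count":2}' \
  || echo "curator or example host unavailable"
# Then check which node each graylog2* shard landed on.
curl -s "${ES_HOST}:9200/_cat/shards/graylog2*?v" || echo "example host unreachable"
```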

I would love to see hot-warm handled by Graylog itself - managing it 
externally is a little tedious. 

Dustin Tennill
Eastern Kentucky University


On Saturday, December 3, 2016 at 10:13:51 AM UTC-5, Dustin Tennill wrote:
>
> All,
>
> We just finished implementing 
> https://www.elastic.co/blog/hot-warm-architecture for our 
> Graylog environment. After weeks of troubleshooting elasticsearch 
> performance issues with our budget ES nodes, the addition of two small 
> SSD nodes REALLY made a difference. Our output buffers had been filling up 
> from time to time, and this appears to have resolved that issue. 
>
> If anyone is interested, we will post our config information. 
>
> Dustin Tennill
> EKU
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Graylog Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/graylog2/a42b150b-766d-4ab1-b89a-f33308fc2ff7%40googlegroups.com.