Just wanted to post my experiences in case someone else runs into the same 
issues I had this morning. I apologize for the clumsy post; I'm a noob with 
Graylog.

I was running out of space on the graylog2 VM I've been working with.

"In order to extend the disk space mount a second drive on this path. Make 
sure to move old data to the new drive before and give the graylog user 
permissions to read and write here."
https://github.com/Graylog2/graylog2-images/tree/master/ova

Reading this, I added an additional 100GB drive in VMware, partitioned it, 
and mounted it. I ran rsync -av from /var/opt/graylog/data to /mnt/graydata 
(my 100GB drive). I then realized I should have stopped the Graylog services 
first, so I stopped them and rsynced again.
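For anyone following along, the copy step looked roughly like this. The 
device name is an assumption (check lsblk or dmesg for yours), and 
graylog-ctl is the control script shipped with the OVA:

```shell
# Assumption: the new VMware disk appeared as /dev/sdb and I created a
# single partition, /dev/sdb1, with fdisk. Your device names may differ.
sudo mkfs.ext4 /dev/sdb1
sudo mkdir -p /mnt/graydata
sudo mount /dev/sdb1 /mnt/graydata

# Stop Graylog BEFORE copying so Elasticsearch isn't writing mid-rsync
sudo graylog-ctl stop

# -a preserves ownership and permissions, which the graylog user
# needs once the data lives on the new drive
sudo rsync -av /var/opt/graylog/data/ /mnt/graydata/
```

Note the trailing slashes on the rsync paths: they copy the *contents* of 
data into /mnt/graydata rather than nesting a data/ directory inside it.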

I renamed the original data directory to data_backup, then mounted the 
100GB drive on /var/opt/graylog/data/. When I restarted Graylog, it 
couldn't connect to the node. I messed around reconfiguring and restarting 
a number of times, with no luck getting it working.
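In hindsight, the swap itself was roughly the following. The 
graylog:graylog ownership is a guess on my part -- mirror whatever 
`ls -l` shows on your data_backup directory:

```shell
sudo graylog-ctl stop

# Keep the original data around, then remount the new drive in its place
sudo mv /var/opt/graylog/data /var/opt/graylog/data_backup
sudo mkdir /var/opt/graylog/data
sudo umount /mnt/graydata
sudo mount /dev/sdb1 /var/opt/graylog/data   # the partition on the new 100GB disk

# Per the README, the graylog user needs read/write on the new location.
# Ownership here is an assumption -- copy what data_backup shows.
sudo chown -R graylog:graylog /var/opt/graylog/data

sudo graylog-ctl start
```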

Another method I read about was adding another drive to the LVM.
https://groups.google.com/d/msg/graylog2/bpcVTlIN8UA/9zUaE9Gpx2UJ
"Since official OVA images are configured to use LVM you can just add 
create a new disk image in virtualbox (if you use it) add it to VM, boot 
it, and add new hd to LVM volume, then increase root partition size"
http://askubuntu.com/questions/458476/adding-disks-with-lvm
I removed the data from the 100GB drive and renamed data_backup back to 
data. After some hacking around, I was able to add the drive to the LVM 
and expand the space to about 115GB.
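For reference, the LVM route boiled down to something like this. The 
volume group and logical volume names below are assumptions -- run 
`sudo vgs` and `sudo lvs` on your image first and substitute what they 
report:

```shell
# Make the whole new disk an LVM physical volume (device name assumed)
sudo pvcreate /dev/sdc

# Add it to the existing volume group (name assumed -- see `sudo vgs`)
sudo vgextend graylog-vg /dev/sdc

# Grow the root logical volume into all of the newly added free space
sudo lvextend -l +100%FREE /dev/graylog-vg/root

# Grow the ext filesystem to match; this works online for ext3/ext4
sudo resize2fs /dev/graylog-vg/root
```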

Things still weren't happy. I found some Elasticsearch queries to check 
cluster health:

curl -XGET 'http://<serverip>:9200/_cluster/health?pretty=true'
curl -XGET 'http://<serverip>:9200/_cluster/health?level=indices&pretty=true' | grep status

I noted that the Graylog cluster status was red. One index was red and 
the rest were yellow. I deleted the red index.
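Deleting the red index was a one-liner against Elasticsearch. Fair 
warning: this throws away every message stored in that index, and the 
index name below is just a placeholder:

```shell
# <red_index_name> is whichever index showed "red" in the
# level=indices health output above
curl -XDELETE 'http://<serverip>:9200/<red_index_name>'
```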

At that point the node was accessible again and I was able to get into the 
graylog web interface. All seemed well until I noted that no messages were 
appearing in the last 5 mins.

It seems there were indexer failures. Every message that came in was 
causing an indexer error; I had 100,000+ of them. For example:

graylog_7    false    5c19f5db-f33c-11e4-9c7a-000c29d9b316
ClusterBlockException[blocked by: [FORBIDDEN/8/index write (api)];]
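From what I can tell, that FORBIDDEN/8/index write (api) error means a 
per-index write block (the index.blocks.write setting) was set through 
the API. I didn't try this myself -- rotating to a new index worked for 
me -- but I believe you could clear the block by hand with an index 
settings update (index name taken from my error message):

```shell
# Clear the write block that was set via the API on graylog_7
curl -XPUT 'http://<serverip>:9200/graylog_7/_settings' \
     -d '{"index": {"blocks": {"write": false}}}'
```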

I figured Graylog wasn't able to write to that index. I was able to get 
things working again by changing the rotation/retention to time-based, 
one index every 24 hours. Once I reconfigured, Graylog started writing 
to a new index (graylog_9) and logging resumed without indexer failures. 
I then changed it back to a size-based retention setting.

Things seem to be working well. I hope this helps someone. And, if someone 
can make a suggestion about what I should have done differently, that would 
be helpful.

-- 
You received this message because you are subscribed to the Google Groups 
"graylog2" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.
