So, I haven't figured out the right search terms to find the answer via Google yet, I've read a lot of the docs on the subject of Snapshot and Restore without finding an answer, and I haven't had the time or resources to test some of my own ideas. Hence, I'm posting this in the hopes that someone who has already solved this problem will share.
How do you run ES with limited data storage space? Basically, short of getting more space, what can I do to make the best use of what I have and still meet as many of my goals as possible?

My setup is 4 data nodes. Due to lack of resources/money, they are all thin-provisioned VMs, and all my data has to live on NFS/SAN mounts; storing data on the VMs' local disks would negatively affect other VMs and services. Our NFS SAN is also low on space, so I only have about 1.5TB to use. Initially that seemed like plenty, but a couple of weeks ago ES started complaining about running out of space. Usage on that mount was over 80%: my snapshot repository had ballooned to over 700GB, and each node's data mount point was around 150GB.

Currently I'm only using ES for logs. For day-to-day use I should be fine with 1 month of open indices, so I've already been keeping older indices closed; I can't really do much more on that front. I also run the optimize command nightly on any Logstash index older than a couple of days. I'd just delete the really old data, but I have use cases for data up to 1.5 years old. Considering that snapshots covering only a few months nearly used up all my space, and how much space a month of logs currently takes, I'm not sure how I can store that much data.

So, in general, how would you solve my problem? I need to:

- have immediate access to 1 month's worth of logs (via Kibana),
- be able to relatively quickly access up to 6 months of logs (open closed indices?), and
- access up to 1.5 years' worth temporarily (restore snapshots to a new cluster on my desktop?).

Would there be a way to move snapshots off of the NFS SAN to an external hard drive? Should I tell Logstash to send logs to a text file that gets logrotated for a year and a half? Or does ES do a good enough job with compression that gzipping wouldn't help? If it were just a text file, I could unzip it and then tell Logstash to read the file back into an ES cluster. ES already compresses stored indices by default, right? So there's nothing I can do there?
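For reference, here's roughly what I'm already doing. The nightly optimize is basically this (ES 1.x _optimize API; the index name is just an example, and max_num_segments=1 is simply what I settled on):

  # force-merge an older logstash index down to a single segment
  curl -XPOST 'http://localhost:9200/logstash-2014.11.01/_optimize?max_num_segments=1'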
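And the keeping-old-indices-closed part is just the standard open/close API (index name again illustrative). As far as I can tell this mostly saves heap, not disk, which is why I don't think I can squeeze much more out of it:

  # close an index older than a month; reopen it later if someone needs it
  curl -XPOST 'http://localhost:9200/logstash-2014.10.15/_close'
  curl -XPOST 'http://localhost:9200/logstash-2014.10.15/_open'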
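On the external-drive idea, what I was picturing is a second fs snapshot repository whose location points at a mount for the external disk, something like the sketch below. The repository name, mount path, snapshot name, and index pattern are all made up, and I haven't tested whether snapshotting to an external drive like this actually performs acceptably:

  # register a filesystem repository on the external drive mount
  curl -XPUT 'http://localhost:9200/_snapshot/external_archive' -d '{
    "type": "fs",
    "settings": {
      "location": "/mnt/external/es_snapshots",
      "compress": true
    }
  }'

  # snapshot the older indices into it, then delete them from the main repo/cluster
  curl -XPUT 'http://localhost:9200/_snapshot/external_archive/logs-2014-06?wait_for_completion=true' -d '{
    "indices": "logstash-2014.06.*"
  }'

  # later, restore into a scratch cluster on my desktop when the old data is needed
  curl -XPOST 'http://localhost:9200/_snapshot/external_archive/logs-2014-06/_restore'

Is that a sane approach, or is there a better-established way to age snapshots out to cheaper storage?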
