While ES does compress by default, it also stores data in data structures 
that increase its size. The net effect is that your data will be much 
larger than the equivalent gzipped log file. However, running Logstash to 
re-ingest 1.5 years of logs may well take much longer than you would expect.

There is no reason you shouldn't be able to move snapshots off your shared 
drive and onto an external drive or other storage, such as S3.
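For example, here is a rough sketch with the Python client, assuming the 
AWS Cloud plugin is installed on every node; the repository name, bucket, 
and index pattern below are all placeholders:

    # Sketch: register an S3 snapshot repository, then snapshot a closed
    # monthly batch of indices into it so the local copies can be deleted.
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    es.snapshot.create_repository(repository="s3_logs", body={
        "type": "s3",
        "settings": {
            "bucket": "my-log-snapshots",  # placeholder bucket name
            "region": "us-east-1",
        },
    })

    es.snapshot.create(
        repository="s3_logs",
        snapshot="logstash-2014.09",
        body={"indices": "logstash-2014.09.*"},
        wait_for_completion=True,
    )

Once the snapshot completes you can delete the local indices and restore 
that month from S3 only when someone actually needs it.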

One thing you should reconsider is what you are trying to do with your 
resources. It sounds like it is simply too much. If the budget cannot 
budge to accommodate the requirements, then the requirements must budge to 
accommodate the budget.

Perhaps you can identify some log sources that do not have the same 
retention requirements, or some segment of your logs that is not as 
important. For instance, is it really important to keep that Java stack 
trace from a year ago? I don't know the nature of your logs, but I do know 
the nature of logs: there are important entries, and there are mundane, 
repetitive ones. What I am driving at is that, by leveraging ES aliasing 
and cross-index searching, you can segment your logs into important and 
less important indices. You can still search across all of them, but 
establish shorter retention policies for the less important ones, while 
preserving the precious resources you have for the important ones.
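As a sketch of that aliasing idea (the index and alias names here are 
hypothetical, and routing entries into the two tiers would be done in your 
Logstash output config):

    # Sketch: alias an "important" and a "mundane" index together so
    # searches and Kibana still see one logical index per day.
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    # One alias spanning both retention tiers for a given day.
    es.indices.update_aliases(body={
        "actions": [
            {"add": {"index": "logs-important-2015.03.16",
                     "alias": "logs-2015.03.16"}},
            {"add": {"index": "logs-mundane-2015.03.16",
                     "alias": "logs-2015.03.16"}},
        ]
    })

    # Searches against the alias hit both tiers transparently.
    es.search(index="logs-2015.03.16",
              body={"query": {"match": {"message": "error"}}})

    # A nightly job can then expire only the mundane tier early, e.g.:
    es.indices.delete(index="logs-mundane-2015.02.*")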

For some data you can take an RRD-style approach and create indices that 
hold summary information, which will allow you to generate historical 
dashboards that still capture the essence of the day, if not the detail. 
For instance, while you could not show the individual requests on a given 
day, you could still show the request volume over a three-year period.
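A minimal sketch of that rollup, again with hypothetical index and field 
names:

    # Sketch: roll one day's detail index up into a single small summary
    # document, RRD-style, so the detail can eventually be deleted.
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])
    day = "2015.03.16"

    # Aggregate the raw index down to hourly request counts.
    result = es.search(
        index="logstash-" + day,
        search_type="count",  # aggregations only, no hits (ES 1.x)
        body={"aggs": {"per_hour": {"date_histogram": {
            "field": "@timestamp", "interval": "hour"}}}},
    )
    buckets = result["aggregations"]["per_hour"]["buckets"]

    # One small document per day is cheap to keep for years.
    es.index(index="logs-summary", doc_type="daily", id=day, body={
        "date": day,
        "total_requests": sum(b["doc_count"] for b in buckets),
        "hourly": [{"hour": b["key_as_string"], "count": b["doc_count"]}
                   for b in buckets],
    })

Once the summary document is written, the detail index can be closed, 
snapshotted, or deleted without losing the historical dashboard.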

While some of this goes against the nature of the logging effort, these are 
some of the ideas I had while reading about your situation.

Aaron

On Monday, March 16, 2015 at 6:42:43 PM UTC-6, Mark Walkom wrote:
>
> There's not a lot you can do here unless you want to start uploading 
> snapshots to S3, or something else that is not on your NAS.
> ES does compress by default and we are working on using a better algorithm 
> for future releases which will help, but there's no ETA for that.
>
> On 16 March 2015 at 17:29, David Reagan <[email protected]> wrote:
>
>> So, I haven't figured out the right search terms to find the answer via 
>> Google yet, I've read a lot of the docs on the subject of Snapshot and 
>> Restore without finding an answer, and I haven't had the time or resources 
>> to test some of my own ideas. Hence, I'm posting this in the hopes that 
>> someone who has already solved this problem will share. 
>>
>> How do you run ES with limited data storage space?
>>
>> Basically, short of getting more space, what can I do to make the best 
>> use of what I have, and still meet as many of my goals as possible?
>>
>> My setup is 4 data nodes. Due to lack of resources/money, they are all 
>> thin provisioned VMs, and all my data has to be on NFS/SAN mounts. Storing 
>> data on the actual VM's hard disk would negatively affect other VMs and 
>> services.
>>
>> Our NFS SAN is also low on space. So I only have about 1.5TB to use. 
>> Initially this seemed like plenty, but a couple weeks ago, ES started 
>> complaining about running out of space. Usage on that mount was over 80%. 
>> My snapshot repository had ballooned to over 700GB, and each node's data 
>> mount point was around 150GB. 
>>
>> Currently, I'm only using ES for logs.
>>
>> For day to day use, I should be fine with 1 month of open indices. Thus, 
>> I've been keeping older indices closed already. So I can't really do much 
>> more when it comes to closing indices.
>>
>> I also run the optimize command nightly on any logstash index older than 
>> a couple of days.
>>
>> I'd just delete the really old data, but I have use cases for data up to 
>> 1.5 years old. Considering that snapshots of only a few months nearly used 
>> up all my space, and how much space a month of logs is currently taking up, 
>> I'm not sure how I can store that much data.
>>
>> So, in general, how would you solve my problem? I need to have immediate 
>> access to 1 month's worth of logs (via Kibana), be able to relatively 
>> quickly access up to 6 months of logs (open closed indices?), and access up 
>> to 1.5 years' worth temporarily (restore snapshots to a new cluster on my 
>> desktop?)
>>
>> Would there be a way to move snapshots off of the NFS SAN to an external 
>> hard drive? 
>>
>> Should I tell logstash to send logs to a text file that gets logrotated 
>> for a year and a half? Or does ES do a good enough job with compression 
>> that gzipping wouldn't help? If it was just a text file, I could unzip it, 
>> then tell Logstash to read the file into an ES cluster.
>>
>> ES already compresses stored indices by default, right? So there's 
>> nothing I can do there?
>>
