Reducing shards won't help; you'll still have the same amount of data, it just won't be sharded as much. It's possible that using a dedupe filesystem will help, since log data can be repetitive, but you'd really have to try it to see. Zipping is an option: you could close older indices and then zip them, and do the reverse to read them again when you want (see the sketch below). However, that adds a lot of complexity and delay, which you might be OK with.
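To make the close-and-zip idea concrete, here's a rough sketch. The host, index name, and data path are assumptions for illustration; test on a throwaway index first.

```python
# Rough sketch of the close-then-archive idea. The host, index name and
# data path below are assumptions -- adjust them for your own node.
import subprocess

import requests

ES = "http://localhost:9200"
INDEX = "logstash-2014.03.01"  # an index that has aged out of the 30-day window
DATA_DIR = "/var/lib/elasticsearch/nodes/0/indices"  # hypothetical data path

# Flush pending segments, then close the index so ES releases its files.
requests.post(f"{ES}/{INDEX}/_flush").raise_for_status()
requests.post(f"{ES}/{INDEX}/_close").raise_for_status()

# Tar and gzip the on-disk index directory for long-term storage.
subprocess.check_call(
    ["tar", "czf", f"/archive/{INDEX}.tar.gz", "-C", DATA_DIR, INDEX]
)

# To read the data again later: extract the tarball back into DATA_DIR,
# then reopen the index:
#   requests.post(f"{ES}/{INDEX}/_open")
```

Keep in mind a closed index isn't searchable, so this only suits data older than your 30-day Kibana window.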
Ultimately, you won't get that sort of compression factor using the inbuilt ES functionality. We saw about a 30-50% reduction in size when we enabled compression in 0.90.N, but 70%+ is a big ask. Your best option would probably be to not store the _source (there's a rough sketch of that below, after the quoted mail).

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [email protected]
web: www.campaignmonitor.com

On 22 April 2014 18:25, horst knete <[email protected]> wrote:

> Hey Guys,
>
> First of all, our Elasticsearch setup:
>
> - 1 node
> - 16 GB RAM
> - 4 CPUs
> - Version 0.90.7
> - 5 shards, 1 replica
> - Types of logs: Windows event logs, Unix system logs, Cisco device logs,
> firewall logs, etc.
> - About 3 million logs per day
>
> We use Logstash to collect the logs and Kibana to access them.
>
> Today we started inserting our NetFlow data into Elasticsearch. We have a
> big production environment, so we got about 25,000 logs per second going
> into Elasticsearch.
>
> The system had no problem handling that load, but the index grows pretty
> fast: after 1 hour of testing we had 800 MB of data (that would be 19.2 GB
> of data per day, and with a log retention of 30 days, 576 GB of data).
>
> Because that much data is unacceptable for our system, I'm looking for
> ways to reduce the disk space requirements.
>
> I've tried reducing the disk usage with the compression method built into
> Elasticsearch, setting _source to compress. Unfortunately that didn't help
> much.
>
> I also tried the _optimize command, since someone wrote it would help
> reduce disk space - it had no effect.
>
> The goal is to reduce the 576 GB of data to something around 80-100 GB.
>
> The first thing I could do is reduce the number of shards to 2, which
> would cut the storage to about 220 GB. But I really don't want to do that
> in case we add more nodes to the system.
>
> The next thing I thought about was putting a deduplication file system
> under the ES node, but I don't think dedup has much effect on an ES index -
> any experience with that?
>
> The last and most obvious option is to zip the indices into a tarball or
> .zip. I think that's our solution for long-term storage (up to 2 years),
> but it's no solution for the active indices (the 30 days), since they
> would not be searchable by Kibana.
>
> So, do any of you guys have suggestions for us?
>
> Cheers
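For what dropping _source could look like, here's a hedged sketch: an index template that disables _source for new logstash-* indices. The template name, host, and index pattern are assumptions; note that without _source you lose the raw document in Kibana and the ability to reindex, so weigh that trade-off first.

```python
# Hedged sketch: an index template that stops storing _source for new
# logstash-* indices. Template name, host and pattern are assumptions.
import json

import requests

ES = "http://localhost:9200"

template = {
    "template": "logstash-*",  # matched against new daily indices
    "mappings": {
        "_default_": {                     # applies to every type
            "_source": {"enabled": False}  # don't store the original JSON
        }
    },
}

resp = requests.put(
    f"{ES}/_template/no_source_logs",
    data=json.dumps(template),
    headers={"Content-Type": "application/json"},
)
print(resp.status_code, resp.text)
```

New daily indices created by Logstash would pick the template up automatically; existing indices keep their mappings, since _source can't be toggled after index creation.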
