Thomas,

Thanks for your insights and experiences. As someone who has explored and 
used ES for over a year but is relatively new to the ELK stack, I find your 
data points extremely valuable. Let me offer some of my own views.

Re: double the storage. I strongly recommend that ELK users disable the 
_all field. The entire text of each log event generated by logstash ends up 
in the message field (not @message, as many people incorrectly post), so 
the _all field is just redundant overhead with no added value. Disabling it 
yields a dramatic drop in database file sizes and a dramatic increase in 
load performance. Of course, you then need to configure ES to use the 
message field as the default field for Lucene queries from Kibana.
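
If it helps anyone, here is a minimal sketch of an index template that does 
both at once. The template name is illustrative, and this targets the ES 
1.x API:

    curl -XPUT 'http://localhost:9200/_template/disable_all' -d '
    {
      "template" : "logstash-*",
      "settings" : {
        "index.query.default_field" : "message"
      },
      "mappings" : {
        "_default_" : {
          "_all" : { "enabled" : false }
        }
      }
    }'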

During the year that I've used ES and watched this group, I have had a 
front-row seat as a smart and dedicated development team worked steadily to 
improve a brand-new product. Six months ago the ELK stack eluded me, and 
the reports I found weren't encouraging (with the sole exception of the 
Kibana web site's marketing pitch). But ES has come a long way since then, 
and the ELK stack is much more closely integrated.

The Splunk UI is carefully crafted to isolate users from each other and to 
prevent external users (external to the Splunk database itself, not to our 
company) from causing harm to data. Kibana, by contrast, seems to be meant 
for a small cadre of trusted users. What happens if I save a dashboard with 
the same name as someone else's? The Kibana documentation doesn't even 
begin to address user isolation. But I am confident that it will.

How can I tell Kibana to set the default Lucene query operator to AND 
instead of OR? Google is not my friend here: I keep getting references to 
the old Ruby versions of Kibana, which are ancient history by now. Kibana 
is cool and promising, but it has a long way to go before it can be 
deployed to all of the folks in our company who currently have access to 
Splunk.
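
For now, the best workaround I know is to hand-write the query_string body 
that I wish Kibana would emit for me. A sketch against the ES 1.x search 
API (the index name and search terms are illustrative):

    curl -XGET 'http://localhost:9200/logstash-2014.06.19/_search' -d '
    {
      "query" : {
        "query_string" : {
          "query" : "error timeout",
          "default_field" : "message",
          "default_operator" : "AND"
        }
      }
    }'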

Logstash has a nice book that's been very helpful, and logstash itself has 
been an excellent tool for prototyping. The book has been invaluable in 
helping me extract dates from log events and handle all of our different 
multiline events. But it still doesn't explain why the date filter needs 
its own array of match patterns to parse the timestamp that the grok 
filter has already matched and isolated. And the recommendations to avoid 
the elasticsearch_http output in favor of the elasticsearch output (via 
the Node client) run up against the fact that the ES client library 
embedded in logstash (1.1.1) is not compatible with the most recent ES 
release (1.2.1).
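
To illustrate the duplication, consider a sketch like this (the field 
names are my own): grok has already isolated the timestamp into logdate, 
yet date must be told the format all over again:

    filter {
      grok {
        # Pull the timestamp out of the raw line into its own field.
        match => [ "message", "%{TIMESTAMP_ISO8601:logdate} %{GREEDYDATA:body}" ]
      }
      date {
        # ...and then re-describe the very same timestamp format.
        match => [ "logdate", "ISO8601" ]
      }
    }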

Logstash is also a resource hog, so we eventually plan to replace it with 
Perl and Apache Flume (already in use), piping the events into my Java 
bulk-load tool (which is always kept up to date with the versions of ES we 
deploy!). Because we send the data via Flume to our data warehouse, any 
losses in ES will be annoying but not catastrophic. The front-end 
following of rotated log files will be done using GNU *tail -F*; that 
uppercase -F option follows rotated log files perfectly. I doubt that 
logstash can do the same, and we currently see that neither can Splunk (so 
we sporadically lose log events in Splunk too). GNU tail -F piped into 
logstash's stdin input works perfectly in my evaluation setup and will 
likely form the first stage of any log forwarder we end up deploying.
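
Concretely, the evaluation pipeline amounts to no more than this (the log 
path and the stdout output are illustrative; substitute whatever output 
you trust):

    tail -F /var/log/myapp/myapp.log | bin/logstash agent -f tail.conf

    # tail.conf: read whatever tail pipes in, dump parsed events to stdout
    input  { stdin { } }
    output { stdout { codec => rubydebug } }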

Brian

On Thursday, June 19, 2014 8:48:34 AM UTC-4, Thomas Paulsen wrote:
>
> We had a 2.2 TB/day installation of Splunk and ran it on VMware with 12 
> indexers and 2 search heads. Each indexer had 1,000 IOPS guaranteed. The 
> system is slow but OK to use. 
>
> We tried Elasticsearch and were able to get the same performance with 
> the same number of machines. Unfortunately, with Elasticsearch you need 
> almost double the amount of storage, plus a LOT of patience to make it 
> run. It took us six months to set it up properly, and even now the 
> system is quite buggy and unstable, and from time to time we lose data 
> with Elasticsearch. 
>
> I don't recommend ELK for a critical production system; for dev work it 
> is OK, if you don't mind the hassle of setting it up and operating it. 
> The costs you save by not buying a Splunk license you have to invest in 
> consultants to get it up and running. Our dev teams hate Elasticsearch 
> and prefer Splunk.
>
> On Saturday, 19 April 2014 at 00:07:44 UTC+2, Mark Walkom wrote:
>>
>> That's a lot of data! I don't know of any installations that big, but 
>> someone else might.
>>
>> What sort of infrastructure are you running Splunk on now, and what 
>> are your current and expected retention periods?
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: [email protected]
>> web: www.campaignmonitor.com
>>
>>
>> On 19 April 2014 07:33, Frank Flynn <[email protected]> wrote:
>>
>>> We have a large Splunk instance. We load about 1.25 TB of logs a day, 
>>> and we have about 1,300 loaders (servers that collect and load logs; 
>>> they may do other things too).
>>>
>>> As I look at Elasticsearch / Logstash / Kibana, does anyone know of a 
>>> performance comparison guide? Should I expect to run on very similar 
>>> hardware? More? Or less?
>>>
>>> Sure, it depends on exactly what we're doing, the exact queries and 
>>> the frequency we'd run them, but I'm trying to get some kind of idea 
>>> before we start.
>>>
>>> Are there any white papers or other documents about switching? It 
>>> seems an obvious choice, but I can find very few performance 
>>> comparisons. (I did see that Elasticsearch just hired "the former VP 
>>> of Products at Splunk, Gaurav Gupta," but there were few numbers in 
>>> that article either.)
>>>
>>> Thanks,
>>> Frank
>>>
