Hi Matt,

We happen to have recently (a couple of months back) made the transition from a typical ELK stack to what we now call our HEK stack (for reasons both obvious and enduringly hilarious), largely due to the exact issues you're concerned about. This is strictly anecdotal; there seem to be plenty of other folks out there using logstash successfully, and I've got nothing against it — it just didn't end up working for us the way we'd hoped. It should also be said that Heka has a broader scope than Logstash: it's built to address general stream-processing needs, whereas logstash is focused specifically on log processing, so any attempt at a direct comparison should bear that in mind.
Our logstash pipeline was a pretty common one: we used beaver as an agent, which slurped logs, wrapped them up as a logstash json_event, and pushed them onto a redis queue as a buffer. Logstash was configured with this redis queue as an input (along with a syslog input for dumber devices), passed each event through some "filters", and then dumped it to the Elasticsearch output. We ran into various problems with different pieces of this pipeline at different times:

- We had minor problems getting beaver installed at one point due to dependency conflicts (we used pip to install), but were able to work around them by reverting to an older version of beaver.

- Whenever logstash was down for maintenance (and sometimes when it wasn't), the redis queue would back up pretty quickly, which led to some pretty awesome stuff like our production redis cluster going read-only once memory got big enough that it could no longer successfully bgsave (that was a fun post-mortem). We mitigated by setting up a separate, logstash-specific redis instance tuned not to care about being able to bgsave.

- Whenever elasticsearch was down, logstash would happily drop messages on encountering an error on the output. That was easy enough to deal with for planned ES maintenance (stop logstash first and just let the redis queue pile up), but not so great for unplanned outages (it looks like 1.5.0 just added retries, so this may not be an issue any more).

- Most frustratingly for us, logstash would occasionally — without any error logs or reason we could ascertain — drastically slow down or (more often) completely hang. Most of the time, at least at first, a simple restart would get things going again.

I ended up bandaiding that last problem by putting together a monit program check that would exit 1 if the logstash redis queue was too high for too long (a couple hundred thousand messages was a sure sign we had a problem) and exit 0 otherwise.
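For anyone curious, the check amounted to something like the following. This is an illustrative sketch, not our exact config — the paths, hostname, queue name, and threshold are made up, and you should verify the monit stanza syntax against the monit docs:

```
# /etc/monit.d/logstash-queue (illustrative)
check program logstash-queue
    with path "/usr/local/bin/check-logstash-queue"
    if status != 0 for 3 cycles then exec "/sbin/service logstash restart"

# /usr/local/bin/check-logstash-queue (illustrative)
#!/bin/sh
# Exit non-zero when the logstash redis list has backed up past a
# threshold, so monit kicks logstash.
THRESHOLD=200000
depth=$(redis-cli -h logstash-redis.internal llen logstash 2>/dev/null)
[ -n "$depth" ] && [ "$depth" -lt "$THRESHOLD" ]
```

The "for 3 cycles" bit matters: a briefly deep queue during a traffic spike is normal, so you only want to restart when it stays high.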
It was around this time that I started researching alternative solutions and reading about heka. Unfortunately, even with the monit check, sometimes logstash would stay hung after the restart. We tried various things to troubleshoot, from clearing the logstash on-disk buffer while it was stopped, to dumping and flushing the logstash redis queue to give it a head start and discard any potentially poisonous messages. I had hopes that logstash 1.5 would address these problems, but never got a chance to try it out, as it didn't get released before we moved on. At one point we straced the process and found it idle, polling the network socket (though any other process could reach redis without issue), but we didn't pursue the issue any further; by that point we'd already decided that heka was a better fit with our preferences (distributed) and goals (general purpose, able to support more than just log processing), and was something we wanted to adopt.

One of the things that initially attracted us to heka was the ability to run it as a distributed agent, with each instance able to handle the full set of processing steps, or to ship to a central system for further aggregation/processing or storage. It's done very well in this role since we switched, and we've yet to run into any resource utilization problems doing so, which was our one concern: heka uses 50-60M of RAM on our load-balancers and web front-ends, which are the most active, and ~25M elsewhere. That's actually comparable to what we saw with beaver, which did a lot less (guessing this is down to Python vs Go). Heka's splitters are also much nicer to work with than the less flexible regex-based multiline support in beaver. TOML is also a really nice way to manage configuration; there are some nice libraries for ruby (we use Chef), and treating /etc/heka as an include.d directory works nicely for keeping the relevant heka configuration close to the service that generates the logs.
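To give a feel for the include.d pattern, a per-service drop-in looks roughly like this. The plugin names and parameters below are from memory and the file paths are made up, so double-check everything against the heka docs:

```toml
# /etc/heka/nginx.toml -- illustrative drop-in kept alongside the nginx role

[nginx_access_input]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = 'access\.log'
decoder = "nginx_access_decoder"

[nginx_access_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"

    [nginx_access_decoder.config]
    # mirrors the log_format directive in nginx.conf
    log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent'
```

Each service's config lives in its own file, so the Chef recipe (or the app repo) that manages the service can own its logging config too.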
Speaking of which, my *favorite* thing so far with heka has been being able to run both a system-level heka agent for system logs, nginx logs, etc., and also ship a heka config directly in the repo with each of our apps, so our dev teams have better visibility into log collection and can change the log format and the log decoding in a single PR. We then just run a user-level heka instance under our app process supervisor that points to this config. It's been working nicely so far.

As to downsides:

- The learning curve is a bit steeper with heka, but not insurmountable, and is largely owed to its greater flexibility. Happily, both tools have amazing documentation.

- The logstash ecosystem is bigger, so there are more pre-built plugins available that you can just drop in.

- Go and Lua are great languages, well suited to the domain, but not as popular as ruby, so there may be a bit of a learning curve there depending on your background. Unless you need to write custom plugins, this may not matter.

- Having to recompile the binary to integrate the Go plugins available for heka is a bit of a bummer.

Related to the last point, sandbox plugins (lua) don't require a recompile and are actually pretty easy to throw together as needed (e.g. https://gist.github.com/nathwill/d3f62d46d173b2456531). I suspect that'll be the most popular method of adding plugins, with the Go stuff reserved for core plugins and cases where the best possible performance is needed. We've avoided the need to build a custom binary so far, and at this point I don't anticipate that we'll need to.

We've also found the heka community to be incredibly accessible and helpful (thanks, y'all!) when we got stuck on something, so if you do decide to try it out, chances are good that support will be easy to come by.

Anyways, this turned into more words than I'd intended, but that's the rough outline of our experience with both. Hope it helps!
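One last note on sandbox plugins, to show how small they can be. This is a rough, from-memory sketch of a trivial decoder — see the gist above and the heka sandbox docs for the real API before relying on any of it:

```lua
-- Illustrative sandbox decoder: tag each message with its payload length.
-- process_message() is the entry point heka invokes for every message.
function process_message()
    local payload = read_message("Payload") or ""
    local msg = {
        Type    = "myapp.access",  -- hypothetical message type
        Payload = payload,
        Fields  = { payload_bytes = string.len(payload) }
    }
    inject_message(msg)
    return 0  -- success
end
```

Drop a file like that next to your app, point a SandboxDecoder at it, and you've extended heka without touching the binary.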
Cheers,
Nathan W

On Fri, May 29, 2015 at 6:32 PM, Matthew Singletary <[email protected]> wrote:

> At work we currently have an intern setting up a prototype ELK stack
> (Elastic Search, Logstash, Kibana). While this seems to be fairly easy to
> set up and will have lots of nifty looking graphs, I worry about how
> logstash will be aggregating the inputs from potentially many machines.
>
> I figure that elasticsearch and kibana would still be usable with possibly
> logstash being replaced by heka, does this sound right?
>
> Any thoughts, comparisons or war stories?
>
> Thanks,
> Matt
>
> _______________________________________________
> Heka mailing list
> [email protected]
> https://mail.mozilla.org/listinfo/heka

