Hi Matt,

We happen to have recently (a couple of months back) made the transition
from a typical ELK stack to what we now call our HEK stack (for reasons
both obvious and enduringly hilarious), largely due to the exact issues
you're concerned about. This is strictly anecdotal, as there seem to be
lots of other folks out there using logstash successfully; I've got nothing
against it, it just didn't end up working for us in the way we'd hoped. It
should also be said that Heka has a broader scope than Logstash, in that
it's made to address general stream processing needs, whereas logstash is
more focused on log processing specifically, so any effort to do a direct
comparison should bear that in mind.

Our logstash pipeline was a pretty common one: we used beaver as an agent,
which slurped logs, wrapped them up as a logstash json_event, and pushed
them onto a redis queue as a buffer. Logstash was configured with this
redis queue as an input (along with a syslog input for dumber devices).
Logstash would then pass the event through some "filters" before dumping to
the Elasticsearch output.
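For reference, that kind of setup roughly corresponds to a logstash config
like the following. This is just an illustrative sketch, not our exact
production config; the key name, ports, and hosts are placeholders:

```
# Hypothetical logstash config approximating the pipeline described above.
input {
  redis {
    host      => "127.0.0.1"
    data_type => "list"
    key       => "logstash"   # beaver pushes json_events onto this list
    codec     => "json"
  }
  syslog {
    port => 514               # for the dumber devices
  }
}

filter {
  # grok/date/mutate filters went here
}

output {
  elasticsearch {
    host => "127.0.0.1"
  }
}
```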

We ran into various problems with different pieces of this pipeline at
different times.

We had minor problems getting beaver installed at one point due to
dependency conflicts (we used pip to install), but were able to work around
them by reverting to an older version of beaver.

Whenever logstash was down for maintenance (and sometimes when it wasn't),
the redis queue would back up pretty quickly, which led to some pretty
awesome stuff like our production redis cluster going read-only once memory
usage grew large enough that bgsave could no longer succeed (that was a fun
post-mortem). We mitigated by setting up a separate, logstash-specific
redis instance tuned to not care about being able to bgsave.
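In practice the tuning amounted to relaxing persistence on the dedicated
buffer instance. Something along these lines (a sketch; the exact values
are illustrative, not our production settings):

```
# Hypothetical redis.conf fragment for a buffer-only instance.
save ""                          # disable RDB snapshots entirely
stop-writes-on-bgsave-error no   # don't go read-only if a bgsave does fail
maxmemory 2gb                    # cap memory so the box stays healthy
maxmemory-policy noeviction      # surface errors rather than silently evicting
```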

Whenever elasticsearch was down, logstash would happily drop messages on
encountering an error on the output; easy enough to deal with by stopping
logstash before ES maintenance (just let the redis queue pile up), but not
so great for unplanned outages (it looks like 1.5.0 just added retries, so
this may not be an issue any more).

Most frustratingly for us, logstash would occasionally, without any error
logs or other reason we could ascertain, drastically slow down or (more
often) completely hang. Most of the time, at least at first, a simple
restart would get things going again. I ended up band-aiding the problem by
putting together a monit program check that would exit 1 if the logstash
redis queue stayed too high for too long (a couple hundred thousand messages
was a sure sign we had a problem) and exit 0 otherwise. It was around this
time that I started researching alternative solutions, and started reading
about heka. Unfortunately, even with the monit check, sometimes logstash
would stay hung after the restart. We tried various things to troubleshoot,
from clearing the logstash on-disk buffer while it was stopped, to dumping
and flushing the logstash queue to give it a head start and discard any
potential poison messages. I had hopes that logstash 1.5 would address
these problems, but never got a chance to try it out, as it didn't get
released before we moved on. At one point we straced the process and found
it idle, polling the network socket (though any other process could reach
redis without issue), but didn't pursue the issue any further, as by this
point we'd already decided that heka was a better fit with our preferences
(distributed) and goals (general purpose, able to support more than just
log processing), and was something we wanted to adopt.
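The monit check itself was simple in spirit; here's a rough sketch in
Python. The queue key, threshold, and use of the redis-py client are
assumptions for illustration (the original was a monit "program" check,
with monit's "for N cycles" condition covering the "too long" part):

```python
# Sketch of a health check for the logstash redis buffer: exit 1 when
# the queue list has grown past a threshold, exit 0 otherwise. Monit
# would run this script and act on repeated non-zero exits.
import sys

THRESHOLD = 200_000  # "a couple hundred thousand messages" per the text


def queue_unhealthy(queue_len, threshold=THRESHOLD):
    """True when the buffered queue suggests the consumer is hung."""
    return queue_len > threshold


def main():
    import redis  # pip install redis; assumes a local redis instance

    r = redis.Redis()
    length = r.llen("logstash")  # beaver pushes json_events onto this list
    sys.exit(1 if queue_unhealthy(length) else 0)
```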

One of the things that initially attracted us to heka was the ability to
run it as a distributed agent, with each instance able to handle the full
set of processing steps, or ship to a central system for further
aggregation/processing or storage. It's done very well in this role since
we switched, and we've yet to run into any resource utilization problems
doing so, which was our one concern; heka uses 50-60 MB of RAM on our
load-balancers and our web front-ends, which are the most active, and ~25 MB
elsewhere. This is actually comparable to what we saw with beaver, which
did a lot less (guessing this is down to Python vs Go). Heka's splitters
are also much nicer to work with than the less-flexible regex-based
multiline support in beaver.

TOML is also a really nice way to manage configuration; there are some nice
libraries for Ruby (we use Chef), and treating /etc/heka as an include.d
directory works nicely for keeping the relevant heka configuration close to
the service that generates the logs.
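As an illustration, the include.d-style layout lets each service drop in a
small file like this. The section names and paths here are hypothetical,
but the plugin types follow heka's TOML conventions:

```
# /etc/heka/nginx.toml (hypothetical), shipped alongside the nginx config
[nginx_access_logs]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = 'access\.log'
decoder = "nginx_access_decoder"

[nginx_access_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"

[nginx_access_decoder.config]
log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent'
```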

Speaking of which, my *favorite* thing so far with heka has been being able
both to run a system-level heka agent for system logs, nginx logs, etc., and
to ship a heka config directly in the repo with our various apps, so
our dev teams have better visibility into the log collection and can change
the log format and the log decoding in a single PR. We then just run a
user-level heka instance under our app process supervisor that points to
this config. It's been working nicely so far.

As to downsides:

- the learning curve is a bit steeper with heka, but not insurmountable,
and is largely owed to its greater flexibility. Happily, both tools have
amazing documentation.
- the logstash ecosystem is bigger, so there are more pre-built plugins
available that you can just drop in.
- Go and Lua are great languages, well suited to the domain, but not as
popular as Ruby, so there may be a bit of a learning curve there depending
on your background. Unless you need to write custom plugins, this may not
matter.
- having to recompile the binary to integrate the go plugins available for
heka is a bit of a bummer.

Related to the last point, sandbox plugins (lua) don't require a recompile
and are actually pretty easy to throw together as needed (e.g.
https://gist.github.com/nathwill/d3f62d46d173b2456531). I suspect that'll
be the most popular method of adding plugins, with the Go stuff reserved
for core plugins and cases where the best possible performance is needed.
We've avoided the need to build a custom binary so far, and at this point I
don't anticipate that we'll need to.

We've also found the heka community to be incredibly accessible and helpful
(thanks, y'all!) when we got stuck on something, so if you do decide to try
it out, chances are good that support will be easy to come by.

Anyways, this turned into more words than I'd intended, but that's the
rough outline of our experience with both. Hope it helps!

Cheers,

Nathan W

On Fri, May 29, 2015 at 6:32 PM, Matthew Singletary <
[email protected]> wrote:

> At work we currently have an intern setting up a prototype ELK stack
> (Elastic Search, Logstash, Kibana). While this seems to be fairly easy to
> set up and will have lots of nifty looking graphs, I worry about how
> logstash will be aggregating the inputs from potentially many machines.
>
> I figure that elasticsearch and kibana would still be usable with possibly
> logstash being replaced by heka, does this sound right?
>
> Any thoughts, comparisons or war stories?
>
> Thanks,
> Matt
>
> _______________________________________________
> Heka mailing list
> [email protected]
> https://mail.mozilla.org/listinfo/heka
>
>