Right, Heka currently has many more data ingestion plugins, so it's very useful 
running on endpoints, snarfing up logs, system data, etc. One 
possibility that interests me is extracting the Logstreamer code out of Heka 
into a standalone utility, which could be used to feed data into Hindsight for 
parsing / processing.

-r


On 05/10/2016 06:50 AM, Simon Pasquier wrote:
First of all, big big thanks to Rob, Trink and Mozilla. We've been
working with Heka for about a year and it's always been a pleasure to
work with.
Hopefully we'll be giving Hindsight a try in the next few weeks, and I'm
not too worried; I expect we'll have a smooth migration path from Heka to
Hindsight.
I've got one small question. You wrote that "Heka is still in use here,
though, especially on our edge nodes". IIUC Hindsight doesn't have an
input plugin for logs yet, and since the Heka logstreamer plugin is
quite a complex (and neat!) piece of code, my guess is that you will
continue to use Heka for log streaming. Is this correct?
Thanks!
Simon

On Fri, May 6, 2016 at 7:51 PM, Rob Miller <rmil...@mozilla.com> wrote:

    Hi everyone,

    I'm loooong overdue in sending out an update about the current state
    of and plans for Heka. Unfortunately, what I have to share here will
    probably be disappointing for many of you, and it might impact
    whether or not you want to continue using it, as all signs point to
    Heka getting less support and fewer updates moving forward.

    The short version is that Heka has some design flaws that make it
    hard to incrementally improve it enough to meet the high throughput
    and reliability goals that we were hoping to achieve. While it would
    be possible to do a major overhaul of the code to resolve most of
    these issues, I don't have the personal bandwidth to do that work,
    since most of my time is consumed working on Mozilla's immediate
    data processing needs rather than general purpose tools these days.
    Hindsight (https://github.com/trink/hindsight), built around the
    same Lua sandbox technology as Heka, doesn't have these issues, and
    internally we're using it more and more instead of Heka, so there's
    no organizational imperative for me (or anyone else) to spend the
    time required to overhaul the Go code base.

    Heka is still in use here, though, especially on our edge nodes, so
    it will see a bit more improvement and at least a couple more
    releases. Most notably, it's on my list to switch to using the most
    recent Lua sandbox code, which will move most of the protobuf
    processing to custom C code, and will likely improve performance as
    well as remove a lot of the problematic cgo code, which is what's
    currently keeping us from being able to upgrade to a recent Go version.

    Beyond that, however, Heka's future is uncertain. The code that's
    there will still work, of course, but I may not be doing any further
    improvements, and my ability to keep up with support requests and
    PRs, already on the decline, will likely continue to wane.

    So what are the options? If you're using a significant amount of
    Lua-based functionality, you might consider transitioning to
    Hindsight. Any Lua code that works in Heka will work in Hindsight,
    and Hindsight is a much leaner, more solid foundation. It has far
    fewer i/o plugins than Heka, though, so for many it won't be a
    simple transition.

    Also, if there's someone out there (an organization, most likely)
    that has a strong interest in keeping Heka's codebase alive, through
    funding or coding contributions, I'd be happy to support that
    endeavor. Some restrictions apply, however; the work that needs to
    be done to improve Heka's foundation is not beginner level work, and
    my time to help is very limited, so I'm only willing to support
    folks who demonstrate that they are up to the task. Please contact
    me off-list if you or your organization is interested.

    Anyone casually following along can probably stop reading here.
    Those of you interested in the gory details can read on to hear more
    about what the issues are and how they might be resolved.

    First, I'll say that I think there's a lot that Heka got right. The
    basic composition of the pipeline (input -> split -> decode -> route
    -> process -> encode -> output) seems to hit a sweet spot for
    composability and reuse. The Lua sandbox, and especially the use of
    LPEG for text parsing and transformation, has proven to be extremely
    efficient and powerful; it's the most important and valuable part of
    the Heka stack. The routing infrastructure is efficient and solid.
    And, perhaps most importantly, Heka is useful; there are a lot of
    you out there using it to get work done.

    There was one fundamental mistake, however: we shouldn't have used
    channels. There are many competing opinions about Go channels. I'm
    not going to get into whether or not they're
    *ever* a good idea, but I will say unequivocally that their use as
    the means of pushing messages through the Heka pipeline was a
    mistake, for a number of reasons.
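
    To make the pattern concrete, here is a minimal sketch of the kind
    of channel-based hand-off Heka's pipeline is built on (simplified,
    hypothetical types; not Heka's actual API):

        package main

        import "fmt"

        // PipelinePack is a simplified stand-in for Heka's message
        // container type.
        type PipelinePack struct {
            Payload string
        }

        func main() {
            // Each stage hands packs to the next over a channel, so
            // every hop pays a synchronization cost.
            decoded := make(chan *PipelinePack, 100)
            routed := make(chan *PipelinePack, 100)

            // Decode stage: receive, then forward to the router.
            go func() {
                for pack := range decoded {
                    routed <- pack // blocks if the router falls behind
                }
                close(routed)
            }()

            decoded <- &PipelinePack{Payload: "a log line"}
            close(decoded)

            // Output stage: drain whatever the router forwarded.
            for pack := range routed {
                fmt.Println(pack.Payload)
            }
        }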

    First, they don't perform well enough. While Heka performs many
    tasks faster than some other popular tools, we've consistently hit a
    throughput ceiling thanks to all of the synchronization that
    channels require. And this ceiling, sadly, is generally lower than
    is acceptable for the amount of data that we at Mozilla want to push
    through a single aggregator system.

    Second, they make it very hard to prevent message loss. If
    unbuffered channels are used everywhere, performance plummets
    unacceptably due to context-switching costs. But using buffered
    channels means that many messages are in flight at a time, most of
    which are sitting in channels waiting to be processed. Keeping track
    of which messages have made it all the way through the pipeline
    requires complicated coordination between chunks of code that are
    conceptually quite far away from each other.
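
    A toy illustration of the tension (assumed names, nothing
    Heka-specific): a buffered channel accepts sends long before
    anything is actually processed, so knowing what has truly completed
    requires a second, reverse path of acknowledgements coordinated
    across stages:

        package main

        import "fmt"

        func main() {
            work := make(chan string, 8)
            acks := make(chan string, 8)

            // These sends succeed immediately; the messages are
            // "sent" but just sitting in the buffer, unprocessed.
            for i := 0; i < 4; i++ {
                work <- fmt.Sprintf("msg-%d", i)
            }
            close(work)

            // Worker stage: only when a message comes out the far end
            // can it be acknowledged as done.
            go func() {
                for m := range work {
                    acks <- m
                }
                close(acks)
            }()

            for m := range acks {
                fmt.Println("completed:", m)
            }
        }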

    Third, the buffered channels mean that Heka consumes much more RAM
    than would otherwise be needed, since we have to pre-allocate a pool
    of messages. If the pool size is too small, then Heka becomes
    susceptible to deadlocks, with all of the available packs sitting in
    channel queues, unable to be processed because some plugin is
    blocked on waiting for an available pack. But cranking up the pool
    size causes Heka to use more memory, even when it's idle.
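
    The failure mode is easy to sketch if the pool is modeled as a
    buffered channel of recycled packs (a simplification of Heka's
    actual pool):

        package main

        import "fmt"

        // Pack is a simplified stand-in for Heka's PipelinePack.
        type Pack struct {
            Payload string
        }

        func main() {
            const poolSize = 1 // deliberately tiny, to show the problem
            pool := make(chan *Pack, poolSize)
            pool <- &Pack{}

            stage := make(chan *Pack, 4)

            // Input: grab a free pack, fill it, hand it downstream.
            pack := <-pool
            pack.Payload = "line 1"
            stage <- pack // the pack is now parked in a stage queue

            // Every pack is sitting in a channel, and none returns to
            // the pool until some plugin drains `stage` and recycles
            // it. A bare <-pool here would block forever: deadlock.
            select {
            case p := <-pool:
                fmt.Println("recycled pack available:", p.Payload)
            default:
                fmt.Println("pool exhausted; input stalls")
            }
        }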

    Hindsight avoids all of these problems by using disk queues instead
    of RAM buffers between all of the processing stages. It's a bit
    counterintuitive, but at high throughput performance is actually
    better than with RAM buffers, because a) there's no need for
    synchronization locks and b) the data is typically read quickly
    enough after it's written that it stays in the disk cache.
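
    A very rough sketch of the idea (Hindsight's real queues are
    length-framed protobuf files; this newline-delimited version is
    purely illustrative): the writer only appends, the reader only
    scans forward, and the two never share a lock:

        package main

        import (
            "bufio"
            "fmt"
            "log"
            "os"
        )

        func main() {
            // Writer side: append one record per line to the queue.
            w, err := os.OpenFile("input.log",
                os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
            if err != nil {
                log.Fatal(err)
            }
            fmt.Fprintln(w, "message 1")
            fmt.Fprintln(w, "message 2")
            w.Close()

            // Reader side: scan forward through the same file. Since
            // reads trail writes closely, the data is usually still in
            // the OS page cache and never comes off the disk itself.
            r, err := os.Open("input.log")
            if err != nil {
                log.Fatal(err)
            }
            defer r.Close()
            scanner := bufio.NewScanner(r)
            for scanner.Scan() {
                fmt.Println("processing:", scanner.Text())
            }
        }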

    There's much less chance of message loss, because every plugin is
    holding on to only one message in memory at a time, while using a
    written-to-disk cursor file to track the current position in the
    disk buffer. If the plug is pulled mid-process, some messages that
    were already processed might be processed again, but nothing will be
    lost, and there's no need for complex coordination between different
    stages of the pipeline.
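
    The cursor logic might look roughly like this (hypothetical file
    names; Hindsight's actual checkpoint format differs): process a
    message, then persist the new offset, which yields at-least-once
    delivery:

        package main

        import (
            "bufio"
            "fmt"
            "io"
            "log"
            "os"
            "strconv"
        )

        // loadCursor returns the last checkpointed byte offset, or 0
        // if no cursor file exists yet.
        func loadCursor(path string) int64 {
            b, err := os.ReadFile(path)
            if err != nil {
                return 0
            }
            n, _ := strconv.ParseInt(string(b), 10, 64)
            return n
        }

        func main() {
            queue, err := os.Open("input.log")
            if err != nil {
                log.Fatal(err)
            }
            defer queue.Close()

            offset := loadCursor("cursor")
            queue.Seek(offset, io.SeekStart) // resume where we left off

            scanner := bufio.NewScanner(queue)
            for scanner.Scan() {
                line := scanner.Text()
                fmt.Println("processing:", line)
                offset += int64(len(line)) + 1 // +1 for the newline

                // Checkpoint *after* processing: a crash between the
                // two steps replays this message on restart, but never
                // loses it.
                os.WriteFile("cursor",
                    []byte(strconv.FormatInt(offset, 10)), 0644)
            }
        }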

    Finally, there's no need for a pool of messages. Each plugin is
    holding some small number of packs (possibly as few as one) in its
    own memory space, and those packs never escape that plugin's
    ownership. RAM usage doesn't grow, and pool exhaustion related
    deadlocks are a thing of the past.

    For Heka to have a viable future, it would basically need to be
    updated to work almost exactly like Hindsight. First, all of the
    APIs would need to be changed to no longer refer to channels. (The
    fact that we exposed channels to the APIs is another mistake we
    made... it's now generally frowned upon in Go land to expose
    channels as part of your public APIs.) There's already a non-channel
    based API for filters and outputs, but most of the plugins haven't
    yet been updated to use the new API, which would need to happen.
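
    The shape of that change, roughly (hypothetical interfaces, not
    Heka's actual ones): instead of handing each plugin a channel to
    range over, the framework calls the plugin once per message and
    keeps the transport private:

        package plugin

        // Message is a simplified stand-in for Heka's message type.
        type Message struct {
            Payload string
        }

        // Channel-exposed style: the transport leaks into the public
        // contract, so it can't change without breaking every plugin.
        type OldFilter interface {
            Run(in chan *Message) error
        }

        // Message-at-a-time style: the framework owns delivery, and
        // can swap channels for disk queues behind the scenes.
        type NewFilter interface {
            ProcessMessage(msg *Message) error
        }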

    Then the hard work would start; a major overhaul of Heka's
    internals, to switch from channel based message passing to disk
    queue based message passing. The work that's been done to support
    disk buffering for filters and outputs is useful, but not quite
    enough, because it's not scalable for each plugin to have its own
    queue; the number of open file descriptors would grow very quickly.
    Instead it would need to work like Hindsight, where there's one
    queue that all of the inputs write to, and another that filters
    write to. Each plugin reads through its specified input queue,
    looking for messages that match its message matcher, writing its
    location in the queue back to the shared cursors file.
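
    Loosely sketched (hypothetical helper, with a plain substring check
    standing in for the real message matcher language): every plugin
    runs the same forward scan over the shared queue, skipping messages
    its matcher rejects and checkpointing only its own offset:

        package main

        import (
            "bufio"
            "fmt"
            "log"
            "os"
            "strings"
        )

        // runPlugin tails the shared input queue on behalf of one
        // plugin. Each plugin tracks its own offset, so one queue file
        // can serve any number of readers.
        func runPlugin(name, queuePath string, matches func(string) bool) {
            f, err := os.Open(queuePath)
            if err != nil {
                log.Fatal(err)
            }
            defer f.Close()

            var offset int64
            scanner := bufio.NewScanner(f)
            for scanner.Scan() {
                line := scanner.Text()
                offset += int64(len(line)) + 1
                if !matches(line) {
                    continue // not for this plugin; just advance
                }
                fmt.Printf("[%s] got: %s\n", name, line)
                // In the real system, `offset` would now be written to
                // this plugin's entry in the shared cursors file.
            }
        }

        func main() {
            runPlugin("nginx_parser", "input.log", func(m string) bool {
                return strings.Contains(m, "nginx")
            })
        }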

    There would also be some complexity in reconciling Heka's breakdown
    of the input stage into input/splitter/decoder with Hindsight's
    encapsulation of all of these stages into a single sandbox.

    Ultimately I think this would be at least 2-3 months of full-time
    work for me. I'm not the fastest coder around, but I know where the
    bodies are buried, so I'd guess it would take anyone else at least
    as long, possibly longer if they're not already familiar with how
    everything is put together.

    And that's about it. If you've gotten this far, thanks for reading.
    Also, thanks to everyone who's contributed to Heka in any way, be it
    by code, doc fixes, bug reports, or even just appreciation. I'm
    sorry for those of you using it regularly that there's not a more
    stable future.

    Regards,

    -r
    _______________________________________________
    Heka mailing list
    Heka@mozilla.org
    https://mail.mozilla.org/listinfo/heka


