On 11/16/2015 11:50 AM, web user wrote:
Hi Rob,

Thanks for the response. Comments are inline:

    Not sure, really. Either would work. It's not immediately clear, but
    it looks like they're both using HTTP POST requests to push their
    data, which would work w/ Heka's HttpListenInput. On Linux, *some*
    of the system data you're looking to gather can be processed using
    Heka itself, w/ the FilePollingInput reading data directly out of
    /proc. Heka ships w/ some decoders that know how to parse the
    contents of these files (http://is.gd/r2Fk8j, http://is.gd/vvbFDy,
    http://is.gd/wMR0W3, http://is.gd/XQXzxZ), and some filters that
    know how to process the output from those decoders
    (http://is.gd/P7tBqc, http://is.gd/d9zKPB, http://is.gd/DQGOYU,
    http://is.gd/A0ymX8). These probably don't cover everything you want
    to know, however, so you might need one of the external tools
    anyway, in which case you might choose to use the other tool for
    consistency.
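    A minimal sketch of the /proc approach, for reference (I'm writing the
    config keys and the decoder filename from memory, so double check them
    against your Heka version):

    ```toml
    [LoadAvgInput]
    type = "FilePollingInput"
    ticker_interval = 5                # seconds between reads of the file
    file_path = "/proc/loadavg"
    decoder = "LinuxLoadAvgDecoder"

    [LinuxLoadAvgDecoder]
    type = "SandboxDecoder"
    filename = "lua_decoders/linux_loadavg.lua"
    ```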


For now, I'm using heka to capture log file updates on the client side
and push to the heka server. Since they are both heka, I'm guessing they
have more efficient ways (other than http) to communicate. Is gob
supported?

Gob is not supported. Heka's native, most efficient serialization mechanism is protocol buffers. The simplest way to achieve what you want is to use TcpOutputs (with `use_framing` set to true and a ProtobufEncoder, which are the TcpOutput default settings) on the edge nodes, and a TcpInput (with a HekaFramingSplitter and a ProtobufDecoder, which again are the defaults) on the aggregator. If you'd like a more robust transport, you could consider switching to AMQP, Kafka, or NSQ, but those of course require running an additional service.
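As a rough sketch, the edge and aggregator configs might look like the following. The address and port are made up for illustration, and the framing/encoder/splitter/decoder settings shown are already the defaults, included only to make the pairing explicit:

```toml
# --- Edge node ---
[TcpOutput]
message_matcher = "TRUE"                 # forward every message
address = "aggregator.example.com:5565"  # hypothetical aggregator address
use_framing = true                       # default, shown for clarity
encoder = "ProtobufEncoder"              # default, shown for clarity

[ProtobufEncoder]

# --- Aggregator ---
[TcpInput]
address = ":5565"
splitter = "HekaFramingSplitter"         # default, shown for clarity
decoder = "ProtobufDecoder"              # default, shown for clarity

[HekaFramingSplitter]
[ProtobufDecoder]
```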

What is the fastest way to connect if I expect to have a
large number of heka agents connecting to the heka aggregation server?


The TcpOutput/TcpInput strategy will get you pretty far.

Is it possible to push the work on parsing the log files to the heka
agents so that there is less load on the heka server?

Absolutely. If you set up the appropriate decoders on the edge nodes, as a part of the LogstreamerInput config, then the Heka messages passed from the edge nodes to the aggregator will contain the parsed data encoded in the message fields. If you don't do the decoding on the edge, then the messages will contain the unparsed data in the message payload, and you'll need to parse them on the aggregator. Note that this will require a MultiDecoder, because you'll first need to decode from protobuf, and *then* you'll need to parse the payload of the decoded message.
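To make that concrete, here's a sketch of both setups. The nginx decoder, paths, and log_format are illustrative assumptions, not something from your environment:

```toml
# --- Edge node: parse at the source ---
[nginx_access_logs]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"   # hypothetical log location
file_match = 'access\.log'
decoder = "nginx_access_decoder"

[nginx_access_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"

    [nginx_access_decoder.config]
    log_format = '$remote_addr - [$time_local] "$request" $status'

# --- Aggregator: only needed if the edge did NOT parse ---
[agg_decoder]
type = "MultiDecoder"
subs = ["ProtobufDecoder", "nginx_access_decoder"]
cascade_strategy = "all"           # protobuf first, then parse the payload

[ProtobufDecoder]
```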

(Advanced users will note that my last statement is not actually true, because it is possible to decode the protobuf data directly within a sandbox, so you could do the whole job in a single SandboxDecoder, but that's usually slightly more work, and requires a bit more knowledge of the details of how Heka works to get dialed in correctly. It would likely give you slightly better throughput, however, although I haven't run any benchmarks to verify that statement.)

In that sense, I
really like the use of Lua scripts, since you could push out updated
parsers and add support for new log file types without having to ship out
a new binary for the agent.


    - https://github.com/davidbirdsong/heka-prometheus
    - https://github.com/docker-infra/heka-prometheus


For now, I'm using scollector to push directly to Prometheus. What would
be the advantage of pushing from scollector to heka and then to Prometheus?

    As a final note, it should be mentioned that Heka is not a pure Go
    project. While most of it is in Go, a lot of what makes Heka
    powerful is the way it makes use of the Lua sandbox. The Lua sandbox
    itself is written in C
    (https://github.com/mozilla-services/lua_sandbox), and the use of
    said sandbox, which is the recommended strategy for tackling many of
    the problems for which Heka is intended, of course involves using
    Lua. The sandbox is the core of the greater Heka ecosystem, and
    there are other wrappers around the sandbox, such as Hindsight
    (https://github.com/trink/hindsight), which is written in C.


This is actually what got me attracted to Heka in the first place. I
like the sandbox capability and the ability to dynamically update and
push new scripts without having to restart the agent or reinstall new
binaries.

Great, just making sure you know the overall sitch. Although I should clarify that, while it's possible to push new SandboxFilters to a correctly configured Heka instance without needing a restart, deploying any other sandboxed plugin type, or changing the code underneath a filter that came from the config (rather than being dynamically injected) *will* require a restart. You're correct that you won't need to redeploy Heka itself, however.

-r
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka
