On 11/16/2015 11:50 AM, web user wrote:
> Hi Rob,
> Thanks for the response. Comments are inline:
>> Not sure, really. Either would work. It's not immediately clear, but
>> it looks like they're both using HTTP POST requests to push their
>> data, which would work w/ Heka's HttpListenInput. On Linux, *some*
>> of the system data you're looking to gather can be processed using
>> Heka itself, w/ the FilePollingInput reading data directly out of
>> /proc. Heka ships w/ some decoders that know how to parse the
>> contents of these files (http://is.gd/r2Fk8j, http://is.gd/vvbFDy,
>> http://is.gd/wMR0W3, http://is.gd/XQXzxZ), and some filters that
>> know how to process the output from those decoders
>> (http://is.gd/P7tBqc, http://is.gd/d9zKPB, http://is.gd/DQGOYU,
>> http://is.gd/A0ymX8). These probably don't cover everything you want
>> to know, however, so you might need one of the external tools
>> anyway, in which case you might choose to use the other tool for
>> consistency.
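(To make the /proc piece concrete, here's a minimal sketch of pairing a
FilePollingInput with one of the stock decoders. The section names are
arbitrary, and the decoder filename follows the lua_decoders/ layout in
the Heka source tree; check your install for the exact path.)

    [MemStats]
    type = "FilePollingInput"
    ticker_interval = 1
    file_path = "/proc/meminfo"
    decoder = "MemStatsDecoder"

    [MemStatsDecoder]
    type = "SandboxDecoder"
    filename = "lua_decoders/linux_memstats.lua"

The matching lua_filters counterparts linked above can then consume the
fields this decoder emits.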
> For now, I'm using heka to capture log file updates on the client side
> and push to the heka server. Since they are both heka, I'm guessing they
> have more efficient ways (other than http) to communicate. Is gob
> supported?
Gob is not supported. Heka's native, most efficient serialization
mechanism is protocol buffers. The simplest way to achieve what you want
is to use TcpOutputs (with `use_framing` set to true and a
ProtobufEncoder, which are the TcpOutput default settings) on the edge
nodes, and a TcpInput (with a HekaFramingSplitter and a ProtobufDecoder,
which again are the defaults) on the aggregator. If you'd like a more
robust transport, you could consider switching to AMQP, Kafka, or NSQ,
but those of course require running an additional service.
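In config terms, that looks roughly like this (the aggregator address is
a placeholder, and the framing/encoder/splitter/decoder settings are the
defaults, spelled out here for clarity):

    # On each edge node:
    [TcpOutput]
    address = "aggregator.example.com:5565"
    message_matcher = "TRUE"
    use_framing = true           # default
    encoder = "ProtobufEncoder"  # default

    # On the aggregator:
    [TcpInput]
    address = "0.0.0.0:5565"
    splitter = "HekaFramingSplitter"  # default
    decoder = "ProtobufDecoder"       # default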
> What is the fastest way to connect if I expect to have a large number of
> heka agents connecting to the heka aggregation server?
The TcpOutput/TcpInput strategy will get you pretty far.
> Is it possible to push the work of parsing the log files to the heka
> agents so that there is less load on the heka server?
Absolutely. If you set up the appropriate decoders on the edge nodes, as
a part of the LogstreamerInput config, then the Heka messages passed
from the edge nodes to the aggregator will contain the parsed data
encoded in the message fields. If you don't do the decoding on the edge,
then the messages will contain the unparsed data in the message payload,
and you'll need to parse them on the aggregator. Note that this will
require a MultiDecoder, because you'll first need to decode from
protobuf, and *then* you'll need to parse the payload of the decoded
message.
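Sketched out, the two variants look something like the following. (The
nginx access log decoder is just a stand-in for whatever parser fits
your logs, and the log_format value is an abbreviated example.)

    # Variant 1 -- parse on the edge node:
    [nginx_access_logs]
    type = "LogstreamerInput"
    log_directory = "/var/log/nginx"
    file_match = 'access\.log'
    decoder = "nginx_access_decoder"

    [nginx_access_decoder]
    type = "SandboxDecoder"
    filename = "lua_decoders/nginx_access.lua"

        [nginx_access_decoder.config]
        log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent'

    # Variant 2 -- ship raw payloads and parse on the aggregator:
    [TcpInput]
    address = "0.0.0.0:5565"
    decoder = "agg_decoder"

    [agg_decoder]
    type = "MultiDecoder"
    subs = ["ProtobufDecoder", "nginx_access_decoder"]
    cascade_strategy = "all"
    log_sub_errors = true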
(Advanced users will note that my last statement is not actually true,
because it is possible to decode the protobuf data directly within a
sandbox, so you could do the whole job in a single SandboxDecoder, but
that's usually slightly more work, and requires a bit more
knowledge of the details of how Heka works to get dialed in correctly.
It would likely give you slightly better throughput, however, although I
haven't run any benchmarks to verify that statement.)
> In that sense, I really like the use of Lua scripts, since you could
> push out updated parsers and add support for new log file types without
> having to ship out a new binary for the agent.
>> - https://github.com/davidbirdsong/heka-prometheus
>> - https://github.com/docker-infra/heka-prometheus
> For now, I'm using scollector to push directly to Prometheus. What would
> be the advantage of pushing from scollector to heka and then to Prometheus?
>> As a final note, it should be mentioned that Heka is not a pure Go
>> project. While most of it is in Go, a lot of what makes Heka
>> powerful is the way it makes use of the Lua sandbox. The Lua sandbox
>> itself is written in C
>> (https://github.com/mozilla-services/lua_sandbox), and the use of
>> said sandbox, which is the recommended strategy for tackling many of
>> the problems for which Heka is intended, of course involves using
>> Lua. The sandbox is the core of the greater Heka ecosystem, and
>> there are other wrappers around the sandbox, such as Hindsight
>> (https://github.com/trink/hindsight), which is written in C.
> This is actually what got me attracted to Heka in the first place. I
> like the sandbox capability and the ability to dynamically update and
> push new scripts without having to restart the agent or reinstall new
> binaries.
Great, just making sure you know the overall sitch. Although I should
clarify that, while it's possible to push new SandboxFilters to a
correctly configured Heka instance without needing a restart, deploying
any other sandboxed plugin type, or changing the code underneath a
filter that came from the config (rather than being dynamically
injected) *will* require a restart. You're correct that you won't need
to redeploy Heka itself, however.
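For reference, the "correctly configured" piece is a
SandboxManagerFilter, which is what accepts filters injected at runtime
(typically pushed with the heka-sbmgr tool). A minimal sketch, with a
made-up signer name:

    [OpsSandboxManager]
    type = "SandboxManagerFilter"
    message_signer = "ops"
    max_filters = 50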
-r
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka