On Mon 17 Mar 2014 01:30:10 PM PDT, Dan wrote:
Thanks for the reply. We're starting by focusing on output to
Carbon, but we'll look at something like this when we get there, now
that we know what's possible.

Another related issue we've found is the JSON format in the
FileOutput plugin. I think it exports JSON that resembles the internal
data structures, but it would be possible to output cleaner JSON.

Have you thought about having output encoders at all? I've seen code
in the Elasticsearch output that exports simple JSON which I think
might work, so it could be good to share a JSON encoder between that,
the FileOutput, and a possible S3 output plugin.

Yes, absolutely. This has been on our radar for quite some time, as evidenced by this ticket I opened 6 months ago:

https://github.com/mozilla-services/heka/issues/417

We've been thinking about this a lot recently, and our plan is to introduce an Encoder plugin type that is analogous to the Decoder type. Encoders will take Heka messages as input and will emit raw bytes as output. They'll be encapsulated within Outputs the way that Decoders are encapsulated within Inputs. There will be a Go interface defined for the Encoder plugin type, of course, but we'll provide a SandboxEncoder so you can use Lua to do whatever you want, and we'll do most of our work there.
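To illustrate the idea (the type and field names below are hypothetical, not Heka's actual API), an Encoder is essentially a function from a message to raw bytes, and a JSON implementation of it could be shared by the Elasticsearch, File, and future S3 outputs:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message is a stand-in for Heka's internal message type (hypothetical
// field set; the real message carries many more attributes).
type Message struct {
	Timestamp int64
	Type      string
	Payload   string
}

// Encoder mirrors the proposed plugin contract: a Heka message in,
// raw bytes out.
type Encoder interface {
	Encode(m *Message) ([]byte, error)
}

// JSONEncoder emits clean JSON rather than a dump of the internal
// data structures.
type JSONEncoder struct{}

func (e JSONEncoder) Encode(m *Message) ([]byte, error) {
	return json.Marshal(m)
}

func main() {
	var enc Encoder = JSONEncoder{}
	out, err := enc.Encode(&Message{Timestamp: 1394994000, Type: "nginx.access", Payload: "GET / 200"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

A SandboxEncoder would wrap the same contract around a Lua script, so the byte-formatting logic could be written without recompiling Heka.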

This will give us some additional advantages. Right now, for instance, the CarbonOutput manages its own TCP connections, and has rudimentary keep-alive and reconnection support. But we already have a TcpOutput that has much more robust reconnection support, including the use of disk queues to ensure we don't lose messages through the disconnect / reconnect cycle. Ideally the CarbonOutput would go away, and instead you'd use a CarbonEncoder coupled with a TcpOutput, or a UdpOutput, or a <Whatever>Output, so all of the transport layer complexity only has to be gotten right once.
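In config terms the split might look something like this (a sketch only; CarbonEncoder doesn't exist yet, and the option names here are assumptions):

```toml
# Hypothetical: format each message for Graphite's plaintext protocol.
[CarbonEncoder]
type = "SandboxEncoder"
filename = "lua_encoders/carbon.lua"

# Reuse the existing TcpOutput, which already handles reconnects and
# disk-backed queues, instead of a dedicated CarbonOutput.
[TcpOutput]
address = "graphite.example.com:2003"
encoder = "CarbonEncoder"
```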

That ticket has been open for a while, but this is on our short list of what's coming next. Not everything in our 0.6 milestone (see http://is.gd/azbUSB) will actually make it into the 0.6 release, but that one definitely will.

Are you using Heka to archive log data at Mozilla? If so, what
format are you storing it in?

Yes, we're using Heka to parse nginx and rsyslog logs into JSON (we ship w/ decoders for these formats: http://is.gd/B2F6qv and http://is.gd/sUiE8b) which we're then feeding into Elasticsearch. Unfortunately, we're finding that ES is having a hard time keeping up. A single machine running both nginx and Heka can produce and parse more log data than a cluster of 3 ES nodes on the same hardware can index. ES is great, easy to use, and Kibana is awesome, but it may not be up to the scale that we need. Or we may be able to find a way to have Heka do more aggregation and pre-calculation so that we don't have to slam ES so hard. Hard to say at this point.

Thanks,

You're welcome!

-r



Dan

On 4 March 2014 17:44, Rob Miller <[email protected]> wrote:

    On Tue 04 Mar 2014 06:26:16 AM PST, Dan wrote:

        Hi,


    Hi back!


        We are just evaluating Heka for use as our log and metrics
        aggregation
        system.


    Great! Hope you like what you find.


        We would like to archive our logs in S3 so it would be good if
        Heka could also store batches directly into a bucket.

        Is anyone working on a S3 output plugin for Heka? If not we
        might look
        at starting to write one.


    I'm not aware of anyone actively working on an S3 output at the
    moment, no. We have, however, built Cloudwatch plugins, both an
    input and an output:

    https://github.com/mozilla-services/heka-mozsvc-plugins/blob/dev/cloudwatch.go

    Those use the crowdmob fork of Canonical's goamz package to handle
    the details of interfacing w/ Amazon's API authentication
    framework. You should be able to use that code as a model to get
    something bootstrapped pretty easily.

    Our Cloudwatch plugins aren't in the Heka core, they're in a
    separate repo we set up for plugins that we think would be less
    widely used. Ultimately we'll probably create a separate repo
    specifically for AWS related plugins, so the Cloudwatch, S3, and
    any other Amazon-related plugins that get developed could have a
    nice cozy home together.

    Hope this helps!

    -r


_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka
