On Mon 17 Mar 2014 01:30:10 PM PDT, Dan wrote:
Thanks for the reply. We're starting by focusing on output to
Carbon, but we'll look at something like this when we get there, now
that we know what's possible.

Another related issue we've found is the JSON format in the
FileOutput plugin. I think it exports JSON that resembles the internal
data structures, but it would be possible to output cleaner JSON.

Have you thought about having output encoders at all? I've seen code
in the Elasticsearch output that exports simple JSON which I think
might work, so it could be good to share a JSON encoder between that,
the FileOutput, and a possible S3 output plugin.

Yes, absolutely. This has been on our radar for quite some time, as evidenced by this ticket I opened 6 months ago:

https://github.com/mozilla-services/heka/issues/417

We've been thinking about this a lot recently, and our plan is to introduce an Encoder plugin type that is analogous to the Decoder type. Encoders will take Heka messages as input and will emit raw bytes as output. They'll be encapsulated within Outputs the way that Decoders are encapsulated within Inputs. There will be a Go interface defined for the Encoder plugin type, of course, but we'll provide a SandboxEncoder so you can use Lua to do whatever you want, and we'll do most of our work there.
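To illustrate the idea (the type and field names below are hypothetical, not Heka's actual API), an Encoder is essentially a function from a message to raw bytes, and a JSON implementation of it could be shared by the Elasticsearch, File, and future S3 outputs:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message is a stand-in for Heka's internal message type (hypothetical
// field set; the real message carries many more attributes).
type Message struct {
	Timestamp int64
	Type      string
	Payload   string
}

// Encoder mirrors the proposed plugin contract: a Heka message in,
// raw bytes out.
type Encoder interface {
	Encode(m *Message) ([]byte, error)
}

// JSONEncoder emits clean JSON rather than a dump of the internal
// data structures.
type JSONEncoder struct{}

func (e JSONEncoder) Encode(m *Message) ([]byte, error) {
	return json.Marshal(m)
}

func main() {
	var enc Encoder = JSONEncoder{}
	out, err := enc.Encode(&Message{Timestamp: 1394994000, Type: "nginx.access", Payload: "GET / 200"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

A SandboxEncoder would wrap the same contract around a Lua script, so the byte-formatting logic could be written without recompiling Heka.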

This will give us some additional advantages. Right now, for instance, the CarbonOutput manages its own TCP connections, and has rudimentary keep-alive and reconnection support. But we already have a TcpOutput that has much more robust reconnection support, including the use of disk queues to ensure we don't lose messages through the disconnect / reconnect cycle. Ideally the CarbonOutput would go away, and instead you'd use a CarbonEncoder coupled with a TcpOutput, or a UdpOutput, or a <Whatever>Output, so all of the transport layer complexity only has to be gotten right once.
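In config terms the split might look something like this (a sketch only; CarbonEncoder doesn't exist yet, and the option names here are assumptions):

```toml
# Hypothetical: format each message for Graphite's plaintext protocol.
[CarbonEncoder]
type = "SandboxEncoder"
filename = "lua_encoders/carbon.lua"

# Reuse the existing TcpOutput, which already handles reconnects and
# disk-backed queues, instead of a dedicated CarbonOutput.
[TcpOutput]
address = "graphite.example.com:2003"
encoder = "CarbonEncoder"
```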

That ticket has been open for a while, but this is on our short list of what's coming next. Not everything in our 0.6 milestone (see http://is.gd/azbUSB) will actually make it into the 0.6 release, but that one definitely will.

Are you using Heka to archive log data at Mozilla? If so, what
format are you storing it in?

Yes, we're using Heka to parse nginx and rsyslog logs into JSON (we ship w/ decoders for these formats: http://is.gd/B2F6qv and http://is.gd/sUiE8b) which we're then feeding into Elasticsearch. Unfortunately, we're finding that ES is having a hard time keeping up. A single machine running both nginx and Heka can produce and parse more log data than a cluster of 3 ES nodes on the same hardware can index. ES is great, easy to use, and Kibana is awesome, but it may not be up to the scale that we need. Or we may be able to find a way to have Heka do more aggregation and pre-calculation so that we don't have to slam ES so hard. Hard to say at this point.

Thanks,

You're welcome!

-r



Dan

On 4 March 2014 17:44, Rob Miller <[email protected]> wrote:

    On Tue 04 Mar 2014 06:26:16 AM PST, Dan wrote:

        Hi,


    Hi back!


        We are just evaluating Heka for use as our log and metrics
        aggregation
        system.


    Great! Hope you like what you find.


        We would like to archive our logs in S3 so it would be good if
        Heka could also store batches directly into a bucket.

        Is anyone working on a S3 output plugin for Heka? If not we
        might look
        at starting to write one.


    I'm not aware of anyone actively working on an S3 output at the
    moment, no. We have, however, built Cloudwatch plugins, both an
    input and an output:

    https://github.com/mozilla-services/heka-mozsvc-plugins/blob/dev/cloudwatch.go

    Those use the crowdmob fork of Canonical's goamz package to handle
    the details of interfacing w/ Amazon's API authentication
    framework. You should be able to use that code as a model to get
    something bootstrapped pretty easily.

    Our Cloudwatch plugins aren't in the Heka core, they're in a
    separate repo we set up for plugins that we think would be less
    widely used. Ultimately we'll probably create a separate repo
    specifically for AWS related plugins, so the Cloudwatch, S3, and
    any other Amazon-related plugins that get developed could have a
    nice cozy home together.

    Hope this helps!

    -r


_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka
