Hi Everyone,

I’m using Heka to ingest logs from a large cluster of boxes and then move
them off to an API. I have a working prototype, but right now I’m using the
LogStreamer and default TokenSplitter to create a message per log line.

This is not optimal, given that it creates a large number of messages and
thus a large number of API calls to the support system.

I’d like to buffer up to 1MB of log data and create messages in 1MB chunks.
Reading over the Extending Heka documentation, it seems that this could be
done with either a Splitter plugin or a Filter plugin.

The idea of the splitter would be similar to the TokenSplitter. It would
check whether the byte slice of data passed in is over the buffer size,
select a buffer’s worth for a message from the slice, and ‘read’ that
much. Otherwise it would indicate 0 bytes read, as the TokenSplitter does.
I’m wondering if I would need to tweak any of the global configuration
options to make this work (e.g. max_message_loop, plugin_chansize or
max_message_size).
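
To make the splitter idea concrete, here is a minimal standalone sketch in
Go. It does not import Heka; the ChunkSplitter type, its chunkSize field,
and the FindRecord method shape (bytes consumed plus the record, or 0 and
nil when more data is needed) are my assumptions modeled on the behavior
described above, not a verified plugin. It also cuts at the last newline
inside the chunk so that lines are not split, which is an extra choice on
my part.

```go
package main

import (
	"bytes"
	"fmt"
)

// ChunkSplitter is a hypothetical splitter that emits records of at most
// chunkSize bytes, cut at a line boundary where possible.
type ChunkSplitter struct {
	chunkSize int
}

// FindRecord reports how many bytes of buf were consumed and returns the
// record. It returns 0 and nil when buf does not yet hold a full chunk.
// (Assumed method shape; check it against the real Splitter interface.)
func (s *ChunkSplitter) FindRecord(buf []byte) (bytesRead int, record []byte) {
	if len(buf) < s.chunkSize {
		return 0, nil // not enough data buffered yet
	}
	window := buf[:s.chunkSize]
	n := bytes.LastIndexByte(window, '\n')
	if n == -1 {
		// A single line longer than chunkSize: fall back to a hard cut.
		return s.chunkSize, window
	}
	// Consume everything up to and including the last complete line.
	return n + 1, window[:n+1]
}

func main() {
	// A tiny chunk size keeps the example readable; 1MB would be 1 << 20.
	s := &ChunkSplitter{chunkSize: 16}
	buf := []byte("line one\nline two\nline three\n")
	n, rec := s.FindRecord(buf)
	fmt.Printf("consumed %d bytes, record %q\n", n, rec)
}
```

One gap in this sketch: data shorter than a chunk stays unread until more
arrives, so whatever remains at the end of a stream would need a separate
flush.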

Alternatively, I’m thinking of implementing a filter that collects these
messages in a buffer and flushes the buffer when the desired size is
reached. The problem with this is I’ll have multiple log streams and I
wouldn’t want to cross the streams. Also, as much as possible, I’d like to
preserve the order of lines within a single chunk (messages themselves are
encoded with a timestamp for later reassembly).
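
A rough sketch of the filter-side buffering, again as standalone Go rather
than a real Heka plugin. StreamBuffer, Add, and Flush are hypothetical
names, and the stream key is assumed to be whatever identifies a log stream
in the incoming message. Keeping one buffer per key avoids crossing the
streams, and appending in arrival order preserves line order within a
chunk.

```go
package main

import "fmt"

// StreamBuffer is a hypothetical per-stream accumulator. Each stream key
// gets its own byte buffer, so data from different streams never mixes.
type StreamBuffer struct {
	limit   int
	pending map[string][]byte
}

func NewStreamBuffer(limit int) *StreamBuffer {
	return &StreamBuffer{limit: limit, pending: make(map[string][]byte)}
}

// Add appends line to the buffer for stream. If appending would push a
// non-empty buffer past limit, the existing contents are returned as a
// finished chunk and line starts a new buffer. Otherwise it returns nil.
func (b *StreamBuffer) Add(stream string, line []byte) (chunk []byte) {
	cur := b.pending[stream]
	if len(cur) > 0 && len(cur)+len(line) > b.limit {
		chunk = cur
		cur = nil // start a fresh buffer so chunk is not overwritten
	}
	b.pending[stream] = append(cur, line...)
	return chunk
}

// Flush returns and clears whatever is buffered for stream, for use on a
// timer or at shutdown.
func (b *StreamBuffer) Flush(stream string) []byte {
	chunk := b.pending[stream]
	delete(b.pending, stream)
	return chunk
}

func main() {
	b := NewStreamBuffer(12) // tiny limit for illustration; 1 << 20 for 1MB
	b.Add("A", []byte("a1\n"))
	b.Add("B", []byte("b1\n"))
	b.Add("A", []byte("a2-long\n"))
	if chunk := b.Add("A", []byte("a3\n")); chunk != nil {
		fmt.Printf("stream A chunk: %q\n", chunk)
	}
	fmt.Printf("stream B leftover: %q\n", b.Flush("B"))
}
```

The sketch only flushes on size. A real filter would probably also want a
time-based flush so that quiet streams do not hold data indefinitely.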

Thoughts on either approach?


Thanks,


Eli
-- 
*Elijah Flesher*  |  *Lyft* <http://lyft.me/>  |  *Software Engineer*
206.661.4697  |  @eliflesher