Hi all,

Apologies in advance for what will be a long post. This will be of interest to you if you care about the details of Heka's design with regard to serialization and deserialization. In particular it deals with the interactions and divisions of responsibility between encoder and output plugins. It introduces some small further changes that are landing very soon, and describes some bigger changes that we are considering going forward, on which we'd appreciate feedback from anyone who might have thoughts.

Over the last couple of days trink and I have been trying to deal with the issue of stream framing in Heka's encoding layer, and digging into it has led to some changes. First, some background:

Heka uses protocol buffers as its primary serialization format. Our message objects are defined by a protobuf schema (see https://github.com/mozilla-services/heka/blob/dev/message/message.proto). Protocol buffers have no built-in support for streaming, however; it's up to the user to implement framing to delimit the messages. Heka does this with a simple header format, also specified in the linked protobuf schema and documented here: http://hekad.readthedocs.org/en/latest/message/index.html#protobuf-stream-framing
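For anyone who hasn't looked at the framing, here is a minimal Go sketch of the layout as the linked documentation appears to describe it: a record separator byte (0x1E), a one-byte header length, a protobuf-encoded header carrying the message length, a unit separator byte (0x1F), and then the message bytes. The name `frameMessage` is made up for illustration, the header is hand-encoded with only the `message_length` field (optional fields such as the HMAC ones are left out), and the exact byte layout is an assumption that should be checked against the docs rather than taken from this sketch.

```go
package main

import (
	"encoding/binary"
	"errors"
	"math"
)

const (
	recordSeparator = 0x1e // assumed start-of-record marker
	unitSeparator   = 0x1f // assumed marker between header and message
)

// frameMessage is a hypothetical helper, not Heka's own code. It wraps an
// already-serialized protobuf message in a framing header.
func frameMessage(msg []byte) ([]byte, error) {
	if uint64(len(msg)) > math.MaxUint32 {
		return nil, errors.New("message too large for a uint32 length")
	}

	// Hand-encode a protobuf header holding only field 1 (message_length)
	// as a varint: tag byte 0x08 followed by the varint-encoded length.
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], uint64(len(msg)))
	header := append([]byte{0x08}, buf[:n]...)

	framed := make([]byte, 0, 3+len(header)+len(msg))
	framed = append(framed, recordSeparator, byte(len(header)))
	framed = append(framed, header...)
	framed = append(framed, unitSeparator)
	framed = append(framed, msg...)
	return framed, nil
}
```

A reader on the other end can scan for the record separator, read the header length, decode the header to learn how many message bytes follow, and then hand exactly that many bytes to the protobuf decoder.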

Heka depends on this framing in a number of cases, such as when sending messages from one Heka server to another over TCP or AMQP, or when queuing messages to disk. Before the introduction of encoder plugins, certain outputs would use framing in certain cases. We had a loose (but ultimately false) assumption that whenever protobuf serialization was used the framing would be desired.

When encoders were introduced, it seemed reasonable to have the encoder handle the framing. The ProtobufEncoder would always include it, and the SandboxEncoder would include it whenever it was emitting protobuf encoded data. This quickly proved ineffective, however. There were cases where people wanted to use protobuf encoding but didn't need the framing, such as Ian Neubert's plugins for using Amazon's SQS as a transport (https://github.com/ianneub/heka-sqs). We started by adding knobs to the encoders to turn off the framing, but digging in we realized that a) the options and code were getting more complicated than we wanted and b) there was an inherent asymmetry in the fact that by default a ProtobufEncoder generated binary data that a ProtobufDecoder could not parse (since the decoder assumed that the framing had already been removed).

This finally brings me to describing the current small change. When my latest pull request (https://github.com/mozilla-services/heka/pull/931) is merged, message framing will no longer be handled by encoder plugins at all. Instead, every output will support a 'use_framing' config option that, if set to true, will mean that Heka's stream framing should be used by that output.

It is not necessary for each individual output to specify, check for, or react to this config option. Heka itself will check if the option is there. The catch is that instead of output code calling 'OutputRunner.Encoder()' to get the encoder and then 'Encoder.Encode(pack)' to do the encoding, you will just call the newly added 'OutputRunner.Encode(pack)' method. The OutputRunner will use the encoder to perform the initial serialization, and then will add the framing header if 'use_framing' was set to true.
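To make the division of labor concrete, here is a rough Go sketch of the idea. Everything in it is a simplified stand-in: `pack`, `encoder`, `payloadEncoder`, `outputRunner`, and the `frame` hook are hypothetical names for illustration and are not Heka's actual types or signatures. The point is only that the runner calls the encoder first and then applies framing when the option is on, so the output never has to look at 'use_framing' itself.

```go
package main

// pack is a toy stand-in for Heka's *PipelinePack.
type pack struct {
	payload string
}

// encoder is a toy stand-in for an encoder plugin.
type encoder interface {
	Encode(p *pack) ([]byte, error)
}

// payloadEncoder just emits the payload bytes; a real encoder would
// serialize the whole message (protobuf, JSON, and so on).
type payloadEncoder struct{}

func (payloadEncoder) Encode(p *pack) ([]byte, error) {
	return []byte(p.payload), nil
}

// outputRunner is a toy stand-in for the runner that owns the encoder
// and the value of the use_framing option.
type outputRunner struct {
	enc        encoder
	useFraming bool
	frame      func([]byte) ([]byte, error) // hypothetical framing hook
}

// Encode serializes the pack with the configured encoder and then adds
// the stream framing only if useFraming is set.
func (r *outputRunner) Encode(p *pack) ([]byte, error) {
	data, err := r.enc.Encode(p)
	if err != nil {
		return nil, err
	}
	if !r.useFraming {
		return data, nil
	}
	return r.frame(data)
}
```

With this shape, an output's write loop makes a single `Encode(pack)` call and sends whatever bytes come back, framed or not.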

That is how things will stand for the 0.6 release, but for 0.7 and beyond we're thinking of making an even bigger change. Since we're now at the point where the OutputRunner is handling most of the encoding details, it seems like it might make sense to go ahead and finish the job so that the encoding (and any desired framing) happens before the output gets involved at all. This means that an output plugin would no longer be pulling `*PipelinePack` objects off of the input channel, but would instead receive already serialized `[]byte` blobs. Then output code would really focus entirely on i/o, with no need (in most cases) to think about or interact with the encoding process.
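As a purely illustrative Go sketch of what that might look like from the output's side (nothing here is a committed API; `runOutput` and `memoryWriter` are hypothetical names, and the real channel type and plugin interface are still undecided), an output's main loop could shrink to draining a channel of byte slices and writing them out:

```go
package main

import "io"

// runOutput is a hypothetical output loop under the proposed design: it
// receives records that are already encoded (and framed, if configured)
// and only performs the i/o.
func runOutput(in <-chan []byte, w io.Writer) error {
	for record := range in {
		if _, err := w.Write(record); err != nil {
			return err
		}
	}
	return nil
}

// memoryWriter is a toy destination standing in for a socket or file.
type memoryWriter struct {
	data []byte
}

func (m *memoryWriter) Write(p []byte) (int, error) {
	m.data = append(m.data, p...)
	return len(p), nil
}
```

The loop has no reference to an encoder or to the framing option, which is the simplification being proposed.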

This seems to make sense to us, and we've opened up an issue on it in our tracker (https://github.com/mozilla-services/heka/issues/930). We're interested in feedback, though, especially from anyone who has written (or plans to write) Heka output plugins. If you have any thoughts or opinions, please let us know. :)

And if you made it this far and are still reading I'm not sure whether to congratulate or apologize to you.

Cheers,

-r