Rob Miller writes:
> This would work, and may be a way to get started, but it is suboptimal for a
> few reasons:
>
> * PayloadRegexDecoder is a convenient way to get started for folks who are
> unfamiliar with using grammars, but it is generally slower, less flexible,
> and less composable / reusable than LPEG. I think the time spent writing
> regular expressions to parse your logs would be better spent learning to use
> grammars.
>
> * MultiDecoder only supports running all of the registered decoders in
> sequence (not at all suitable for this use case) or cascading through them
> all such that the first successful decoder wins. The latter choice can be
> made to work, but clearly it's pretty inefficient when there are more than 2
> or 3 decoders to choose from.
>
> * Due to the mem copies required when transferring data between Go and C,
> there is a small performance cost whenever you cross a sandbox boundary. This
> is small enough to still allow for reasonably good throughput in most cases,
> but if you have a MultiDecoder chaining multiple SandboxDecoders together
> you'll end up crossing that boundary many times in rapid succession, which
> certainly will burn cycles unnecessarily, and might slow things down more
> than is acceptable.
>
> We've considered adding some sort of routing to the MultiDecoder, which would
> allow you to look at input data and decide which decoder should receive it
> based on arbitrary conditions, but that's not yet in place.
>
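The routing idea above can be sketched roughly as follows. This is standalone Go, not Heka code: `Decoder`, `Route`, `Dispatch`, `exampleRoutes` and the two toy decoders are hypothetical names made up for illustration, and Heka's real decoders (Go plugins or Lua SandboxDecoders) have different interfaces. The point is only the shape: a cheap test on the raw payload selects exactly one decoder, instead of cascading through every decoder until one succeeds.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Decoder is a hypothetical stand-in for one parsing step. It is NOT Heka's
// decoder plugin interface; it just turns a raw payload into named fields.
type Decoder func(payload string) (map[string]string, error)

// Route pairs a cheap match test with the decoder that should handle any
// payload passing that test.
type Route struct {
	Name   string
	Match  func(payload string) bool
	Decode Decoder
}

// ErrNoRoute is returned when no route claims the payload.
var ErrNoRoute = errors.New("no route matched payload")

// Dispatch evaluates the match tests in order and runs exactly one decoder:
// the one belonging to the first route whose test passes. Unlike a
// first-wins cascade, the payload is never handed to decoders that are
// expected to reject it.
func Dispatch(routes []Route, payload string) (string, map[string]string, error) {
	for _, r := range routes {
		if r.Match(payload) {
			fields, err := r.Decode(payload)
			return r.Name, fields, err
		}
	}
	return "", nil, ErrNoRoute
}

// decodeSyslog is a toy decoder for payloads shaped like "<34>message text".
func decodeSyslog(p string) (map[string]string, error) {
	end := strings.IndexByte(p, '>')
	if !strings.HasPrefix(p, "<") || end < 0 {
		return nil, errors.New("syslog: missing <priority> prefix")
	}
	return map[string]string{"pri": p[1:end], "msg": p[end+1:]}, nil
}

// decodeApache is a toy decoder that extracts only the client address
// (the first whitespace-separated token) from an access-log line.
func decodeApache(p string) (map[string]string, error) {
	fields := strings.Fields(p)
	if len(fields) == 0 {
		return nil, errors.New("apache: empty line")
	}
	return map[string]string{"remote_addr": fields[0]}, nil
}

// exampleRoutes wires the cheap tests to their decoders. The tests are
// illustrative heuristics, not robust format detection.
func exampleRoutes() []Route {
	return []Route{
		{Name: "syslog", Match: func(p string) bool { return strings.HasPrefix(p, "<") }, Decode: decodeSyslog},
		{Name: "apache", Match: func(p string) bool { return strings.Contains(p, "\"GET ") }, Decode: decodeApache},
	}
}

func main() {
	routes := exampleRoutes()
	for _, line := range []string{
		"<34>su: 'su root' failed",
		`10.0.0.1 - - [06/Apr/2015:09:56:00 -0600] "GET / HTTP/1.1" 200 512`,
		"unrecognized line",
	} {
		name, fields, err := Dispatch(routes, line)
		fmt.Printf("route=%q fields=%v err=%v\n", name, fields, err)
	}
}
```

The match tests need to be far cheaper than full parsing (a prefix or substring check here), or the router just moves the cascade's cost into the tests. The same inspect-then-dispatch shape is what a single decoder choosing one grammar per message type would follow.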
This would be really cool for my use case, too. Currently all of my decoders
in the MultiDecoder chain are implemented in Go, so I'm not as worried about
the performance implications (and load is quite low), but more advanced
routing would help because there are dependencies between the decoders. If
the first one fails, trying the others is pointless because the message is
already missing vital data. As it stands, though, each subsequent decoder has
to detect that failure itself, since cascade_strategy=all has no way to
short-circuit. Then again, I'm using Heka pretty far outside its core use case
of processing log and analytics data.

>
> The best solution for this right now would be to do all of the work in a
> single SandboxDecoder. If you look at the various sandbox-based decoders that
> Heka provides, you'll see that most of the heavy lifting isn't done in the
> decoder code itself, but is delegated to Lua modules that we provide.
> Similarly, custom grammars can be added to an existing Heka installation as
> Lua modules. That way the main decoder code could use `read_message` calls to
> examine the input data, decide what type of message has been received, and
> invoke the appropriate parsing grammar for each one.
>
> Whether it's worth it to you to set this up probably depends on the amount of
> data you need to process. If the MultiDecoder solution works, great, but keep
> in mind that if you start to need more throughput, you can evolve your
> system to meet the need.
>
> Hope this helps!
>
> -r
>
>
> On 04/06/2015 09:56 AM, Ali wrote:
>> Ah-ha! Should I use a combination of MultiDecoder and
>> PayloadRegexDecoder (for custom formats)? And just assign the
>> MultiDecoder to the TcpInput?
>>
>> -Ali
>>
>> On Mon, Apr 6, 2015 at 11:49 AM Ali <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> Morning, all!
>>
>> I'm trying out nxlog on remote hosts and having nxlog send logs to
>> my Heka host's TcpInput.
>> However, I'm starting to add multiple
>> types of log data (syslog files, Apache logs, Tomcat logs) to the
>> nxlog forwarder and I'm wondering how best to handle this. Should I
>> configure Heka to use a single TcpInput for all of these different
>> message types? Should I configure a separate TcpInput for each
>> distinct message type? Something else?
>>
>> TIA,
>> Ali
>>
>>
>>
>> _______________________________________________
>> Heka mailing list
>> [email protected]
>> https://mail.mozilla.org/listinfo/heka

