On 09/22/2015 11:06 PM, Andre wrote:
Rob,

Thanks for the quick reply.

Just tested with PayloadEncoder but got similar results.
I find it hard to believe that switching from ESJsonEncoder to PayloadEncoder 
didn't change your throughput. ESJsonEncoder is much slower than 
PayloadEncoder. This is usually okay, however, b/c ESJsonEncoder is usually 
used when sending data to ElasticSearch, and ElasticSearch itself is usually 
much more of a bottleneck than Heka is.
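For reference, swapping encoders is a one-line change on the output section. A hedged sketch of what the comparison might look like in the TOML config (section and option names as I recall them from the Heka docs; the path is a placeholder):

```toml
# Hypothetical FileOutput section for the throughput comparison.
[FileOutput]
message_matcher = "TRUE"
path = "/tmp/heka-test.log"   # placeholder path
encoder = "PayloadEncoder"    # was "ESJsonEncoder"
```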
But after reading your post I realised that:

1) My actual pipeline is:

TcpInput -> TokenSplitter -> PayloadRegexDecoder -> ESJsonEncoder ->
FileOutput
I don't understand why you'd want to use ESJsonEncoder with a FileOutput. 
What's going on there?

Also, PayloadRegexDecoder is known to be slow and unwieldy. If you're doing any 
parsing at all, you can get much better performance with a SandboxDecoder that 
uses an LPEG grammar than you can with a PayloadRegexDecoder.
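A SandboxDecoder is wired in through the config, pointing at a Lua file that holds the LPEG grammar. A rough sketch under those assumptions (the section name and Lua filename are placeholders; the Lua file would have to define a `process_message` function per the sandbox API):

```toml
# Hypothetical SandboxDecoder wiring; the referenced Lua file must
# implement the parsing (e.g. an LPEG grammar) itself.
[MySyslogDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/my_syslog.lua"   # placeholder path

[TcpInput]
address = ":5565"             # placeholder listen address
splitter = "TokenSplitter"
decoder = "MySyslogDecoder"
```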
2) I should give it a go and remove TokenSplitter from the pipeline
Why would you remove TokenSplitter? If you want to split the records on 
newlines, then TokenSplitter is what you want.
Indeed my EPS jumped to 21.6K (however, messages no longer flowed through
the pipeline, as Heka no longer recognised them after input. :D )
I don't understand the context enough to know what you mean by this.
So I guess the bottleneck is either in the Splitter or the regex
decoder (though the regex is a simple '^(?P<Payload>.*)' )
PayloadRegexDecoder is generally pretty slow. And what's the point of using a 
PayloadRegexDecoder when you're not even doing any decoding? If you want to 
pass the payload through without any parsing, then don't use a decoder at all.
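For pure passthrough, the decoder can simply be omitted from the input section. A hedged sketch (address and path are placeholders):

```toml
# Hypothetical passthrough config: no decoder, raw payload to disk.
[TcpInput]
address = ":5565"             # placeholder listen address
splitter = "TokenSplitter"    # still splits records on newlines

[FileOutput]
message_matcher = "TRUE"
path = "/tmp/raw.log"         # placeholder path
encoder = "PayloadEncoder"    # writes the payload through as-is
```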

So clearly I don't understand a lot of what you're describing here. If you want 
meaningful help, you'll probably want to include the actual TOML configuration 
that you're using, so we can see the full context of your setup. Also useful 
would be describing what it is you're trying to accomplish, exactly.

-r


Cheers



On Wed, Sep 23, 2015 at 2:42 PM, Rob Miller <[email protected]> wrote:

    Almost certainly the slowest part of your pipeline there is the
    ESJsonEncoder. What does the throughput look like if you replace
    that with a PayloadEncoder?

    -r


    On 09/22/2015 09:33 PM, Andre wrote:

        Hi All,

        I was doing some performance tests around hekad with a simple test

        KVM VM running ubuntu,
        2GB
        4 core VM
        magnetic drives

        The test pipeline was the traditional vanilla pipeline similar to:

        TcpInput -> TokenSplitter -> ESJsonEncoder -> FileOutput

        The sample used was a stream of plain text TCP syslog generated by
        syslog-ng's loggen tool.

        Good news is that there's no message loss; bad news is that
        performance is somewhat lacklustre. While a normal rsyslogd
        doing similar work would run at significantly higher rates,
        hekad was stuck around 12K EPS.

        I started by setting maxproc to 4, then tried playing with the
        poolsize (setting it to a large value like 50000), but the only
        consequence was heka consuming more memory; EPS stayed more or
        less the same.

        Has anyone exceeded this performance with hekad under similar
        pipelines
        (i.e. TcpInput) and HW conditions (small VMs)?

        If yes, mind sharing a bit on how the hekad was configured?

        Kind regards


        _______________________________________________
        Heka mailing list
        [email protected]
        https://mail.mozilla.org/listinfo/heka



